How to load test PostgreSQL database and not miss anything

During load testing of Tantor Postgres databases or other PostgreSQL-based databases with the standard pgbench tool, specialists often get non-representative results and have to repeat tests because details of the environment (DBMS configuration, server characteristics, PostgreSQL versions) are not recorded. In this article we review the author's pg_perfbench tool, which is designed to address this issue. It makes scenarios repeatable, prevents the loss of important data, and streamlines result comparison by registering all parameters in a single template. It also automatically launches pgbench with TPC-B load generation, collects the metadata of the testing environment, and generates a structured report.

Automation of PostgreSQL-based DB load testing and system information collection

During development and support, it is frequently necessary to quickly run a load test in order to evaluate hypotheses or the overall efficiency of a Tantor Postgres database or another PostgreSQL-based database. Most often, specialists use the standard pgbench tool for TPC-B load generation and evaluate the results. However, when describing the results of such tests, important details about the environment are often overlooked: there is no clear description of the configuration used, such as PostgreSQL parameters, server hardware characteristics, client version, and network settings. This can lead to ambiguous or incomplete conclusions about performance and to extra time spent re-running tests.

To eliminate such routine tasks and minimize the risk of 'forgotten' details, I developed the pg_perfbench tool, which starts pgbench, collects data on the database environment, and then generates a full report based on a predefined template. This makes it possible to replay testing scenarios, document all necessary details of the database environment, and quickly compare test results when needed.

Motivation for development

Let's review a typical performance measurement scenario for Tantor Postgres or other PostgreSQL-based databases, during which we need not only to obtain transactions per second but also to create a detailed description of the performed test. In real practice, database administrators and developers face the following tasks:

  • Checking the behavior of the database under increased load (number of connections, data size, etc.).

  • Comparing the performance of different versions of PostgreSQL or different configurations (for example, different parameters like shared_buffers, work_mem, enabled extensions, etc.).

  • Recording test results for follow-up analysis and replay.

However, without a full description of the environment, including OS versions, network infrastructure parameters, server characteristics (CPU, RAM, disk subsystem), and the PostgreSQL configuration itself, follow-up analysis and replay of experiments become quite challenging. Any seemingly insignificant change, whether the use of different data directories (for example, one with checksums enabled and another without) or other differences in hardware and network configuration, can significantly affect the results. Then, when trying to compare the new figures with the previous ones, it becomes clear that the tests are actually incomparable, since they were run under different conditions.

The idea of supplementing standard load testing tools with automatic report generation originates from the need to record both performance indicators and details of the environment. For such pgbench runs, solutions based on custom scripts work well, but they often require modification and manual data collection to get a complete picture. In practice, this increases time costs and the risk of missing some important detail in the final description.

Standard load testing cycle with pgbench

pgbench generates a workload based on the TPC-B model and allows you to quickly obtain the following metrics:

  • tps (transactions per second) - the number of transactions executed per second.

  • Latency average - the average transaction latency, measured in milliseconds (ms).

  • Number of transactions actually processed - the total number of successfully processed transactions.

  • Number of failed transactions - the number of transactions that ended with an error.

  • Initial connection time - the time spent on establishing the initial connection with the database, measured in milliseconds (ms) or seconds (s) depending on the measurement context.

To demonstrate, let's take a classic scenario: launching pgbench in a loop in a simple Bash script, varying the number of clients (connections to the database) between iterations. Script example:

# One-off initialization of the pgbench test database
/usr/lib/postgresql/15/bin/pgbench -i --scale=4000 --foreign-keys -h `hostname` -p 5432 -U postgres test

#!/bin/bash
clients="1 10 20 50 100"   # numbers of client connections to test
t=600                      # duration of each run, in seconds
dir=/var/lib/postgresql/test_result
mkdir -p "$dir"
for c in $clients; do
    log="${dir}/pgbench_${c}_${t}.txt"
    echo "$log"
    echo "start test: $(date +"%Y.%m.%d_%H:%M:%S")" >> "$log"
    /usr/lib/postgresql/15/bin/pgbench -h `hostname` -p 5432 test -c $c -j $c -T $t >> "$log"
    echo "stop test: $(date +"%Y.%m.%d_%H:%M:%S")" >> "$log"
done

This code:

  1. Initializes the pgbench test database with a scale factor of 4000 (--scale=4000) and foreign keys (--foreign-keys).

  2. Launches load cycles based on the variable number of clients (1, 10, 20, 50, 100) over 600 seconds (-T 600).

  3. Logs the results into separate files, starting and ending each iteration with timestamps.

Typically, a chart is generated at the end, showing the dependency of indicators (e.g., tps or average latency) on the number of clients:

We can also create an additional chart to show the Latency average.
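
As an illustration of how such charts can be built, here is a minimal Python sketch (assuming matplotlib is installed) that parses the log files produced by the script above and plots tps and latency against the number of clients. It relies on the pgbench summary lines looking like "latency average = ... ms" and "tps = ...", which matches recent pgbench versions; the paths and file names follow the script above.

import re
from pathlib import Path

import matplotlib.pyplot as plt

LOG_DIR = Path("/var/lib/postgresql/test_result")  # same directory as in the script above
CLIENTS = [1, 10, 20, 50, 100]
DURATION = 600

xs, tps_vals, lat_vals = [], [], []
for c in CLIENTS:
    text = (LOG_DIR / f"pgbench_{c}_{DURATION}.txt").read_text()
    # pgbench prints lines such as "tps = 1234.567890 (without initial connection time)"
    # and "latency average = 10.123 ms"
    tps_match = re.search(r"tps = ([\d.]+)", text)
    lat_match = re.search(r"latency average = ([\d.]+) ms", text)
    if tps_match and lat_match:
        xs.append(c)
        tps_vals.append(float(tps_match.group(1)))
        lat_vals.append(float(lat_match.group(1)))

fig, (ax_tps, ax_lat) = plt.subplots(1, 2, figsize=(10, 4))
ax_tps.plot(xs, tps_vals, marker="o")
ax_tps.set(xlabel="clients", ylabel="tps", title="TPS vs clients")
ax_lat.plot(xs, lat_vals, marker="o", color="tab:orange")
ax_lat.set(xlabel="clients", ylabel="latency average, ms", title="Latency vs clients")
fig.tight_layout()
fig.savefig("pgbench_results.png")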

Although pgbench data already provide a basic performance estimate, for a full analysis we also need to consider the following:

  • Server parameters:

      • CPU (model, number of cores, Hyper-Threading support);

      • RAM (capacity, frequency, type);

      • disk subsystem (storage type, capacity, RAID configuration, etc.);

      • network parameter settings (TCP/IP stack, MTU, etc.);

      • system drivers (e.g., for SSDs or network cards).

  • Database parameters:

      • exact version of PostgreSQL (server and client, patches if necessary);

      • enabled extensions (e.g., pg_repack, pg_stat_statements);

      • database configuration (postgresql.conf) and key parameters (shared_buffers, work_mem, wal_buffers, etc.);

      • pg_config output (build options, directory paths, etc.);

      • enabled replication (physical or logical).

When replaying the same test, we need to ensure that the environmental parameters listed above stay consistent, since discrepancies in any of these aspects can affect the final numbers (for example, if the CPU in one environment differs from the CPU in another, a direct comparison of tps results becomes meaningless). Therefore, in serious research, it is necessary to confirm that the test was replayed with the same environmental parameters.

The need for ready-made solutions and automation

To conduct load testing and collect the listed system information, you can use a set of open-source utilities, but practically all of them require manual integration and additional scripts. In other words, the administrator has to separately collect information about the configuration of Tantor Postgres or another PostgreSQL-based database, network settings, operating system kernel version, etc., and then assemble a unified report, often in an arbitrary format, which makes it challenging to compare that report with colleagues' data.

In order for all necessary data to be assembled and documented automatically, add-ons to standard tools are required. This is the issue pg_perfbench is designed to solve, as it allows:

  • running standard or custom pgbench scripts;

  • simultaneously collecting metadata about hardware, OS, and PostgreSQL configuration;

  • automatically generating an output report based on the specified template.

As a result, when conducting stress tests, you can be sure that no crucial detail related to the environment will be missed. This report simplifies comparison of results, saves time on configuring repeated runs, and ensures a high level of reproducibility of experiments.

Automation of DB performance testing

An overview of pg_perfbench operation during load testing:

The process of load testing and performance analysis of PostgreSQL is significantly simplified if in the end we receive a full structured report containing not only numerical indicators (tps, Latency average, etc.), but also detailed information about the environment. Such a report helps:

  1. To compare the results of different runs and configuration versions without losing context.

  2. To store all essential information about the server, PostgreSQL parameters, and network settings in one place.

  3. To replay test scenarios, using a ready-to-use template containing all the crucial data.

There are several core sections in the generated report:

  • System properties - describes the characteristics of the hardware part (processor, memory, disks, network settings).

  • Common database info - contains information about the PostgreSQL version, enabled extensions, and key database parameters.

  • Common benchmark info - specifies pgbench run settings (number of clients, testing time, used commands).

  • Tests result info - summarizes the results of executed tests and displays performance indicators (TPS, latency, etc.).

Collapsed report sections:

The formats used to generate the report make it easy to read and visualize the test results: JSON for storing data (including measurement results and configuration) and HTML for visualizing the report, with the ability to add links, styles, charts, etc. In our implementation, the HTML report contains a nested JSON object with the filled-in data, which is used to render the report. Thus, the final process may look like this:

  1. Running test scenarios and collecting information.

  2. Automatic report generation: a JSON file with a detailed structure and sections is created, and an HTML version of the report is generated based on the same template.

  3. Analysis of results: required values are taken from JSON to build charts, comparison tables, etc., and the HTML report visualizes the testing results in a convenient interactive form.
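
For step 3, a minimal Python sketch of extracting values from the JSON report could look as follows. The report file name and the key path are assumptions made for illustration, modeled on the dot-separated paths used later in the join-task examples (sections.benchmark.reports....); the actual layout of a pg_perfbench report may differ.

import json

with open("report/benchmark-report.json") as f:  # assumed report file name
    report = json.load(f)

def get_by_path(obj, dotted_path):
    # Walk a nested dict using a dot-separated path such as
    # "sections.benchmark.reports.options.data"
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj

# Pull the benchmark options out of the report to build charts or tables
options = get_by_path(report, "sections.benchmark.reports.options.data")
print(options)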

Each section has parameters for structuring the interactive HTML report.

The section describing the server environment may include a table with information on the processor, memory, and disks, as well as the output of some system commands. The expanded section contains the following elements (collapsed):

The section describing the DB configuration may contain postgresql.conf parameters, a list of extensions, etc.

Collecting data on the DB environment

To collect server information, commands are usually executed in the terminal, for example:

df -h   # <---- disk space
 
cat /etc/fstab # <---- description of file systems
 
cat /etc/os-release # <---- description of the Linux distribution
 
uname -r -m # <---- kernel version

To collect information about the DB environment, you can use shell commands or SQL commands that are stored in separate files. The description of the report item indicates a file with these commands, and their results are included in the data field:

Let's look at the report section with data on the server environment filled:

Each item of the report structure has a description for interactive layout in HTML, which includes the following parameters:

  • header - the specified name of the section;

  • description - a description or a brief note;

  • state - the state of the report item (collapsed or expanded);

  • item_type - type of the returned script result (table or text);

  • shell_comand_file or sql_command_file - name of the shell or SQL script;

  • data - the field where the result of the script execution is recorded.
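
To make this structure concrete, here is a hypothetical sketch of a single report item written as a Python dict, together with one way its data field could be filled. The field names follow the list above; the file name, values, and filling logic are made up for illustration and are not taken from the actual pg_perfbench templates.

import subprocess

# A hypothetical report item; the values are purely illustrative
cpu_info_item = {
    "header": "CPU info",
    "description": "Processor characteristics of the database server",
    "state": "collapsed",
    "item_type": "table",
    "shell_comand_file": "cpu_info.sh",  # a shell script with system commands
    "data": "",                          # filled with the script output during the run
}

# Roughly, the tool executes the script and records its output in "data"
cpu_info_item["data"] = subprocess.run(
    ["bash", cpu_info_item["shell_comand_file"]],
    capture_output=True, text=True,
).stdout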

Information retrieval from the database is structured similarly:

Configuration of the database load testing

pgbench is configured via pg_perfbench user parameters:

python -m pg_perfbench --mode=benchmark \
--log-level=debug   \
--connection-type=ssh   \
--ssh-port=22   \
--ssh-key=/key/p_key \
--ssh-host=10.100.100.100    \
--remote-pg-host=127.0.0.1  \
--remote-pg-port=5432   \
--pg-host=127.0.0.1 \
--pg-port=5439  \
--pg-user=postgres  \
--pg-password=pswd  \
--pg-database=tdb   \
--pg-data-path=/var/lib/postgresql/16/main  \
--pg-bin-path=/usr/lib/postgresql/16/bin    \
--benchmark-type=default    \
--pgbench-clients=1,10,20,50,100    \
--pgbench-path=/usr/bin/pgbench \
--psql-path=/usr/bin/psql   \
--init-command="ARG_PGBENCH_PATH -i --scale=10 --foreign-keys -p ARG_PG_PORT -h ARG_PG_HOST -U postgres ARG_PG_DATABASE"    \
--workload-command="ARG_PGBENCH_PATH -p ARG_PG_PORT -h ARG_PG_HOST -U ARG_PG_USER ARG_PG_DATABASE -c ARG_PGBENCH_CLIENTS -j 10 -T 10 --no-vacuum"

All settings are recorded in the report to fully describe how the tool was launched. The number of iterations is also set by the user: you can specify either --pgbench-clients (number of clients) or --pgbench-time (duration) for each test run.
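
To illustrate how the ARG_ placeholders and the --pgbench-clients list turn into individual pgbench runs, here is a rough Python model of the substitution; the real logic inside pg_perfbench may differ, this sketch only mirrors the behavior described above using the values from the example command.

workload_command = (
    "ARG_PGBENCH_PATH -p ARG_PG_PORT -h ARG_PG_HOST -U ARG_PG_USER ARG_PG_DATABASE "
    "-c ARG_PGBENCH_CLIENTS -j 10 -T 10 --no-vacuum"
)
args = {
    "ARG_PGBENCH_PATH": "/usr/bin/pgbench",
    "ARG_PG_HOST": "127.0.0.1",
    "ARG_PG_PORT": "5439",
    "ARG_PG_USER": "postgres",
    "ARG_PG_DATABASE": "tdb",
}
pgbench_clients = [1, 10, 20, 50, 100]  # from --pgbench-clients=1,10,20,50,100

# One pgbench invocation per value of --pgbench-clients
for clients in pgbench_clients:
    cmd = workload_command
    for placeholder, value in {**args, "ARG_PGBENCH_CLIENTS": str(clients)}.items():
        cmd = cmd.replace(placeholder, value)
    print(cmd)  # pg_perfbench executes each such command and collects its output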

Configuration of pg_perfbench user parameters is described in the documentation.

 

pgbench launch iterations will be displayed in the benchmark description section:

Since standard pgbench load queries are used, initialization and loading scripts will be displayed as pgbench commands:

Results of database load testing

pgbench data is conveniently structured as tables, where each row is one iteration and the columns correspond to the following key metrics:

After that, on the chart, you can compare iterations by TPS (vertical axis - TPS, horizontal axis - number of clients):

Ultimately, automating the collection of all required data and converting it into a convenient form (JSON + HTML, charts, tables) significantly simplifies not only the analysis but also collaboration on the project. You can reproduce tests faster, compare results, and identify bottlenecks in PostgreSQL performance without wasting time figuring out what exactly was tested and under what conditions.

Comparison of reports

When testing and analyzing PostgreSQL performance, we often need to compare the results of several load tests. This is useful in the following scenarios:

  • comparison of the behavior of the same database on different servers;

  • checking how changes in postgresql.conf affect the final throughput (TPS) and latency.

pg_perfbench provides a mechanism for comparing ready reports in --mode=join mode. Its core purpose is to:

  1. Make sure that we are actually comparing corresponding environments (identical PostgreSQL versions, similar system parameters, the same set of extensions, etc.).

  2. Form a single "consolidated" report after the key elements have been checked for matching, in which the differences in settings and benchmark results can be seen.

A JSON file with the keys used to compare data from different reports may have the following structure:

{
    "description": <Description of the current task>,
    "items": [
        "sections.system.reports.sysctl_vm.data",
        "sections.system.reports.total_ram.data",
        "sections.system.reports.cpu_info.data",
        "sections.system.reports.etc_fstab.data",
        "sections.db.reports.version_major.data",
        "sections.db.reports.pg_available_extensions.data",
        "sections.db.reports.pg_config.data",
        "sections.benchmark.reports.options.data",
        "sections.benchmark.reports.custom_tables.data",
        "sections.benchmark.reports.custom_workload.data"
    ]
}

The "items" field contains an array of object fields representing the structure of the report. Equality of values will be checked based on these values. If it turns out that some of the compared mandatory items differ (for example, CPU on servers differ, or different versions of PostgreSQL are used), the utility will interrupt the join and display an error message. This protects against incorrect comparisons when the results simply do not make sense to compare directly.

Example log:

2025-03-28 16:31:02,285 INFO  root : 218 - Loaded 2 report(s): benchmark-ssh-custom-config.json, benchmark-ssh-default-config.json
2025-03-28 16:31:02,286 ERROR root : 190 - Comparison failed: Unlisted mismatch in 'cpu_info'
 reference report - benchmark-ssh-custom-config
 comparable report - benchmark-ssh-default-config
2025-03-28 16:31:02,286 ERROR root : 222 - Merge of reports failed.
2025-03-28 16:31:02,286 ERROR root : 317 - Emergency program termination. No report has been generated.

Comparison of DB performance with different settings in postgresql.conf

One of the most common tasks is to test the effect of changing PostgreSQL settings. Let's review a simple scenario where we have two configuration files:

postgresql_1.conf:

.....
shared_buffers = 166MB
work_mem = 10000kB
maintenance_work_mem = 20MB
effective_cache_size = 243MB

postgresql_2.conf:

.....
shared_buffers = 90MB
work_mem = 5000kB
maintenance_work_mem = 10MB
effective_cache_size = 150MB

The goal is to understand how throughput changes when the configuration parameters change. You need to run pg_perfbench with these two configurations, as described above (each measurement run generates a separate report). Read more about configuring user parameters in the documentation.

To join the two reports, specify a JSON file with a list of "items" that does not include pg_settings (pg_perfbench/join_tasks/task_compare_dbs_on_single_host.json).

{
    "description": "Comparison of database performance across different configurations in the same environment using the same PostgreSQL version",
    "items": [
        "sections.system.reports.sysctl_vm.data",
        "sections.system.reports.sysctl_net_ipv4_tcp.data",
        "sections.system.reports.sysctl_net_ipv4_udp.data",
        "sections.system.reports.total_ram.data",
        "sections.system.reports.cpu_info.data",
        "sections.system.reports.etc_fstab.data",
        "sections.db.reports.version_major.data",
        "sections.db.reports.pg_available_extensions.data",
        "sections.db.reports.pg_config.data",
        "sections.benchmark.reports.options.data",
        "sections.benchmark.reports.custom_tables.data",
        "sections.benchmark.reports.custom_workload.data"
    ]
}

Now when running

python -m pg_perfbench --mode=join \
--report-name=join-diff-conf-reports \
--join-task=task_compare_dbs_on_single_host.json \
--input-dir=/pg_perfbench/report

the utility will check that the specified parameters match: identical CPU, identical amount of RAM (at least as documented in the reports), the same major version of PostgreSQL, a matching set of extensions, etc. If the parameters match, the utility will generate a consolidated report; if any discrepancies are found in the listed items, it will output an error message. The process log will contain the list of loaded reports and the result of the join:

python -m pg_perfbench --mode=join  \
--report-name=join-diff-conf-reports    \
--join-task=task_compare_dbs_on_single_host.json    \
--input-dir=/pg_perfbench/report
2025-03-28 16:53:31,476       INFO                                root :   55 - Logging level: info
.....
#-----------------------------------
2025-03-28 16:53:31,476       INFO                                root :  211 - Compare items 'task_compare_dbs_on_single_host.json' loaded successfully:
sections.system.reports.sysctl_vm.data
sections.system.reports.sysctl_net_ipv4_tcp.data
sections.system.reports.sysctl_net_ipv4_udp.data
sections.system.reports.total_ram.data
sections.system.reports.cpu_info.data
sections.system.reports.etc_fstab.data
sections.db.reports.version_major.data
sections.db.reports.pg_available_extensions.data
sections.db.reports.pg_config.data
sections.benchmark.reports.options.data
sections.benchmark.reports.custom_tables.data
sections.benchmark.reports.custom_workload.data
.....
2025-03-28 16:53:31,481       INFO                                root :   99 - The report is saved in the 'report' folder
2025-03-28 16:53:31,481       INFO                                root :  322 - Benchmark report saved successfully.

The join report will contain:

  • List of the compared reports and join parameters:

  • system data matches at least for the items listed in the join list. If an element is not specified in the join list and its data field differs between the compared reports, then in the final join report the tab for that item is marked with a special color and labeled "Diff". The tab itself shows the results for the corresponding item from all compared reports.

  • DB data matches by join parameters.

  • Configuration of the pgbench load. The differences in user parameters correspond to the different database configurations (postgresql.conf) used in the compared reports.

  • Results section with the performance difference across the benchmark reports.

On the chart, you can quickly assess the difference in benchmarks by tps.

Conclusion

Using pg_perfbench allows you:

  • To avoid missing environment info: all important hardware parameters (CPU, RAM, disks, network settings) and configurations of Tantor Postgres or other PostgreSQL-based DB (version, extensions, main and secondary parameters, key utilities) are automatically collected during testing, which eliminates the risk of working with incomplete data during analysis.

  • To reduce the time spent on preparing and replaying tests: you no longer need to manually describe every detail of the test environment, and then repeat all manual steps. Load parameters (number of clients, test time, pgbench commands) and key system parameters are automatically recorded in one report.

  • To simplify analysis and comparison of results: information is collected in a structured format (JSON and/or HTML) that is easy to navigate. Visual representation (charts, tables) allows you to quickly assess the dynamics of indicators (tps), and the built-in join functionality makes it easier to compare results of several reports.

  • To ensure reproducibility of experiments: if necessary, you can go back to previous measurements and re-run them under the same conditions without having to manually restore all parameters - all important settings and environment information are already included in the report.

  • To standardize the approach to load testing: pg_perfbench provides a unified interface to connect to different types of environments (local machine, SSH, Docker) and integrate with pgbench. This increases flexibility and provides centralized management of the testing process.

Thus, pg_perfbench makes PostgreSQL load testing transparent, reliable, and reproducible, significantly simplifying the analysis of results, rerunning of scenarios, and comparing them.

You can find the reviewed project at https://github.com/TantorLabs/pg_perfbench
