One of the open challenges in the database world is keeping a database consistent across multiple DBMS instances (nodes) that independently handle client connections. The crux of the issue is ensuring that if one node fails, the others keep running smoothly — accepting connections, committing transactions, and maintaining consistency without a hitch. Think of it like a single DBMS instance staying operational despite a faulty RAM stick or intermittent access to multiple CPU cores.
My name is Andrey Lepikhov, and I’d like to kick off a discussion about the multi-master concept in PostgreSQL: its practical value, feasibility, and the tech stack needed to make it happen. By framing the problem more narrowly, we might find a solution that’s genuinely useful for the industry.

I worked on multi-master setups for years until it became clear that the fully consistent multi-master approach was essentially a dead end. Now, after a long break and a switch to a new company, I’m taking another stab at the PostgreSQL multi-master idea to finally close this chapter and find a practical use for it (or, who knows, prove it’s a pipe dream).
First, let’s figure out the general model for its use — how multi-master can actually deliver value. Any technology is a trade-off between capabilities and needs, so let’s try to find that sweet spot for multi-master.
Typically, clients turn to multi-master when they hit the ceiling on connection limits in OLTP workloads. They’ve got tons of clients, a system handling N transactions per second (TPS), and a single database. Their idea? Add another server, set up active-active replication, and double the TPS capacity.

Sometimes, clients run a distributed app and want stable, low-latency database connections across different geographic regions. Other times, they’re after reliable automatic failover. Less common (but a nice bonus) is enabling online upgrades or smoothly taking a node offline for maintenance. Some even want an extra master with warmed-up caches and a near-production storage state to use as a test environment. Plenty of use cases, but which ones can we realistically deliver?
Keep in mind that active-active replication in PostgreSQL currently relies on logical replication, which means network latency, extra server load from decoding, WAL senders, and so on. Network delays kick in because we need to wait for confirmation that a transaction was successfully applied on the remote node, right? So, the idea of scaling write-heavy workloads with full replication hits a wall: each server has to write not only its own changes but also those from other instances (see the diagram above). This isn’t a big deal for hefty transactions with complex SELECTs, but multi-master demand usually comes from OLTP clients with simple DML queries.
We run into a similar issue when trying to ensure a distributed app has a stable connection to a nearby database. If the app’s connection to a remote database is shaky, the connection between DBMS instances will likely be just as unreliable. Waiting for confirmation of a successful commit from a remote node can take ages.
Then there’s the messy issue of distributed conflicts. When an update from replication tries to overwrite the same table row as a local update, there’s no guarantee that the transaction snapshots causing these conflicts are aligned across instances. So, whose change wins, and whose gets rolled back? Should one row update, within the transaction’s logic, know about a competing change to stay consistent?
Automatic failover is simpler but still requires some clever engineering to stay efficient. If all instances can write, committing a transaction means waiting for confirmation from every instance. Otherwise, if node A goes down, some of its transactions might make it to node B but fail to reach (or get rolled back on) node C. How do you fix that? Send the whole setup into recovery?
So, the dream of a perfect multi-master looks shaky, especially for those chasing OLTP performance boosts. What’s it good for, then? Let’s start with the foundation: logical replication.
Logical replication has two big strengths. First, it lets you replicate data selectively, picking specific tables and even setting filters to skip certain records or transactions right at the decoding stage. This gives you a precise tool to sync only the data that matters. Second, it’s high-level — operating at the relational algebra level, abstracting away physical storage details.
What does this abstraction buy you? For example, you can have different index sets on synced nodes, cutting DML overhead locally and routing queries to where they run most efficiently: one instance handles short, precise UPDATEs/DELETEs by primary key, while another tackles DML with complex subqueries or non-conflicting INSERTs. You could even use standard PostgreSQL heap storage on one node and columnar storage on another. Heck, you might mix different DBMSs as multi-master nodes — your imagination is only limited by the replication protocol!
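As a sketch of what this selectivity and physical independence look like in stock PostgreSQL (table, column, and node names here are invented for illustration): row filters in publications (PostgreSQL 15+) restrict what gets decoded and published, while each node keeps whatever index set suits its own workload.

```sql
-- Publish only a subset of rows; the filter is applied
-- at the decoding/publishing stage (PostgreSQL 15+).
CREATE PUBLICATION pub_shared FOR TABLE orders WHERE (status = 'critical');

-- The subscriber receives only the filtered rows.
CREATE SUBSCRIPTION sub_shared
    CONNECTION 'host=node2 dbname=appdb'
    PUBLICATION pub_shared;

-- Because replication is logical, physical layout is per-node.
-- Node 1 serves short point updates by key:
CREATE INDEX ON orders (id);
-- Node 2 serves analytical scans with a different index set:
CREATE INDEX ON orders (created_at, status);
```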
With logical replication’s advantages in mind, let’s picture a use case where multi-master shines.
For now, let’s set aside upgrades, maintenance, and failover. The most obvious use case is supporting a geo-distributed app. You can gain an edge by classifying database data into critical shared data, shared data written on one side only, and purely local data (see diagram).

Here, red marks data that must be reliably synced across instances, green and blue are data that don’t need immediate syncing (often read-only for the other side), and gray is strictly local data.
By designing the database schema to classify data by replication method, you can even shrink the database size on a given instance by not sending local data to remote nodes. Most data can replicate one-way asynchronously, avoiding the overhead of waiting for commit confirmations. Only critical data needs strict synchronization, using mechanisms like synchronous commit, two-phase commit (2PC), and isolation levels at least REPEATABLE READ. This slows down transaction commits and raises the risk of rollbacks due to conflicts.
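One way to sketch this mixed mode with built-in tools (the subscription and table names are hypothetical): PostgreSQL lets a logical subscriber act as a synchronous standby, and `synchronous_commit` can be chosen per transaction, so only the critical data pays the synchronization cost.

```sql
-- postgresql.conf on the publisher: treat the subscription named
-- 'sub_critical' as a synchronous standby (a subscription's name
-- is used as the walsender's application_name by default):
--   synchronous_standby_names = 'sub_critical'

-- "Green/blue/gray" data: default asynchronous commit.
BEGIN;
UPDATE local_stats SET hits = hits + 1 WHERE page_id = 42;
COMMIT;

-- "Red" data: wait until the remote subscriber has applied the change.
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
UPDATE shared_accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```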
What’s a real-world example? I haven’t deployed this at a client, so bear with my hypothetical. Picture an international company where employee data and financial metrics must stay on servers in each country (a common requirement these days). For analytics, this data can be read externally—a case for filtering replicated data by key values. You could split an employee table, syncing names, roles, and salaries across all database instances while keeping sensitive identifiers like social security or passport numbers in a local table.
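A minimal sketch of that split using publication column lists (PostgreSQL 15+); the schema is invented for illustration:

```sql
-- Employee table mixing shareable and strictly local columns.
CREATE TABLE employees (
    id          bigint PRIMARY KEY,
    name        text,
    role        text,
    salary      numeric,
    passport_no text   -- must never leave this country's server
);

-- Publish only the non-sensitive columns for cross-border analytics;
-- passport_no stays local.
CREATE PUBLICATION pub_employees
    FOR TABLE employees (id, name, role, salary);
```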
If the database is designed so most workload hits local or asynchronously replicated data, you might even achieve that coveted write scaling (sounds heretical, but who knows).
From my days in aerospace, I learned to evaluate effects qualitatively. Let’s estimate what percentage of the database can use active-active replication without tanking performance. For simplicity, imagine two company branches on different continents with two setups: (1) one server, and (2) two servers in multi-master mode, where access is always local (see diagram below).

Let’s define some terms (also shown in the diagram):
T — time (in ms) to execute a transaction in the DBMS backend.
S_loc — network round-trip time (in ms) for an app near the DBMS instance.
S_rem — network round-trip time (in ms) for an app on a different continent from the DBMS.
C — time (in ms) to get confirmation of a successful commit from a remote DBMS instance.
k — fraction of local connections.
N — fraction of DML transactions that can use asynchronous replication.
For a single server, the average transaction time is:
T1 = T + k · S_loc + (1 − k) · S_rem
For multi-master, every app talks to its local instance, but the (1 − N) fraction of DML must wait for remote confirmation:
T2 = T + S_loc + (1 − N) · C
Now, let’s plug in some numbers. Assume 50% local connections (k = 0.5). Based on my experience living in Asia and connecting to distant resources, let’s use T = 15 ms, S_rem = 150 ms (75 ms one way), and a negligible S_loc of a few milliseconds.
Here, S_loc and S_rem are one round-trip each, but for remote commit confirmation (C), assume two round-trips: in 2PC, you first run PREPARE TRANSACTION, waiting for replication and resource reservation, then send COMMIT PREPARED. So C = 2 · S_rem = 300 ms.
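The two round-trips correspond to the standard two-phase commit commands; a schematic sequence on the coordinating node (transaction identifier invented):

```sql
-- Round-trip 1: replicate the changes and reserve resources
-- on the remote node.
BEGIN;
UPDATE shared_accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'mm_tx_001';  -- wait for the remote PREPARE to succeed

-- Round-trip 2: once every participant has prepared,
-- make the transaction durable everywhere.
COMMIT PREPARED 'mm_tx_001';

-- If any participant fails after PREPARE, the outcome is resolved with:
-- ROLLBACK PREPARED 'mm_tx_001';
```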
With these numbers, T1 = 15 + 0.5 · 150 ≈ 90 ms, while T2 ≈ 15 + (1 − N) · 300. The two setups break even when (1 − N) · 300 ≈ 75, that is, when about 25% of DML transactions wait for remote confirmation.
Now, what if 80% of connections are remote? Then T1 = 15 + 0.8 · 150 = 135 ms, and the break-even point shifts to (1 − N) · 300 = 120, so up to 40% of DML can be synchronously replicated.
What if we enforce full synchronous 2PC for the entire database? Let’s calculate: T2 = 15 + 300 = 315 ms against T1 ≈ 90 ms, roughly a 3.5× performance loss in the best case. Not inspiring, but maybe good enough for some?
This rough math suggests that if about 25% of DML transactions need remote confirmation, multi-master won’t lose performance. If most traffic comes from remote regions, up to 40% of data can be reliably replicated, but let’s be conservative and cap the synchronously replicated share at 25% (i.e., N = 75%). This also cuts disk, lock, and other resource usage in half for local operations like vacuum or read-only queries. Sounds like there’s something promising here, right?
The flip side? Even asynchronous replication needs to keep up with the commit stream. If a local transaction takes 15 ms and the one-way delay to a remote server is 75 ms, the replication queue will pile up even though we never wait for confirmation.
With 25% of DML needing remote confirmation, 75% is asynchronous. To bridge the gap between local commit speed and data transfer to the remote server, you need bandwidth: send and apply data in parallel (i.e., parallel replication). Roughly, each replicated change lags by the 75 ms one-way delay against a 15 ms local commit, so a single apply stream falls behind by a factor of about 0.75 · 75 / 15 ≈ 4. In our rough model, four parallel streams should do it. With resources freed up by spreading connections across instances, this looks doable.
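Stock PostgreSQL already offers a building block here: since version 16, a subscription can apply large in-progress transactions with parallel workers. This parallelizes streamed transactions rather than the whole apply pipeline, so it is only a partial answer; subscription and publication names below are illustrative.

```sql
-- Subscriber side, postgresql.conf:
--   max_logical_replication_workers = 8
--   max_parallel_apply_workers_per_subscription = 4

-- Ask the publisher to stream large in-progress transactions and
-- apply them with parallel workers (PostgreSQL 16+).
CREATE SUBSCRIPTION sub_async
    CONNECTION 'host=node1 dbname=appdb'
    PUBLICATION pub_async
    WITH (streaming = parallel);
```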
In summary, geo-distributing data in multi-master mode could theoretically match single-server TPS while reducing backends and resource use per server. Those resources can fuel system processes or analytics. Plus, you can optimize indexes, storage, and data placement per node. A bonus? If the network fails, each subnet can keep running temporarily, with conflict resolution strategies restoring database integrity later.
Imagine a use case for a full network breakdown: a hospital network database in Southeast Asia, where terrain and weather make connectivity spotty. A shared medical records database is needed, but patients rarely visit multiple hospitals in a short time, minimizing conflicts in critical data.
Given all this, a practical multi-master should include:
* Replication sets to classify data for synchronous or asynchronous replication.
* Guaranteed commit similar to 2PC.
* Distributed consensus protocol for identifying working node subsets and fencing failed nodes in 3+ node setups.
* Parallel replication — parallelizing DML sending and application on remote nodes.
* Automatic conflict resolution for autonomous operation during network issues.
Don’t forget automatic failover and hot-swappable hardware without downtime. Alternative physical storage organization feels a bit sci-fi for now, so I’ll leave it aside. But if multi-master takes off, maybe it’ll become a core feature?
Let’s see how current multi-master projects stack up:
| Technology | | | |
|---|---|---|---|
| Replication sets | + | + | - |
| Guaranteed commit | - | +/- (recovery slots?) | + |
| Consensus protocol, fencing | - | - | + |
| Parallel replication | - | + | +/- |
| Automatic conflict resolution | - | + | Not required |
Curiously, no project fully covers our ideal tech stack. Can any of these projects architecturally support all the needed multi-master features?
That’s it for now. This post is meant to spark discussion, so feel free to share your thoughts in the comments or any other way you prefer.
THE END.
Spain, Madrid, October 15, 2025.