Once upon a time at Tandem Computers, pioneers of fault-tolerant systems, an unexpected incident occurred. The disappearance of the company’s only office cat caused a crash in the most critical component — the engineers’ mood. The solution reflected the corporate philosophy: they implemented N+1 redundancy in the form of two brand-new cats at once.

Unfortunately, in the world of large enterprise DBMSs, scaling and fault tolerance problems aren’t solved quite that easily. “To shard or not to shard?” That’s a tough question every architect of a growing system faces sooner or later.

Imagine that your application runs fine on PostgreSQL, but the data keeps growing, the load keeps increasing, and the classic optimization tricks (indexing, partitioning, replicas) don’t save you anymore. You start thinking about horizontal scaling, but then:

  • How do you split the data the right way?

  • How do you avoid performance losses on joins?

  • Will your fault-tolerant DBMS turn into a giant headache because of management complexity?

In the previous article we got acquainted with Postgres Pro Shardman — a distributed DBMS built on Postgres Pro, still compatible with the relational model and ACID transactions. Today, let’s talk about the practical side:

  • When is sharding justified, and when can you live without it?

  • How do you design a data schema for a distributed DBMS like Shardman?

  • What tradeoffs do you face with distributed transactions and queries?

Is it hype, or an inevitable future?

Start your morning with a healthy, tasty — finger-licking good — and nutritious Ubik porridge. It’s the best breakfast choice for a grown-up. Never exceed the recommended dose.

“Ubik”, Philip K. Dick, 1969

A little over ten years ago, right in the middle of the NoSQL epidemic, some DBMSs started gaining traction — ones that promised horizontal scalability (the demand of the time), but at the same time offered proper ACID (or at least parts of it) and that cozy old SQL onboard. This was pretty much the start of the Distributed SQL and NewSQL era.

Today, it’s no longer hype — it’s tough reality. New solutions keep popping up and slowly, day by day, are swallowing not only hipster startups but also the entire world of enterprise data. In glossy marketing brochures, vendors of such DBMSs of course promise you “a gazillion RPS on top of petabytes of data.”

But making the leap is no easy decision. Doubts and questions pile up:

  • Does it really work in practice, or are those beautiful benchmarks hiding hacks, pain, and nasty pitfalls?

  • Should you chase after the shiny new trend, or keep tuning your trusty old relational database until it wheezes its last?

  • Maybe it’s just the natural path for any large growing DB, and sooner or later you’ll have to move to something distributed anyway?

The answers will always be individual: some projects will find new life with distributed solutions, others will crash into unexpected difficulties. In this article, we’ll try to unpack the topic to help clarify these dilemmas (or maybe add some new ones, which is valuable too). While the main focus will be our DBMS Shardman, many aspects will be useful for other distributed systems as well. Hopefully, this helps a wide range of specialists.

So, where do we start? Let’s figure out in which situations your DB really needs sharding/distribution, or at least when it’s worth considering.

When sharding becomes necessary

  • When vertical scaling hits the ceiling
    Your DB might be handling massive join operations across huge tables, with business logic pushed into SQL that chews up CPU cycles. Sure, you can keep throwing more iron at it, but hardware upgrades only get you so far. Even the beefiest servers have physical limits, unlike the endless imagination of your analysts and product people. Modern CPUs rarely go beyond 128 cores, and performance doesn’t grow linearly with the core count — synchronization overhead, workload quirks, and SQL execution specifics all chip away at scaling efficiency. On top of that, high-end gear gets exponentially more expensive, while horizontal scaling on commodity hardware often becomes more cost-effective once you pass the mid-range.

  • When the working set just keeps ballooning
    Technically, PostgreSQL can handle hundreds of terabytes (yes, there’s a 32 TB limit per table, but partitioning solves that). What really matters, though, is the load profile. In production, data doesn’t just sit there—typical workloads hit the system with thousands of diverse reads and writes. Every write kicks off background housekeeping (vacuum/analyze), which eats resources and competes with user queries. The key metric here is growth velocity: if your dataset is snowballing and maintenance tasks like crash recovery or replication are taking longer and longer, that’s a red flag telling you it’s time to rethink the architecture.

  • When connection counts go wild
    Postgres uses the “one process per connection” model, which starts to creak under heavy load. Once you cross ~10k active connections, overhead shoots up. This is especially painful in SOA or CQRS setups with chatty services, or in systems holding tons of long-lived connections (think WebSockets, push notifications).

Now, let’s assume you’ve already tried — or ruled out — all the usual tricks to ease database load:

  • Splitting services into separate databases. It caps growth in the main DB, but you lose cross-DB transactional integrity, reporting gets messy, and distributed transactions are a headache to implement.

  • Filesystem compression. Saving disk space this way often backfires: the extra CPU work for compressing/decompressing can slow queries and cancel out the benefits.

  • Connection pooling. Popular, but not a silver bullet. With lots of long transactions, pools can become bottlenecks, especially in SOA with hundreds of services.

  • Hot/cold data archiving. In theory, you move infrequently used data into cold storage. In practice, frequent queries to the archive, regulatory requirements, and the struggle to keep hot/cold layers consistent often ruin the idea.

  • Read replicas. Handy until your app starts choking on replication lag — common in write-heavy environments like finance or real-time systems.

  • Switching to NoSQL. Tempting, but brutal. You’d need to redesign your data model, migrate everything, rewrite application logic, retrain the team, and deal with the organizational inertia of a company already all-in on Postgres.

If most of these points sound painfully familiar, then yeah — it might be time to break up your massive monolithic DB into shards and get transparent scalability (though let’s be honest, you’ve probably been thinking about it already).

From here, we’ll dive into the mess of challenges you’ll face moving to a distributed DBMS — focusing on Shardman in particular — and then weigh up whether all the pain was worth it in the end.

Types of sharding

There are two main approaches to distributing your data across multiple instances: schema-based (splitting by schemas) and row-based (splitting by rows). Let’s go through their principles and when to apply them.

Splitting a monolithic database into several fully independent databases, or schema-based sharding, is one of the oldest and most radical approaches. Each shard becomes a separate database instance with its own schema, code, data, and routines: backups, tuning, fault tolerance, and all the rest.

Is it transparent? Yes.

Is it easy to implement? Not quite.

It’s not always possible to cleanly separate entities without running into imbalances. On top of that, the application has to be heavily reworked: it must know exactly which shard to query for which data. Need a JOIN across shards? Be ready to write complex logic at the application level. Need consistency? You’ll have to build it yourself — there are no built-in mechanisms. Rebalancing, uneven load distribution, sudden hot spots — all that responsibility falls on you. If one shard starts burning under load, the others won’t help (although Citus, for example, offers transparent rebalancing even for schema-based setups).

Still, this approach works well in SOA architectures, where data is naturally divided between services. The key is that there should be no need for cross-shard operations, and aggregated analytics/reporting either isn’t required or is collected separately. Schema-based sharding gives you full control, but demands constant attention in return. If you want maximum predictability and don’t mind manual management, it can work fine. But if you’re after “transparent” data distribution, it’s better to look at row-based sharding.

Row-based sharding is a method where rows of a single table are distributed across shards by a given key (for example, a hash of selected column values — the sharding key). This approach allows scaling writes and reads through parallel processing on different servers, but it requires special database features to handle cross-shard queries, consistency, and so on. In this case, the application still sees the database as a single whole, while the DBMS hides the complexity of cross-shard operations and ensures transactional integrity and strict consistency of the data. The system abstracts away the distributed nature of the data, offering a unified, fully relational interface.

This option makes sense when schema-based sharding is impossible or impractical. Typical scenarios include working with moderately normalized data schemas, complex ETL processes, or datasets containing multiple groups of entities with intricate interrelations. A critical prerequisite is the ability to choose an effective sharding key — without it, performance may degrade significantly, possibly becoming even worse than in a monolithic setup. We’ll discuss this issue in detail in the following chapters.

The main advantage is that you keep the familiar relational model. Administration, backups, and consistency management remain centralized, which greatly simplifies operations. At the same time, the system can work with data spread across shards without requiring the application to implement complex logic. But there’s a flip side: you need to migrate schema, data, and stored code into a distributed system. Applications will also need modifications — less drastic than with schema-based, but still necessary. Moreover, running a distributed DBMS requires significantly higher engineering expertise compared to classic setups: optimizing distributed queries, fine-tuning network and hardware, and more. Ultimately, choosing this path means accepting a trade-off between migration complexity and horizontal scalability with centralized consistency control and easier development/operations.

Shardman is exactly such a distributed DBMS based on row-based sharding, and that’s what we’ll be talking about next.

The diagram below illustrates how two tables might be sharded across two nodes using both methods.

First steps

Everyone’s always hunting for what I call the magical initialization parameter fast=true. That magic switch that would make their database fly. The sooner you come to terms with the fact that it doesn’t exist, the better off you are.

“Expert One-on-One Oracle,” Tom Kyte, 2003

Twenty-plus years later, this advice still holds true for most distributed DBMSs. Just slapping a hypothetical distributed=true flag (if it exists anywhere) won’t magically give you a scalable, high-performance system — in fact, it can even backfire. Even if such a parameter existed, it wouldn’t replace the need for building a complex infrastructure, carefully tuned and constantly maintained. Sure, cloud providers offer “click-to-scale” solutions, but they’re not a silver bullet — technical, economic, and other limitations apply. And most importantly, a deep rework of both your DB and application architecture is inevitable.

Big corporate databases often resemble what I like to call “relational junkyards”: sprawling systems with thousands of tables and objects, tangled relationships, overlapping business domains, business logic smeared across layers, all wrapped in thick, uneven layers of technical debt and “historical accidents.” If your monolithic DB has technical debt or kludges, moving to a distributed DBMS will multiply their cost. That’s why pre-refactoring is essential. Just like a patient gets examined and treated before a major procedure, your database needs a cleanup before a complex migration to a distributed DBMS.

Not all architectural decisions are equally harmless. Some approaches aren’t just suboptimal — they’re red flags, warning you of potential trouble ahead. That doesn’t mean you should throw everything out and start over, but these points deserve extra attention during system analysis. Things to watch out for:

  • Normal forms — great, but moderation is key. Over-normalization (think Anchor or Vault Modeling, where you join 10 tables just to get a contractor’s address) can complicate query plans and make optimization a headache.

  • Healthy denormalization isn’t such a sin. In fact, it often becomes a necessity when defining sharding keys.

  • Cyclic relationships between tables are common in large DBMSs, sometimes forming complex, interwoven chains across dozens of tables. For reference tables, that’s usually fine. In sharded tables, however, cyclic dependencies are a nightmare. Tables with different sharding keys that get tangled together will be forced into costly cross-shard interactions.

  • EAV (Entity-Attribute-Value) models — powerful but dangerous. They provide flexibility at the expense of complex, hard-to-read SQL, performance bottlenecks, and multi-level joins. Storing all attributes for all entities in one column of one table (VALUES-style) kills any chance of meaningful domain sharding, because different entities need different distribution keys. Those spaghetti queries typical in EAV models will perform even worse in a distributed environment than they do in a monolithic DB.

  • Sharding large key-value stores often doesn’t pay off. Limited choice of sharding key (usually just the record key) combined with variable value sizes can lead to load imbalance: some shards choke under heavy traffic while others stay underutilized, completely negating the benefits of horizontal scaling.

New types of database objects

When it comes to row-based sharding (as opposed to schema-based), systems usually have to introduce new object types — even if the underlying DBMS is mature and well-established. Shardman is no exception.

Sharded table — on the surface, it looks like a normal Postgres partitioned table. But each node only stores a subset of the partitions, while the rest are accessible over the network as foreign tables. This way, data is distributed across the cluster but still fully accessible from any node. When you add or remove nodes, partitions are rebalanced automatically. Rows are distributed to partitions based on a hash of the sharding key specified when the table is created. This approach enables controlled horizontal scalability, but picking the right sharding key is critical.

When to use sharded tables?

  • Large data volumes. For example, when background processes like vacuum/analyze start consuming huge resources, crippling the system. Choosing or creating a good sharding key that spreads data, and the work of processing it, evenly across nodes frees up resources for current and future loads.

  • High write load. When incoming data streams are too heavy to be inserted and processed within the available time window. Especially in systems with resource-intensive business logic that can be efficiently parallelized across shards.

  • Intensive reads. When reading activity increases response times beyond business requirements. If most read queries can leverage the sharding key, cross-shard joins are minimized, and the system is significantly relieved.

Colocated table — also a partitioned table, but “tied” to a parent table so that all rows of each partition with the same sharding key physically reside on the same nodes as the parent. This allows heavy JOIN operations by partition key with minimal overhead, since the data is co-located. Colocated tables can be sharded not only by foreign key but by any arbitrary column(s).

When to use:

  • Frequent joins on the colocation key. When JOINs run on the column(s) used for colocation, query performance stays close to that of a monolithic DB (see the DDL sketch after this list).

  • Star and snowflake schemas benefit when colocation reduces network overhead for joins between fact and dimension tables.
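
To make this concrete, here is a minimal DDL sketch using the Clients/Orders pair from the diagram below. It reuses the distributed_by syntax shown later in this article; the colocate_with parameter and the column names are illustrative assumptions to check against the Shardman documentation, not a verbatim reference.

-- Hypothetical sketch: orders is colocated with clients on the same sharding key,
-- so every order row lives on the same shard as its client row.
create table clients (
  id bigint,
  skey uuid not null,
  name text,
  primary key (id, skey))
with (distributed_by = 'skey', num_parts = 32);

create table orders (
  id bigint,
  skey uuid not null,   -- same sharding key as clients
  client_id bigint,
  amount numeric,
  primary key (id, skey))
with (distributed_by = 'skey', num_parts = 32, colocate_with = 'clients');

A join between the two on the colocation key can then be executed shard-locally, without shipping rows between nodes.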

Global table — a table whose complete physical copy exists on every shard, allowing local reads without inter-node queries.

Key features:

  • Full replication: data is synchronized across all nodes; any read operation is local.

  • Zero network overhead for SELECTs: no need to reach other shards, which is critical for frequent point queries.

  • Not just reference tables: any table can be global if update frequency is low.

  • Updates require synchronization: every INSERT/UPDATE/DELETE is applied on all nodes, which increases load proportionally.

When to use:

  • Frequent reads, infrequent writes, e.g., reference tables.

  • Latency-sensitive queries: data is always local, so response times are minimal.

  • JOINs with small tables: replicating a small table across nodes accelerates queries.
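
For a small, rarely updated reference table, such as the payment methods table in the diagram below, the DDL could look roughly like this. Treat the global storage parameter as an assumption to verify against the Shardman documentation; everything else is standard Postgres.

-- Hypothetical sketch: a full copy of this table is kept on every shard,
-- so lookups and joins against it never leave the local node.
create table payment_methods (
  id int primary key,
  name text not null)
with (global);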

The image below illustrates data distribution for different table types across shards. Clients is a standard sharded table, holding part of the data on each node. Orders is a sharded colocated table with Clients (all orders of a client reside physically together). Payment methods is a classic reference table, global, with a full copy on every node.

Other global objects. Beyond tables, roles, tablespaces, and sequences can also be distributed. Once created on one node, they’re available cluster-wide, with extra “distributed superpowers.” For example, sequences can generate globally unique values independently across nodes.

Pseudo-global objects. Objects without a distributed twin can be broadcast by enabling the broadcast_ddl parameter. This creates them on all nodes, making them usable alongside distributed objects — for example, creating a new schema or user-defined type (which can be a column type or base type for a sharded table).
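
A quick sketch of how this might look in a session, assuming the parameter is exposed as shardman.broadcast_ddl (the exact GUC name and the objects created here are illustrative):

-- Broadcast subsequent DDL so the schema and the type appear on every node,
-- not just on the one we are connected to.
set shardman.broadcast_ddl = on;

create schema billing;
create type billing.currency as enum ('RUB', 'USD', 'EUR');

set shardman.broadcast_ddl = off;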

Regular objects. You can still create ordinary objects, just as in vanilla Postgres or Postgres Pro, without any distributed magic. Shardman won’t manage them, so you’ll handle them manually. They also cannot participate in cross-shard transactions, which is a step back in operational capability, but sometimes necessary — e.g., as an intermediate step when migrating schema and data from a monolithic DB.

Unsupported objects. Some object types simply don’t fit a distributed model. In Shardman, temporary tables fall into this category. You can create and use them, but only until a modifying operation enters a distributed two-phase transaction.

Schema migration

Once you’re familiar with the basics of distributed DBMS architecture and the new types of distributed tables, it’s time to dive into schema migration. This step lays the groundwork for subsequent migrations — data, code, and all the rest. The more thorough you are here, the smoother the following steps will be, and vice versa.

Even though Shardman, like many modern distributed DBMSs, looks like a regular relational database on the surface (tables, keys, transactions), designing schemas for distributed systems is a whole different ballgame. Think of it like comparing a traditional bookstore to an online store with a network of pickup points scattered across cities and towns.

In a classic bookstore, all books are in one place, neatly organized by section, author, genre, and publication year. Want a specific book? You check the catalog, head to the right shelf, and grab it. The problem arises when the store becomes too popular: queues stretch outside, crowds jam the aisles, and staff can’t keep up. That’s your analogy for a classic relational DBMS.

An online bookstore works differently. Books are physically spread across warehouses and pickup points. On the site, they appear as one unified collection, but in reality, book “A” is in Moscow, “B” in Ufa, and “C” hasn’t even been printed yet (print-on-demand). The customer just selects a convenient pickup point and gets pre-delivered books, bypassing lines and chaos. Limitations exist: you can’t request “all sci-fi from 2024” or “all Stephen King short story collections” — you can only pick up what’s already delivered to that specific location. Choosing the wrong city means a trip to another pickup point, and a large order may require visiting multiple points if one can’t handle the full volume. This is a direct analogy to distributed DBMSs.

The first approach offers query flexibility; the second provides horizontal scalability, but with strict limitations on what operations are feasible. So, when designing a schema for a distributed DB, you need to understand in advance who will use the data, how, and when it will be cheap vs. expensive, and plan where and in what form data will reside. This is achieved by picking an appropriate sharding key.

Sharding key — a special data attribute (column or set of columns) that determines where records physically reside across shards. It critically impacts performance by minimizing cross-shard operations (or, if chosen poorly, increasing them). Sharding keys can be natural or surrogate.

  • Natural sharding key uses existing table columns, even if one of them is a surrogate primary key:

-- If the sharding key matches the PK, creating a sharded table
-- is almost identical to creating a regular table
create table persons (
  id serial primary key,
  name text,
  department_id int)
with (distributed_by = 'id', num_parts = 32);

-- Using a natural sharding key not in the PK (e.g., sharding by department)
-- Add it to the PK
create table persons (
  id serial,
  name text,
  department_id int,
  primary key (id, department_id))
with (distributed_by = 'department_id', num_parts = 32);

  • Surrogate sharding keys rely on dedicated, specially added columns (sometimes combined with ordinary ones) to finely control data routing. In our case, it’s a column skey populated by the application, not the DBMS:

-- Surrogate sharding key columns also need to be added to the PK
create table persons (
  id serial,
  skey uuid not null,
  name text,
  department_id int,
  primary key (id, skey))
with (distributed_by = 'skey', num_parts = 32);

When migrating a monolithic schema to a distributed architecture (or designing from scratch), choosing the right sharding keys upfront is critical. Changing them later in production is expensive. Sometimes you can use existing columns, but more often you’ll need new fields and some denormalization to ensure related tables can share keys and data can reside on the same shard.

While picking a key for a single table may seem trivial, the real complexity comes from interdependencies among dozens (sometimes hundreds) of tables and hundreds of queries, each table playing different roles. Crafting a coherent distributed schema, especially when refactoring a legacy system with terabytes of data and thousands of users, is a non-trivial engineering task — one we’ll cover in a separate article.

Schema migration often entails not just pragmatic denormalization for sharding keys but general schema refactoring: fixing old “warts” that might not directly relate to distribution but make migration smoother. Why not take the opportunity? Schema refactors are rare, and the next chance could be years away.

For example: during a DB schema migration, a decision was made to eliminate several child tables and instead nest their data into parent tables (victims of over-normalization early in the project). Queries and code became much simpler. The key is not to go overboard and keep priorities clear.

Finally, a word on schema migration tools. Tools like Liquibase and Flyway are the de facto standard in the DB world. For SQL-based migrations in distributed DBMSs, standard DDL operations with additional directives like global or distributed_by remain the most reliable approach. Declarative XML/JSON migrations introduce challenges: their abstractions rarely account for the distributed nature of the target DB, necessitating either deep customization or a painful switch to SQL migrations. The latter, however, has always been good engineering practice, especially for complex distributed systems.

Code migration

Migrating to a distributed DBMS inevitably requires rewriting or adapting the code that interacts with the database — both stored procedures and external application code. In this section, we’ll cover the key aspects of adjusting your codebase for a sharded architecture, highlighting the most critical areas that need close attention.

Extensions — modern PostgreSQL without them? Hard to imagine. They add functionality, simplify working with data, and often become an integral part of the infrastructure. But in a distributed DBMS, almost every extension you use — built-in or third-party — may require a complete rethink. Some extensions will need to be disabled, others heavily modified, and some used only with caveats. And with each one, a question arises: “How will it behave in a distributed environment? Operate only on the local shard, on a remote shard, or affect the entire cluster?”

Take the classic uuid-ossp extension. In a regular PostgreSQL, UUID collisions are almost nonexistent. But what if a cluster has dozens of instances generating UUIDs simultaneously? Does the risk of duplicates increase? By how much? Or pgpro_scheduler — should it run tasks only on the current node, or distribute them across the cluster? If distributed, how: in parallel, in a rolling fashion, or randomly, while avoiding collisions?

Questions like these arise for nearly every extension. During the design phase of a distributed system, a full audit of all extensions is necessary, testing their behavior in a cluster and adjusting logic or contacting developers when needed.

Sequences — a foundational mechanism of relational DBs, and their role goes far beyond generating surrogate keys. In complex business logic, they may manage document numbering, versioning, or even time intervals, all thanks to guaranteed uniqueness and monotonicity. But in distributed systems, sequences become a headache. Ensuring cluster-wide uniqueness is critical but tricky. Traditional sequences tied to a single node become a bottleneck and a source of conflicts.

There are many ways to generate unique values in distributed systems — from classic UUIDs to ULID, Snowflake IDs, and hash-based approaches. In Shardman, this is solved via range allocation: each shard gets a range of values, and when it runs out, it requests a new one. This allows continued use of standard Postgres sequences, guarantees cluster-wide uniqueness, but breaks monotonicity. These compromises must be considered during migration:

  • Some queries may need rewriting if they rely on strict monotonicity (see the sketch after this list).

  • Sorting strategies may require significant adjustments.

  • In extreme cases, the approach to ID generation may need a rethink — perhaps moving to UUIDs or hybrid solutions.
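
As a small illustration of the first point, here is a hedged sketch on the persons table from the schema examples above; the created_at column is an assumption added purely for the sake of the example:

-- With range-allocated sequences, ids are unique cluster-wide but no longer
-- reflect insertion order across shards, so this "latest rows" query is unreliable:
select id, name
from persons
order by id desc
limit 10;

-- A safer variant orders by an explicit timestamp column instead:
select id, name
from persons
order by created_at desc
limit 10;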

Read queries. Schema refactoring, including adding new columns, inevitably requires modifying existing queries. They may still work, but performance can drop compared to a monolithic DB, especially if WHERE clauses lack sharding key columns, triggering inefficient cross-shard scans.

For example, if the persons table is sharded by a surrogate key, simple queries by PK will still return correct results, but the DB may scan all table partitions, including remote ones. This is slower and resource-intensive:

select name
from persons
where id = 42;

Execution plan:

Append  ...
  ->  Async Foreign Scan on persons_1_fdw persons_2  ...
  ->  Async Foreign Scan on persons_2_fdw persons_3  ...
  ...
  ->  Async Foreign Scan on persons_31_fdw persons_32  ...

If you explicitly provide the sharding key, it reduces to a single Foreign Scan on the right node — or even a local read if the row resides on the coordinator node:

select name
from persons
where id = 42 and skey = 'cc00ffee-6dd6-44cf-8f40-0c306e308b3d';

Execution plan:

Foreign Scan on persons_18_fdw persons  ...
  Foreign EXPLAIN:
    Index Scan using persons_18_pkey on persons_18  ...
      Index Cond: ((id = 42) AND (skey = 'cc00ffee-6dd6-44cf-8f40-0c306e308b3d'::uuid))

A particularly dangerous anti-pattern in distributed DBs is SELECT * (Bill Karwin’s “implicit columns”). While convenient, it leads to excessive data retrieval. ORMs especially love this, often loading all model columns by default. In monolithic systems, this increases network traffic; in distributed architectures, the effect multiplies across shards.

Write queries. Migrating writes requires major revision: all INSERTs, UPDATEs, and other statements must explicitly include sharding key columns, computed before execution. While technically possible to delegate this to the DB (e.g., via triggers), that puts a heavy load on the system. The preferred approach — though more work — is to modify application code to supply the full data set including sharding keys, which can complicate integration with external systems not designed for a distributed architecture.
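
For instance, a minimal sketch reusing the surrogate-key persons table from the schema migration section (the concrete key value simply stands in for whatever routing scheme the application implements):

-- The application computes skey up front and always supplies it explicitly,
-- so the row is routed straight to the shard that owns that key.
insert into persons (skey, name, department_id)
values ('cc00ffee-6dd6-44cf-8f40-0c306e308b3d', 'Alice', 7);

-- Updates follow the same rule: include the sharding key in the WHERE clause,
-- otherwise every shard has to be consulted.
update persons
set name = 'Alice B.'
where id = 42 and skey = 'cc00ffee-6dd6-44cf-8f40-0c306e308b3d';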

Display/search/pagination issues. In DB parlance, seqscan (full table scan) is the devil incarnate — developers fight it everywhere. But distributed DBs can be worse: “seqscan of seqscans” occurs when queries scan all remote shards before filtering results at the coordinator. Even a simple SELECT * FROM table_foo WHERE PK = … can scan all partitions if the sharding key isn’t used, or if the WHERE condition is too complex/unpredictable (e.g., using mutable functions).

This becomes critical for UI-driven data display, such as paginated search results. Classic offset pagination (LIMIT X OFFSET Y) is deadly in distributed DBs — it triggers full shard scans and sorting at the coordinator. Keyset pagination (cursor-based) is far more efficient, but still requires awareness of the sharding key; otherwise, queries hit all nodes, negating the benefits.
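
A hedged sketch of the difference, again on the persons table from above:

-- Offset pagination: every shard scans and sorts its partitions, the coordinator
-- merges the streams, and most of that work is thrown away.
select id, name
from persons
order by id
limit 20 offset 10000;

-- Keyset pagination scoped to the sharding key: only one shard is touched and the
-- primary-key index does the heavy lifting (42 is the last id seen on the previous page).
select id, name
from persons
where skey = 'cc00ffee-6dd6-44cf-8f40-0c306e308b3d'
  and id > 42
order by id
limit 20;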

DB-side business logic. Databases often contain logic from integrity checks to complex scenarios in stored procedures or triggers. Moving triggers to a distributed environment requires explicitly defining the execution point: on the coordinator (for fast pre-validation) or locally on each shard (for distributed load). Triggers that read/write other sharded tables are especially tricky, as they can trigger full cluster scans, turning a simple insert into a lengthy distributed transaction.

Integrity constraints come with their own considerations. In Shardman, foreign keys between sharded tables only work if the tables are colocated (their sharding keys match), which guarantees that related rows reside on the same physical node. Otherwise, cross-shard checks would nullify the benefits of horizontal scaling.
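
Continuing the clients/orders sketch from the colocated-table section (syntax, as before, to be verified against the documentation), such a constraint has to span the shared sharding key so the referenced row is guaranteed to be local:

-- Hypothetical sketch: the foreign key covers (client_id, skey) and references the
-- parent's (id, skey) primary key, so the check never crosses shard boundaries.
alter table orders
  add constraint orders_client_fk
  foreign key (client_id, skey) references clients (id, skey);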

ORMs. Classic ORMs have long propagated the myth that DBs are interchangeable. In distributed systems, these dreams crash against reality: queries must explicitly respect sharding keys and cross-shard limitations. ORMs tend to generate single-instance-style queries, ignoring data location. Developers must “train” the ORM: declare sharding columns, override query generation, and explicitly control execution points. Without this, apps either break or suffer catastrophic performance hits from inefficient cross-shard SQL.

A miracle can happen — if all sharding keys perfectly align with existing PKs — but the odds are extremely slim; usually this only happens in toy demo DBs, and even then, not always.

Data migration

Migrating hundreds of terabytes of data into a distributed DBMS requires careful preparation and a phased approach. Let’s break down the key steps based on real-world experience:

  • Preparation phase. Everything starts with defining a consistent system snapshot. Often, a fresh backup is ideal. At the same time, you need to select a CDC (Change Data Capture) solution to track changes in the source system. This choice is critical and depends directly on the source DBMS.

  • Transformation and loading phase (the “TL” part of ETL) usually takes one to two weeks. It’s rarely enough to simply dump old data into new tables — transformation is often required, for instance, populating new columns created for sharding keys. If you insert data directly into tables on the current node, the coordinator must parse the batches and route queries to the appropriate nodes, consuming additional resources. A better approach is to load data directly on each shard. Shardman provides a dedicated tool for this: shardmanctl load. During this period, changes continue to accumulate in the source system, requiring careful synchronization. It’s highly recommended to create a trunk version of the database compatible with the old schema — a safety net for rollback if needed.

  • Synchronization phase. After the initial load, actual changes are synced via the chosen CDC solution. Now Shardman becomes the target system. It’s essential to wait until all data matches exactly and carefully verify integrity across all systems: the new system, the old one, and the intermediate trunk version. Practice shows that many “reliable” verification methods aren’t perfect when examined closely.

  • Test switch phase. Roles are reversed: Shardman becomes the primary system, and the old system becomes secondary. The ideal setup is a “switch by button” mechanism with instant rollback capability — your last line of defense before full migration. Currently, Shardman does not include a built-in CDC solution; you must rely on vanilla replication via replication slots, which is admittedly inconvenient. Improvements are planned for future versions.

  • Final cutover. Disabling the old system seems simple on paper but is the real test. Ensure all processes are stable in the new environment before fully decommissioning the old system. This phased approach, though time-consuming, greatly reduces risk and prevents catastrophic failures. The key: don’t rush and verify every step.

Topology

In monolithic DBs, topology is straightforward: a single master server, a set of replicas, and established failover mechanisms. Distributed systems require a fundamentally different approach. Here, multiple equal shards exist, each with its own replicas, creating a complex network of interconnections. The main goal is to ensure coordinated operation, balancing availability, consistency, and fault tolerance through thoughtful architecture. Let’s consider common approaches:

Direct topology. A reliable setup for high availability. Each shard consists of a master and N standby replicas, each placed on a separate physical server. The key advantage: the cluster continues operating even if up to N-1 nodes of a shard fail.

Cross-topology. A budget-friendly but effective alternative. Replicas are placed on different machines than their masters, so a single node failure does not bring down the entire cluster. This setup maintains operational continuity while minimizing infrastructure costs.

Data Replication Service. For optimal fault tolerance, a dedicated replication service managing entire clusters can be used. The backup cluster’s topology must mirror production. This mechanism performs asynchronous WAL shipping to the remote cluster, enabling:

  • Multi-data center operation.

  • Moderate network bandwidth requirements.

  • Physical separation of mirrored clusters.

Conclusion

After all stages, we achieve a fully functional SQL database with ACID transactions and near-linear scalability — an ideal foundation for high-load systems. The complexity of migration pays off when you see these performance metrics.

In the first stage, we deployed a cluster with two master nodes without any replication. We then scaled the system up to three masters, achieving the expected performance boost through horizontal scaling — the results were entirely predictable.

Next, we introduced fault tolerance by adding a synchronous replica to each master. This setup gave us a zero RPO (Recovery Point Objective), but as the graphs show, it came with the expected drop in performance — the price for guaranteed data safety.

To validate our results, we repeated the tests with a configuration of four masters without replicas, and then six masters. In the final round, each master had a synchronous replica, achieving full fault tolerance: if any node fails, the system continues running without losing a single bit of data (RPO=0). The testing results clearly illustrate the cost of this level of reliability.

Configuration recommendations:

  • Determine the optimal number of shards for your workload.

  • Balance the number of replicas with fault-tolerance requirements.

  • Use powers of two for table partitioning (2^n), starting with two partitions per shard (see the sketch after this list).

  • Avoid excessive partitioning, which overloads the query planner.
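
For illustration only: with eight shards and the suggested two partitions per shard, a sharded table would get sixteen partitions, which is both a power of two and leaves headroom to spread partitions over more nodes later without repartitioning. The table and column names below are illustrative.

-- Illustrative sketch: 8 shards x 2 partitions per shard = num_parts 16 (2^4).
create table events (
  id bigint,
  skey uuid not null,
  payload jsonb,
  primary key (id, skey))
with (distributed_by = 'skey', num_parts = 16);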

What’s next?

We’ve covered the key points of working with Shardman, but this is just the tip of the iceberg. The official documentation provides detailed guidance on many more topics.

This article is based on Mikhail Sumsky’s talk “Architectural Practices for Using Shardman” at the Core Systems Dev Conf 2025.