Database Administration *

Everything about database administration

melanny20 Dec 16 at 11:04

Breaking data for fun

Easy

8 min

3.1K

Postgres Professional corporate blog System administration * Database Administration * SQL * PostgreSQL *

Tutorial

Translation

Throughout their careers engineers build systems that protect data and guard it against corruption. But what if the right approach is the opposite: deliberately corrupting data, generating it out of thin air, and creating forgeries indistinguishable from the real thing?

Maksim Gramin, systems analyst at Postgres Professional, explains why creating fake data is a critical skill for testing, security, and development — and how to do it properly without turning your database into a junkyard of “John Smith” entries.
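
To make the idea concrete, here is a minimal sketch of reproducible fake-data generation that avoids repeating the same person over and over. It is not the method from the article: the name pools and the `fake_people` helper are hypothetical and exist only for illustration.

```python
import random

# Hypothetical sample pools; a real generator would draw on much larger dictionaries.
FIRST_NAMES = ["Alice", "Boris", "Chen", "Dana", "Emeka", "Fatima"]
LAST_NAMES = ["Ivanov", "Garcia", "Okafor", "Tanaka", "Novak", "Haddad"]


def fake_people(count, seed=42):
    """Return `count` fake (first, last, age) records with distinct names."""
    rng = random.Random(seed)  # local RNG: the output depends only on the seed
    pairs = [(first, last) for first in FIRST_NAMES for last in LAST_NAMES]
    if count > len(pairs):
        raise ValueError("not enough name combinations for distinct records")
    chosen = rng.sample(pairs, count)  # sampling without replacement avoids duplicates
    return [(first, last, rng.randint(18, 90)) for first, last in chosen]


people = fake_people(10)
```

Fixing the seed keeps test data stable between runs, while sampling without replacement guarantees that no name pair appears twice.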

melanny20 Nov 25 at 14:40

Write. Review. Commit. Repeat. Behind the scenes of Postgres Professional docs

Easy

3 min

6.2K

Postgres Professional corporate blog PostgreSQL * Technical Writing * Database Administration *

Review

Translation

Everyone knows great documentation makes or breaks a tech product — but few realize how much work goes into it. At Postgres Professional, the docs are written with the same discipline as the code. What’s even more impressive, all of it is done by a team of just ten people. We talked to senior technical writer Ekaterina Gololobova to see how it really works — from the first task to the final commit.

melanny20 Nov 19 at 09:32

PostgreSQL multi-master: a pipe dream or a practical solution?

Medium

7 min

6.1K

Postgres Professional corporate blog PostgreSQL * Database Administration * Server Administration *

Opinion

Translation

One of the open challenges in the database world is keeping a database consistent across multiple DBMS instances (nodes) that independently handle client connections. The crux of the issue is ensuring that if one node fails, the others keep running smoothly — accepting connections, committing transactions, and maintaining consistency without a hitch. Think of it like a single DBMS instance staying operational despite a faulty RAM stick or intermittent access to multiple CPU cores.

My name is Andrey Lepikhov, and I’d like to kick off a discussion about the multi-master concept in PostgreSQL: its practical value, feasibility, and the tech stack needed to make it happen. By framing the problem more narrowly, we might find a solution that’s genuinely useful for the industry.

melanny20 Oct 24 at 13:45

How we boosted SQL query accuracy by 33% with LLMs

Medium

8 min

11K

Postgres Professional corporate blog PostgreSQL * Server Administration * Database Administration * Popular science

Tutorial

Translation

Traditional approaches to SQL query generation often rely on instruction-tuned language models, but these can be inefficient and inaccurate. In this article, we’ll explore a new method based on reinforcement learning for model fine-tuning, which can improve both the accuracy and efficiency of SQL generation.

TantorLabs Oct 24 at 09:38

OAuth 2.0 authorization in PostgreSQL using Keycloak as an example

Easy

27 min

9.8K

Tantor Labs corporate blog PostgreSQL * Database Administration * System administration * Information Security *

Tutorial

Translation

Hello, Habr! We continue our series of articles on what's new in the Tantor Postgres 17.5.0 DBMS. Today we look at authorization via the OAuth 2.0 Device Authorization Flow, a modern and secure access method that lets applications request access to PostgreSQL on behalf of a user through an external identity and access management provider such as Keycloak. This is especially convenient for cloud environments and microservice architectures (the feature will also be available in PostgreSQL 18). In this article, we take a step-by-step look at configuring OAuth authorization in PostgreSQL with Keycloak: we configure Keycloak, prepare PostgreSQL, write an OAuth token validator for PostgreSQL, and verify successful authorization via psql using the Device Flow.

melanny20 Oct 15 at 14:00

Shardman. A quick guide for the architect

22 min

15K

Postgres Professional corporate blog Database Administration * Server Administration * PostgreSQL *

Tutorial

Translation

The myth of the magical fast=true parameter is still alive and well, but in distributed databases, another contender appears: distributed=true. Neither one will save you if you don’t rethink your schema, sharding keys, sequences, queries, and migration process. We walk through every corner with a clear-eyed approach — from choosing sharding keys and colocated tables to CDC, topologies, and foreign key constraints — showing where performance really improves, where it gets more expensive, and how to deal with it.
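
As a rough illustration of why the choice of sharding key matters, the sketch below routes rows to shards with a stable hash and shows that two tables sharded by the same key end up colocated. This is an assumption-laden toy, not Shardman's actual routing algorithm; `shard_for`, the sample rows, and the shard count are all hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a sharding key to a shard number in [0, shard_count) with a stable hash."""
    # hashlib is stable across runs and machines, unlike the built-in hash() for strings.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical rows: both tables are sharded by customer_id, so they are colocated.
customers = [{"customer_id": 7, "name": "Alice"}]
orders = [
    {"order_id": 101, "customer_id": 7},
    {"order_id": 102, "customer_id": 7},
]

SHARDS = 4
customer_shard = shard_for(customers[0]["customer_id"], SHARDS)
order_shards = {shard_for(order["customer_id"], SHARDS) for order in orders}
```

Because every order hashes on `customer_id` rather than `order_id`, a join between a customer and their orders can be answered by a single shard.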

melanny20 Oct 7 at 15:00

How to successfully migrate from Oracle to Postgres Pro Enterprise

Medium

8 min

25K

Postgres Professional corporate blog PostgreSQL * SQL * Database Administration * Server Administration *

Tutorial

Translation

Migration from Oracle to vanilla PostgreSQL hits roadblocks with packages, autonomous transactions, and collections—they simply don’t exist there. We’ll break down why ora2pg stumbles, how native implementations of these mechanisms in Postgres Pro Enterprise make life easier, and how ora2pgpro translates PL/SQL semantically correctly, without hacks or crude regex.

kaze_no_saga Sep 26 at 08:00

PostgreSQL 18: Part 5 or CommitFest 2025-03

Medium

34 min

18K

Postgres Professional corporate blog PostgreSQL * SQL * Database Administration *

Digest

Translation

September 25th marks the release of PostgreSQL 18. This article covers the March CommitFest and concludes the series on the new features of the upcoming release. It turned out quite large, as the final March CommitFest is traditionally the biggest and richest in new features.

You can find previous reviews of PostgreSQL 18 CommitFests here: 2024-07, 2024-09, 2024-11, 2025-01.

melanny20 Sep 15 at 13:29

Postgres Pro TDE — security and performance

Medium

14 min

18K

Postgres Professional corporate blog PostgreSQL * Server Administration * System administration * Database Administration *

Review

Translation

TDE comes in many flavors — from encryption at the TAM level to full-cluster encryption and tablespace markers. We take a close look at Percona, Cybertec/EDB, Pangolin/Fujitsu, and show where you lose performance and reliability, and where you gain flexibility.

On top of that, Vasily Bernstein, Deputy head of product development, and Vladimir Abramov, senior security engineer, will share how Postgres Pro Enterprise implements key rotation without rewriting entire tables — and why AES-GCM was the clear choice.

melanny20 Aug 25 at 10:42

How we loaded a petabyte into PostgreSQL before New Year — and what happened next

Medium

17 min

13K

Postgres Professional corporate blog PostgreSQL * Database Administration *

Retrospective

Translation

It all started as a joke by the office coffee machine. But, as with every decent joke, it suddenly sounded worth trying — and before we knew it, we were knee-deep in an experiment that turned out to be anything but trivial, complete with a whole minefield of gotchas.

It began simply: while everyone else was busy debating hardware tuning and squeezing out extra TPS from their systems, we thought — why not just shove a huge chunk of data into PostgreSQL and see how it holds up? Like, really huge. Say, a one-petabyte database. Let’s see how it survives that.

It was December 10, the boss wanted the report by January 20, and New Year was less than a month away. And that itch that all engineers know? It hit hard.

TantorLabs Aug 22 at 05:06

How to load test a PostgreSQL database and not miss anything

Medium

14 min

14K

Tantor Labs corporate blog PostgreSQL * Database Administration * High performance * IT systems testing *

Review

When load testing Tantor Postgres or other PostgreSQL-based databases with the standard pgbench tool, specialists often get non-representative results and have to repeat tests because details of the environment (such as DBMS configuration, server characteristics, and PostgreSQL version) were not recorded. In this article, we review the author's tool pg_perfbench, which is designed to address this issue. It makes scenarios repeatable, prevents the loss of important data, and streamlines result comparison by registering all parameters in a single template. It also automatically launches pgbench with TPC-B load generation, collects all metadata about the testing environment, and generates a structured report.

TantorLabs Jul 18 at 03:43

Redundant statistics slow down your Postgres? Try sampling in pg_stat_statements

Medium

11 min

3.8K

Tantor Labs corporate blog PostgreSQL * SQL * Database Administration * System administration *

Tutorial

pg_stat_statements is the standard PostgreSQL extension for tracking query statistics: number of executions, total and average execution time, number of returned rows, and other metrics. This information lets you analyze query behavior over time, identify problem areas, and make informed optimization decisions. However, in systems with high contention, pg_stat_statements itself can become a bottleneck and cause performance drops. In this article, we analyze the scenarios in which the extension becomes a source of problems, how sampling is structured, and the cases in which applying it can reduce overhead.
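
The general idea behind sampling can be sketched in a few lines: record statistics for only a fraction of executions and scale the counts back up when reading them. This is a conceptual toy under stated assumptions, not how pg_stat_statements or the approach in the article implements it; the `SampledStats` class and its methods are hypothetical.

```python
import random


class SampledStats:
    """Toy query-statistics collector that records only a fraction of executions."""

    def __init__(self, sample_rate, seed=None):
        if not 0 < sample_rate <= 1:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self._rng = random.Random(seed)
        self.calls = {}       # query_id -> number of sampled executions
        self.total_time = {}  # query_id -> summed time of sampled executions, ms

    def record(self, query_id, elapsed_ms):
        """Record one execution with probability sample_rate; return True if recorded."""
        if self._rng.random() >= self.sample_rate:
            return False  # skipped: no shared state is touched for this execution
        self.calls[query_id] = self.calls.get(query_id, 0) + 1
        self.total_time[query_id] = self.total_time.get(query_id, 0.0) + elapsed_ms
        return True

    def estimated_calls(self, query_id):
        """Scale the sampled count back up to estimate the true number of executions."""
        return self.calls.get(query_id, 0) / self.sample_rate


stats = SampledStats(sample_rate=1.0)
for _ in range(5):
    stats.record("q1", 2.0)
```

With a rate below 1.0, most executions skip the shared counters entirely, which is where the reduction in contention would come from; the price is that the counts become estimates.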

qqwrst Apr 4 2022 at 02:34

The journey of scaling up a production Elasticsearch cluster

6 min

4.1K

Distributed systems * Amazon Web Services * Database Administration * NoSQL * High performance *

In this article, I describe the multi-year journey of scaling a production Elasticsearch cluster, one of the vital elements of the iPrice technology stack.
I cover the challenges we encountered and how we approached them.

Yersin_DBA Oct 30 2021 at 17:04

Extending and moving a ZooKeeper ensemble

3 min

3.4K

Database Administration * Big Data *

Tutorial

Translation

Once upon a time, our DBA team was given a task: move a ZooKeeper ensemble that we had been using for a ClickHouse cluster. The usual way to move an ensemble is to move its data files. That seems easy and obvious, but our ClickHouse cluster held more than 400 TB of replicated data, and all replication information had been stored in the ZooKeeper cluster from the very beginning. We couldn’t afford to lose even a single row of data. So we looked for information on the internet. Unfortunately, the one good tutorial we found covered version 3.4.5 and didn’t fit our version 3.6.2. So we decided to move our ensemble by “extending” it.

michael-filonenko Sep 1 2021 at 13:15

In-Memory Showdown: Redis vs. Tarantool

13 min

6.6K

VK corporate blog Tarantool * Database Administration * High performance *

In this article, I am going to compare Redis and Tarantool. At first glance, they are quite alike: in-memory, NoSQL, key-value. But we are going to look deeper. My goal is to find meaningful similarities and differences; I am not going to claim that one is better than the other.

There are three main parts to my story:

We’ll find out what an in-memory database, or IMDB, is. When and how is it better than disk-based solutions?
Then, we’ll consider their architecture. What about their efficiency, reliability, and scaling?
Then, we’ll delve into technical details: data types, iterators, indexes, transactions, programming languages, replication, and connectors.

Feel free to scroll down to the most interesting part or even to the summary comparison table at the very bottom of the article.

Dradmin Apr 30 2021 at 05:33

MySQL 8 Performance Benchmark

3 min

11K

MySQL * IT Infrastructure * *nix * Server Administration * Database Administration *

In this article, we benchmark the performance of MySQL 8 default configuration vs. innodb_dedicated_server enabled configuration vs. the configuration recommended by MySQL Performance Tuning Service.

KAPANDR Dec 29 2020 at 16:59

Tarantool: an analyst's view

8 min

2.2K

VK corporate blog Tarantool * Database Administration * System Analysis and Design * Internet marketing *

Hi all! I'm Andrey Kapustin. I work as a system analyst at Mail.ru Group. Our products form a unified ecosystem in which many independent infrastructures generate data: taxi and food delivery services, email services, social networks, etc. The faster and more precisely we can predict a client's needs, the sooner and more accurately we can offer our products.

Many system analysts and engineers are keen to know:

How to design the architecture of a trigger platform for real-time marketing?
How to arrange a data structure that would be in line with the requirements of a marketing strategy for interacting with clients?
How to ensure the stable operations of the system under very heavy workloads?

Such systems are based on high-load processing and Big Data analysis technologies. We have accumulated considerable experience in these areas, and our expertise is in high demand on the market. I'm going to show how we help our customers switch from offline to online interactions with clients using Real-Time Marketing solutions based on Tarantool.

Wendigoo Oct 5 2020 at 07:38

MySQL 8.x Group Replication (Master-Slave) with Docker Compose

5 min

6.7K

MySQL * Database Administration * DevOps *

This post covers the following situation: how to set up simple dockerized MySQL services with group replication. In our case, we’ll take the latest MySQL (version 8.x.x).

FYI: all the code mentioned (working and manually tested) is located here.

I will skip the uninteresting steps like ‘what is MySQL, what is Docker, and why we chose them’. We want to set up a database that is as trouble-proof as possible. That’s our plan.

AnnaPhc Aug 11 2020 at 16:05

IIoT platform databases – How Mail.ru Cloud Solutions deals with petabytes of data coming from a multitude of devices

11 min

2.1K

VK corporate blog Data storage * IOT Database Administration * Tarantool *

Hello, my name is Andrey Sergeyev, and I work as Head of IoT Solution Development at Mail.ru Cloud Solutions. We all know there is no such thing as a universal database, especially when the task is to build an IoT platform capable of processing millions of events from various sensors in near real time.

Our product, Mail.ru IoT Platform, started as a Tarantool-based prototype. I’m going to tell you about our journey, the problems we faced, and the solutions we found. I will also show you the current architecture of a modern Industrial Internet of Things platform. In this article we will look into:

our requirements for the database, universal solutions, and the CAP theorem
whether the “database plus application server in one” approach is a silver bullet
the evolution of the platform and the databases used in it
the number of Tarantools we use and how we came to this

olku Aug 3 2020 at 15:35

Lossless ElasticSearch data migration

5 min

4.6K

DevOps * NoSQL * Database Administration *

Translation

Academic data warehouse design recommends keeping everything in normalized form, with links between entities. Rolling changes forward under relational algebra then yields a reliable repository with transaction support. Atomicity, Consistency, Isolation, Durability: that's all. In other words, the storage is built explicitly to update data safely. But it is not optimal for searching, especially broad searches across many tables and fields. We need indices, a lot of indices! Volumes grow, and writes slow down. SQL LIKE cannot be indexed, and JOIN with GROUP BY sends us off to meditate over the query planner.