SQL

Domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system

erogov Sep 16 2019 at 09:17

MVCC in PostgreSQL-1. Isolation

24 min

12K

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

Hello, Habr! With this article I start a set of series (or a series of sets? — In a word, the idea is grandiose) about the internal structure of PostgreSQL.

The material will be based on training courses (in Russian) on administration that Pavel pluzanov and I are creating. Not everyone likes to watch video (I definitely do not), and reading slides, even with comments, is no good at all.

Unfortunately, the only course available in English at the moment is 2-Day Introduction to PostgreSQL 11.

Of course, the articles will not be exactly the same as the content of the courses. I will talk only about how everything is organized, omitting the administration itself, but I will try to do it in more detail and more thoroughly. And I believe that knowledge like this is as useful to an application developer as it is to an administrator.

I will target those who already have some experience in using PostgreSQL and at least in general understand what is what. The text will be too difficult for beginners. For example, I will not say a word about how to install PostgreSQL and run psql.

The stuff in question does not vary much from version to version, but I will use the current, 11th vanilla PostgreSQL.

The first series deals with issues related to isolation and multiversion concurrency, and the plan of the series is as follows:

Isolation as understood by the standard and PostgreSQL (this article).
Forks, files, pages — what is happening at the physical level.
Row versions, virtual transactions and subtransactions.
Data snapshots and the visibility of row versions; the event horizon.
In-page vacuum and HOT updates.
Normal vacuum.
Autovacuum.
Transaction id wraparound and freezing.

Off we go!

And before we start, I would like to thank Elena Indrupskaya for translating the articles into English.

captainkoffski Sep 3 2019 at 09:21

Testing SQL Server code with tSQLt

20 min

2.5K

Arcadia corporate blog, IT systems testing, SQL, Microsoft SQL Server, Web services testing

FYI: this article is an expanded version of my talk at SQA Days #25.

Based on my experience with colleagues, I can state that DB code testing is not a widespread practice, and this is potentially dangerous. DB logic is written by human beings just like all other «usual» code, so it can contain failures with negative consequences for a product, business or users. Whether it is stored procedures helping the backend or ETL modifying data in a warehouse, there is always a risk, and testing helps to decrease it. I want to tell you what tSQLt is and how it helps us to test DB code.

Tzimie Mar 21 at 08:59

Building Flame Diagram for MSSQL stored procedures

Medium

3 min

367

SQL, Microsoft SQL Server

Tutorial

If your code has many nested executions of stored procedures, you can benefit from building the popular "flame diagram" of execution time, which is the de facto standard for performance profiling.

kaze_no_saga Aug 17 2022 at 06:51

Queries in PostgreSQL. Hashing

18 min

3.5K

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

So far we have covered query execution stages, statistics, sequential and index scan, and have moved on to joins.

erogov Mar 25 2019 at 08:18

Indexes in PostgreSQL — 4 (Btree)

17 min

22K

Postgres Professional corporate blog, SQL, PostgreSQL

Translation

We've already discussed the PostgreSQL indexing engine and the interface of access methods, as well as the hash index, one of the access methods. We will now consider B-tree, the most traditional and widely used index. This article is large, so be patient.

Btree

Structure

The B-tree index type, implemented as the «btree» access method, is suitable for data that can be sorted. In other words, the «greater», «greater or equal», «less», «less or equal», and «equal» operators must be defined for the data type. Note that the same data can sometimes be sorted differently, which takes us back to the concept of operator family.
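
To make the sorting requirement concrete, here is a minimal Python sketch. It is not PostgreSQL code: a sorted list stands in for the index, and the five comparison operators are answered by binary search. The names `SortedIndex` and `search` are hypothetical, and the `key` function is only a rough analogue of sorting the same data under a different set of operators.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Toy stand-in for an ordered index: values kept sorted by a key function."""

    def __init__(self, values, key=lambda v: v):
        self.key = key
        self.values = sorted(values, key=key)
        self.keys = [key(v) for v in self.values]

    def search(self, op, value):
        """Return the indexed values v for which `v <op> value` holds."""
        k = self.key(value)
        lo = bisect_left(self.keys, k)   # first position with key >= k
        hi = bisect_right(self.keys, k)  # first position with key > k
        if op == "<":
            return self.values[:lo]
        if op == "<=":
            return self.values[:hi]
        if op == "=":
            return self.values[lo:hi]
        if op == ">=":
            return self.values[lo:]
        if op == ">":
            return self.values[hi:]
        raise ValueError(f"unsupported operator: {op}")


words = ["pear", "Apple", "banana", "apple"]
default_order = SortedIndex(words)                     # plain string comparison
case_insensitive = SortedIndex(words, key=str.lower)   # same data, different ordering
```

With the default ordering, `default_order.search("=", "apple")` matches only `"apple"`, while the case-insensitive ordering treats `"Apple"` and `"apple"` as equal. This illustrates that the answer depends on which ordering the index was built with.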

kaze_no_saga Nov 22 2024 at 08:04

PostgreSQL 18: Part 1 or CommitFest 2024-07

Medium

10 min

1.3K

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

This article is the first in the series about the upcoming PostgreSQL 18 release. Let us take a look at the features introduced in the July CommitFest.

Planner: Hash Right Semi Join support
Planner: materializing an internal row set for parallel nested loop join
Planner support functions for generate_series
EXPLAIN (analyze): statistics for Parallel Bitmap Heap Scan node workers
Functions min and max for composite types
Parameter names for regexp* functions
Debug mode in pgbench
pg_get_backend_memory_contexts: column path instead of parent, new column type
Function pg_get_acl
pg_upgrade: pg_dump optimization
Predefined role pg_signal_autovacuum_worker

kaze_no_saga Nov 8 2022 at 12:14

Queries in PostgreSQL. Nested Loop

17 min

Postgres Professional corporate blog, SQL, PostgreSQL

Translation

So far we've discussed query execution stages, statistics, and the two basic data access methods: Sequential scan and Index scan.

The next item on the list is join methods. This article will remind you what logical join types are out there, and then discuss one of three physical join methods, the Nested loop join. Additionally, we will check out the row memoization feature introduced in PostgreSQL 14.
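
As a rough illustration of the idea (a sketch, not the PostgreSQL executor), the nested loop join and row memoization can be modelled in a few lines of Python. The function `nested_loop_join`, the sample tables and the `flights_by_id` lookup are all hypothetical. The cache here is an unbounded dictionary, whereas a real implementation has to limit the memory it uses.

```python
def nested_loop_join(outer_rows, inner_lookup, outer_key, memoize=False):
    """Inner join: for each outer row, fetch matching inner rows via inner_lookup(key).

    Returns the joined rows and the number of times the inner side was scanned.
    """
    cache = {}
    lookups = 0
    result = []
    for outer in outer_rows:
        key = outer[outer_key]
        if memoize and key in cache:
            inner_rows = cache[key]          # reuse rows fetched for the same key
        else:
            inner_rows = inner_lookup(key)   # rescan the inner set
            lookups += 1
            if memoize:
                cache[key] = inner_rows
        for inner in inner_rows:
            result.append({**outer, **inner})
    return result, lookups


# Hypothetical sample data
tickets = [
    {"ticket": 1, "flight_id": 10},
    {"ticket": 2, "flight_id": 10},
    {"ticket": 3, "flight_id": 20},
]
flights = [
    {"flight_id": 10, "dest": "LED"},
    {"flight_id": 20, "dest": "SVO"},
]


def flights_by_id(flight_id):
    # Stand-in for an index scan on the inner table
    return [f for f in flights if f["flight_id"] == flight_id]


plain, plain_lookups = nested_loop_join(tickets, flights_by_id, "flight_id")
cached, cached_lookups = nested_loop_join(tickets, flights_by_id, "flight_id", memoize=True)
```

Both calls produce the same joined rows. The memoized variant scans the inner side once per distinct key rather than once per outer row, so it pays off only when keys repeat in the outer set.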

kaze_no_saga May 24 2022 at 05:11

Queries in PostgreSQL. Index scan

18 min

5.4K

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

In previous articles we discussed query execution stages and statistics. Last time, I started on data access methods, namely Sequential scan. Today we will cover Index Scan.

kaze_no_saga Feb 3 2022 at 10:48

Queries in PostgreSQL. Query execution stages

15 min

5.6K

Postgres Professional corporate blog, SQL, PostgreSQL

Translation

Hello! I'm kicking off another article series about the internals of PostgreSQL. This one will focus on query planning and execution mechanics.

In the first article we will split the query execution process into stages and discuss what exactly happens at each stage.

erogov May 20 2020 at 19:38

Locks in PostgreSQL: 2. Row-level locks

14 min

14K

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

Last time, we discussed object-level locks and in particular relation-level locks. In this article, we will see how row-level locks are organized in PostgreSQL and how they are used together with object-level locks. We will also talk about wait queues and about those who jump the queue.

Row-level locks

Organization

Let's recall a few weighty conclusions of the previous article.

A lock must be available somewhere in the shared memory of the server.
The higher the granularity of locks, the lower the contention among concurrent processes.
On the other hand, the higher the granularity, the more memory is occupied by locks.

There is no doubt that we want a change to one row not to block other rows of the same table. But we cannot afford to have a separate lock for each row either.

There are different approaches to solving this problem. Some database management systems apply lock escalation: if the number of row-level locks gets too high, they are replaced with one more general lock (for example, a page-level or table-level lock).

As we will see later, PostgreSQL also applies this technique, but only for predicate locks. The situation with row-level locks is different.
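
The escalation technique described above can be sketched in Python. This is a toy model with a single lock holder, not the implementation of any particular DBMS, and (as the text says) not how PostgreSQL treats row-level locks. The class `EscalatingLockTable` and its threshold parameter are hypothetical.

```python
class EscalatingLockTable:
    """Toy lock table: row locks on a page are replaced by one page lock past a threshold."""

    def __init__(self, max_row_locks_per_page=3):
        self.max_row_locks_per_page = max_row_locks_per_page
        self.row_locks = {}       # page number -> set of locked row numbers
        self.page_locks = set()   # pages locked as a whole

    def lock_row(self, page, row):
        if page in self.page_locks:
            return  # already covered by the coarser lock
        rows = self.row_locks.setdefault(page, set())
        rows.add(row)
        if len(rows) > self.max_row_locks_per_page:
            # Escalate: trade many fine-grained locks for one coarse lock
            del self.row_locks[page]
            self.page_locks.add(page)

    def lock_count(self):
        """Number of lock entries held in (simulated) shared memory."""
        return sum(len(rows) for rows in self.row_locks.values()) + len(self.page_locks)

    def is_locked(self, page, row):
        return page in self.page_locks or row in self.row_locks.get(page, set())


locks = EscalatingLockTable(max_row_locks_per_page=3)
for row in range(1, 5):   # lock rows 1..4 on page 0: the fourth lock triggers escalation
    locks.lock_row(0, row)
locks.lock_row(1, 7)      # a single row on page 1 stays a row lock
```

After escalation the table holds two entries instead of five, which saves memory. The price is contention: a row on page 0 that nobody asked for, such as row 99, now counts as locked too.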

erogov Dec 10 2019 at 11:57

MVCC in PostgreSQL-4. Snapshots

9 min

6.7K

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

After having discussed isolation problems and having made a digression regarding the low-level data structure, last time we explored row versions and observed how different operations changed tuple header fields.

Now we will look at how consistent data snapshots are obtained from tuples.

What is a data snapshot?

Data pages can physically contain several versions of the same row. But each transaction must see only one (or none) version of each row, so that all of them make up a consistent picture of the data (in the sense of ACID) as of a certain point in time.

Isolation in PostgreSQL is based on snapshots: each transaction works with its own data snapshot, which «contains» data that were committed before the moment the snapshot was created and does not «contain» data that were not committed by that moment yet. We've already seen that although the resulting isolation appears stricter than required by the standard, it still has anomalies.
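
The rule "a snapshot contains what was committed before it was taken" can be sketched as a toy model in Python. This is a deliberate simplification under stated assumptions: a snapshot is represented as the plain set of transaction ids committed at creation time, each row version carries the id of the transaction that created it (`xmin`) and, if any, the one that deleted or replaced it (`xmax`), and a transaction's view of its own uncommitted changes is ignored. The classes `RowVersion` and `Snapshot` are hypothetical and do not mirror the actual data structures.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RowVersion:
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that deleted/replaced it, if any


@dataclass(frozen=True)
class Snapshot:
    committed: FrozenSet[int]    # transactions committed when the snapshot was taken

    def sees(self, version: RowVersion) -> bool:
        created = version.xmin in self.committed
        deleted = version.xmax is not None and version.xmax in self.committed
        return created and not deleted


def read(versions, snapshot):
    """Return the values of the row versions visible in the given snapshot."""
    return [v.value for v in versions if snapshot.sees(v)]


# Transaction 100 inserted the row; transaction 101 updated it.
versions = [
    RowVersion("balance=1000", xmin=100, xmax=101),
    RowVersion("balance=900", xmin=101),
]

before_update_commit = Snapshot(frozenset({100}))
after_update_commit = Snapshot(frozenset({100, 101}))
```

Both versions sit side by side in `versions`, just as a page can hold several versions of one row, yet each snapshot sees exactly one of them: the old balance before transaction 101 commits and the new one after.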

raiym Sep 12 2019 at 04:33

How I prepared for and passed the Oracle Database SQL Certification (1Z0-071)

9 min

22K

Oracle, SQL

Why did I write this article?

When I was preparing for Java 8 OCA and OCP I found a lot of useful articles about subjects on Habr that helped me to choose the optimal path and save a considerable amount of time.

When I started preparing for OCA Oracle Database SQL (1Z0-071), I didn’t find any materials on Habr about it and found only limited information on the Internet. Because of this I decided to write a complete guide to help others who are interested in this certification save time and successfully pass what I consider to be a pretty hard exam.

Tzimie May 20 at 08:33

MSSQL: Index Heatmaps in TreeView Format

2 min

694

SQL, Microsoft SQL Server, Python

Tutorial

Would you like to know which indexes are used frequently or rarely? Which ones aren't used at all? Which tables and indexes are the largest? It's very easy to create visualizations for this. They're both visually appealing and practically useful.

janna_melnikova Nov 25 2024 at 07:36

How can a manual tester understand an automation tester, and vice versa?

Easy

3 min

1.1K

Java, SQL, API

Tutorial

When we go abroad for vacation or meet a foreigner on the street who doesn’t speak Russian but is trying to ask, “Where is the restroom? How do I get to…”, we wonder how to explain things to them in Russian in a way they would understand.

I asked myself a similar question when trying to explain something to a colleague using SQL while they were working with Java. The main goal of my work was to create a quality test model. Without it, there would be no proper regression testing later on.

I started by building a framework and filling it with test cases. We held a meeting where we briefly discussed the priority of positive and negative test cases. When developing the test scenarios, I used the incremental model, but as practice showed, this approach also required an iterative method. For example, it is like having the outline of the Mona Lisa first, then adding colors, painting the background, and so on.

It’s better to maintain the checklist in Excel format to add columns, write notes, and more. And let’s not forget that, as we take on the role of Leonardo da Vinci, we use different colors and get creative.

I am a manager by profession, specializing in Production Management. My motivator is the Theory of Constraints (TOC) methodology, which focuses on identifying and managing the key constraint of a system to determine the efficiency of the entire system as a whole:

kaze_no_saga May 29 2024 at 06:47

PostgreSQL 17: Part 4 or Commitfest 2024-01

11 min

735

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

Spring is in full swing as we bring you the hottest winter news of the January Commitfest. Let's get to the good stuff right away!

Previous articles about PostgreSQL 17: 2023-07, 2023-09, 2023-11.

rumatavz Jan 16 2024 at 19:42

Evaluating Performance: CosmosDB vs. Azure SQL

Easy

4 min

7.3K

SQL, NoSQL, Microsoft SQL Server, Microsoft Azure, Data Engineering

Analytics

In the evolving landscape of database technology, choosing the right database management system is crucial for the efficiency and scalability of applications. This article presents a detailed comparison of the performance between Microsoft's CosmosDB and MS SQL Server. We'll examine how each database performs under various load conditions and share some interesting findings.

kaze_no_saga Jan 16 2024 at 03:29

PostgreSQL 17: Part 3 or Commitfest 2023-11

Medium

11 min

924

Postgres Professional corporate blog, PostgreSQL, SQL

Digest

Translation

The November commitfest is ripe with new interesting features! Without further ado, let's proceed with the review.

If you missed our July and September commitfest reviews, you can check them out here: 2023-07, 2023-09.

ON LOGIN trigger
Event triggers for REINDEX
ALTER OPERATOR: commutator, negator, hashes, merges
pg_dump --filter=dump.txt
psql: displaying default privileges
pg_stat_statements: track statement entry timestamps and reset min/max statistics
pg_stat_checkpointer: checkpointer process statistics
pg_stats: statistics for range type columns
Planner: exclusion of unnecessary table self-joins
Planner: materialized CTE statistics
Planner: accessing a table with multiple clauses
Index range scan optimization
dblink, postgres_fdw: detailed wait events
Logical replication: migration of replication slots during publisher upgrade
Replication slot use log
Unicode: new information functions
New function: xmltext
AT LOCAL support
Infinite intervals
ALTER SYSTEM with unrecognized custom parameters
Building the server from source

kaze_no_saga Nov 7 2023 at 03:03

PostgreSQL 17: Part 2 or Commitfest 2023-09

11 min

1.2K

Postgres Professional corporate blog, SQL, PostgreSQL

Digest

Translation

We continue to follow the news of the PostgreSQL 17 development. Let's find out what the September commitfest brings to the table.

If you missed our July commitfest review, you can check it out here: 2023-07.

Removed the parameter old_snapshot_threshold
New parameter event_triggers
New functions to_bin and to_oct
New system view pg_wait_events
EXPLAIN: a JIT compilation time counter for tuple deforming
Planner: better estimate of the initial cost of the WindowAgg node
pg_constraint: NOT NULL constraints
Normalization of CALL, DEALLOCATE and two-phase commit control commands
unaccent: the target rule expressions now support values in quotation marks
COPY FROM: FORCE_NOT_NULL * and FORCE_NULL *
Audit of connections without authentication
pg_stat_subscription: new column worker_type
The behaviour of pg_promote in case of unsuccessful switchover to a replica
Choosing the disk synchronization method in server utilities
pg_restore: optimization of parallel recovery of a large number of tables
pg_basebackup and pg_receivewal with the parameter dbname
Parameter names for a number of built-in functions
psql: \watch min_rows

Tzimie Sep 17 2023 at 15:51

MSSQL: Table Rebuild and Reorg in highload 24/7 Environments

Medium

14 min

3.4K

SQL, PowerShell, Server optimization, Microsoft SQL Server

Tutorial

How do you deal with index fragmentation if your SQL Server is working in a high-load environment with a 24/7 workload and no maintenance window? What are the best practices for index rebuild and index reorganize? Which is better? What is possible if you have only Standard Edition on some servers? But first, let's debunk a few myths.

Myth 1. We use SSDs (or super-duper storage), so we need not care about fragmentation. False. An index rebuild compacts a table, and with compression sometimes makes it several times smaller, improving the cache hit ratio and overall performance (this happens even without compression).

Myth 2. Index rebuilds shorten SSD lifespan. False. One extra write cycle is nothing for modern SSDs. If your tempdb is on SSD/NVMe, it is under much harder stress than the data disks.

Myth 3. On Enterprise Edition there is a good option: ONLINE=ON, so I just create a script with all tables and go ahead. False. There are tons of potential problems created by INDEX REBUILD even with ONLINE and RESUMABLE ON - so never run index rebuilds without controlling the process.

Finally, we will tackle the REBUILD vs REORGANIZE subject and what is possible to achieve if you have only Standard Edition.

kaze_no_saga Nov 1 2022 at 06:35

Queries in PostgreSQL. Sort and merge

19 min

2.2K

Postgres Professional corporate blog, PostgreSQL, SQL

Translation

In the previous articles, we have covered query execution stages, statistics, sequential and index scan, and two of the three join methods: nested loop and hash join.

This last article of the series will cover the merge algorithm and sorting. I will also demonstrate how the three join methods compare against each other.
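
For readers who want a feel for the merge algorithm before opening the article, here is a minimal Python sketch of an inner merge join over two inputs already sorted by the join key. It is an illustration under simple assumptions (in-memory lists of dicts, an equality join on one key), not the PostgreSQL implementation. The function `merge_join` and the sample rows are hypothetical.

```python
def merge_join(left, right, key):
    """Inner equi-join of two lists of dicts, both already sorted by `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left side is behind: advance it
        elif lk > rk:
            j += 1          # right side is behind: advance it
        else:
            # Find the run of right rows sharing this key
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row carrying this key with the whole run
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


# Sorting is the prerequisite: both inputs must arrive ordered by the join key.
left = sorted(
    [{"id": 2, "l": "b"}, {"id": 1, "l": "a"}, {"id": 2, "l": "c"}],
    key=lambda row: row["id"],
)
right = sorted(
    [{"id": 2, "r": "x"}, {"id": 3, "r": "y"}, {"id": 2, "r": "z"}],
    key=lambda row: row["id"],
)
joined = merge_join(left, right, "id")
```

Each input is read once from start to end, which is why the method depends on sorted inputs, whether they come from an explicit sort or from a source that already returns rows in key order.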