Search
Write a publication
Pull to refresh
82.62

SQL *

Domain-specific languageused in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system

Show first
Period
Level of difficulty

MVCC in PostgreSQL-1. Isolation

Reading time24 min
Views12K
Hello, Habr! With this article I start a set of series (or a series of sets? — In a word, the idea is grandiose) about the internal structure of PostgreSQL.

The material will be based on training courses (in Russian) on administration that Pavel pluzanov and I are creating. Not everyone likes to watch video (I definitely do not), and reading slides, even with comments, is no good at all.

Unfortunately, the only course available in English at the moment is 2-Day Introduction to PostgreSQL 11.

Of course, the articles will not be exactly the same as the content of the courses. I will talk only about how everything is organized, omitting the administration itself, but I will try to do it in more detail and more thoroughly. And I believe that the knowledge like this is as useful to an application developer as it is to an administrator.

I will target those who already have some experience in using PostgreSQL and at least in general understand what is what. The text will be too difficult for beginners. For example, I will not say a word about how to install PostgreSQL and run psql.

The stuff in question does not vary much from version to version, but I will use the current, 11th vanilla PostgreSQL.

The first series deals with issues related to isolation and multiversion concurrency, and the plan of the series is as follows:

  1. Isolation as understood by the standard and PostgreSQL (this article).
  2. Forks, files, pages — what is happening at the physical level.
  3. Row versions, virtual transactions and subtransactions.
  4. Data snapshots and the visibility of row versions; the event horizon.
  5. In-page vacuum and HOT updates.
  6. Normal vacuum.
  7. Autovacuum.
  8. Transaction id wraparound and freezing.

Off we go!

And before we start, I would like to thank Elena Indrupskaya for translating the articles to English.

Read more →

Testing SQL Server code with tSQLt

Reading time20 min
Views2.5K
FYI: this article is an expanded version of my talk at SQA Days #25.

Based on my experience with colleagues, I can state: DB code testing is not a widely spread practice. This can be potentially dangerous. DB logic is written by human beings just like all other «usual» code. So, there can be failures which can cause negative consequences for a product, business or users. Whether these are stored procedures helping backend or it is ETL modifying data in a warehouse — there is always a risk and testing helps to decrease it. I want to tell you what tSQLt is and how it helps us to test DB code.

Read more →

Indexes in PostgreSQL — 4 (Btree)

Reading time17 min
Views22K
We've already discussed PostgreSQL indexing engine and interface of access methods, as well as hash index, one of access methods. We will now consider B-tree, the most traditional and widely used index. This article is large, so be patient.

Btree


Structure


B-tree index type, implemented as «btree» access method, is suitable for data that can be sorted. In other words, «greater», «greater or equal», «less», «less or equal», and «equal» operators must be defined for the data type. Note that the same data can sometimes be sorted differently, which takes us back to the concept of operator family.
Read more →

PostgreSQL 18: Part 1 or CommitFest 2024-07

Level of difficultyMedium
Reading time10 min
Views1.3K
Read more →

Queries in PostgreSQL. Nested Loop

Reading time17 min
Views3K

So far we've discussed query execution stagesstatistics, and the two basic data access methods: Sequential scan and Index scan.

The next item on the list is join methods. This article will remind you what logical join types are out there, and then discuss one of three physical join methods, the Nested loop join. Additionally, we will check out the row memoization feature introduced in PostgreSQL 14.

Read more

Locks in PostgreSQL: 2. Row-level locks

Reading time14 min
Views14K
Last time, we discussed object-level locks and in particular relation-level locks. In this article, we will see how row-level locks are organized in PostgreSQL and how they are used together with object-level locks. We will also talk of wait queues and of those who jumps the queue.



Row-level locks


Organization


Let's recall a few weighty conclusions of the previous article.

  • A lock must be available somewhere in the shared memory of the server.
  • The higher granularity of locks, the lower the contention among concurrent processes.
  • On the other hand, the higher the granularity, the more of the memory is occupied by locks.

There is no doubt that we want a change of one row not block other rows of the same table. But we cannot afford to have its own lock for each row either.

There are different approaches to solving this problem. Some database management systems apply escalation of locks: if the number of row-level locks gets too high, they are replaced with one, more general lock (for example: a page-level or an entire table-level).

As we will see later, PostgreSQL also applies this technique, but only for predicate locks. The situation with row-level locks is different.
Read more →

MVCC in PostgreSQL-4. Snapshots

Reading time9 min
Views6.7K
After having discussed isolation problems and having made a digression regarding the low-level data structure, last time we explored row versions and observed how different operations changed tuple header fields.

Now we will look at how consistent data snapshots are obtained from tuples.

What is a data snapshot?


Data pages can physically contain several versions of the same row. But each transaction must see only one (or none) version of each row, so that all of them make up a consistent picture of the data (in the sense of ACID) as of a certain point in time.

Isolation in PosgreSQL is based on snapshots: each transaction works with its own data snapshot, which «contains» data that were committed before the moment the snapshot was created and does not «contain» data that were not committed by that moment yet. We've already seen that although the resulting isolation appears stricter than required by the standard, it still has anomalies.
Read more →

How I prepared for and passed the Oracle Database SQL Certification (1Z0-071)

Reading time9 min
Views22K

Why did I write this article?


image

When I was preparing for Java 8 OCA and OCP I found a lot of useful articles about subjects on Habr that helped me to choose the optimal path and save a considerable amount of time.


When I started preparing for OCA Oracle Database SQL (1Z0-071) I didn’t find any materials on Habr about this matter and found there to be limited information available on the Internet. Because of this I decided to write a complete guide in order to help others who are interested in this certification to help them save time and successfully pass what I consider to be a pretty hard exam.

Read more →

How can a manual tester understand an automation tester, and vice versa?

Level of difficultyEasy
Reading time3 min
Views1.1K

When we go abroad for vacation or meet a foreigner on the street who doesn’t speak Russian but is trying to ask, “Where is the restroom? How do I get to…”, we wonder how to explain things to them in Russian in a way they would understand.

I asked myself a similar question when trying to explain something to a colleague using SQL while they were working with Java. The main goal of my work was to create a quality test model. Without it, there would be no proper regression testing later on.

I started by building a framework filling it with test cases. We held a meeting where we discussed priority of positive and negative test cases briefly. When developing the test scenarios, I used the incremental model, but as practice showed, this approach also required an iterative method. For example, it is like having the outline of the Mona Lisa first, then adding colors, painting the background, and so on.

It’s better to maintain the checklist in Excel format to add columns, write notes, and more. And let’s not forget that, as we take on the role of Leonardo da Vinci, we use different colors and get creative.

I am a manager by profession specializating in Production Management. My motivator is the Theory of Constraints (TOC) methodology, which focuses on identifying and managing the key constraint of a system to determine the efficiency of the entire system as a whole:

Read more

Evaluating Performance: CosmosDB vs. Azure SQL

Level of difficultyEasy
Reading time4 min
Views7.3K

In the evolving landscape of database technology, choosing the right database management system is crucial for the efficiency and scalability of applications. This article presents a detailed comparison of the performance between Microsoft's CosmosDB and MS SQL Server. We'll examine how each database performs under various load conditions and share some interesting findings.

Read more

PostgreSQL 17: Part 3 or Commitfest 2023-11

Level of difficultyMedium
Reading time11 min
Views924


The November commitfest is ripe with new interesting features! Without further ado, let's proceed with the review.


If you missed our July and September commitfest reviews, you can check them out here: 2023-07, 2023-09.


ON LOGIN trigger
Event triggers for REINDEX
ALTER OPERATOR: commutator, negator, hashes, merges
pg_dump --filter=dump.txt
psql: displaying default privileges
pg_stat_statements: track statement entry timestamps and reset min/max statistics
pg_stat_checkpointer: checkpointer process statistics
pg_stats: statistics for range type columns
Planner: exclusion of unnecessary table self-joins
Planner: materialized CTE statistics
Planner: accessing a table with multiple clauses
Index range scan optimization
dblink, postgres_fdw: detailed wait events
Logical replication: migration of replication slots during publisher upgrade
Replication slot use log
Unicode: new information functions
New function: xmltext
AT LOCAL support
Infinite intervals
ALTER SYSTEM with unrecognized custom parameters
Building the server from source

Read more →

PostgreSQL 17: Part 2 or Commitfest 2023-09

Reading time11 min
Views1.2K


We continue to follow the news of the PostgreSQL 17 development. Let's find out what the September commitfest brings to the table.


If you missed our July commitfest review, you can check it out here: 2023-07.


Removed the parameter old_snapshot_threshold
New parameter event_triggers
New functions to_bin and to_oct
New system view pg_wait_events
EXPLAIN: a JIT compilation time counter for tuple deforming
Planner: better estimate of the initial cost of the WindowAgg node
pg_constraint: NOT NULL constraints
Normalization of CALL, DEALLOCATE and two-phase commit control commands
unaccent: the target rule expressions now support values in quotation marks
COPY FROM: FORCE_NOT_NULL * and FORCE_NULL *
Audit of connections without authentication
pg_stat_subscription: new column worker_type
The behaviour of pg_promote in case of unsuccessful switchover to a replica
Choosing the disk synchronization method in server utilities
pg_restore: optimization of parallel recovery of a large number of tables
pg_basebackup and pg_receivewal with the parameter dbname
Parameter names for a number of built-in functions
psql: \watch min_rows

Read more →

MSSQL: Table Rebuild and Reorg in highload 24/7 Environments

Level of difficultyMedium
Reading time14 min
Views3.4K

How do you deal with index fragmentation if your SQL server is working in high load environment with 24/7 workload without any maintenance window? What are the best practices for index rebuild and index reorganize? What is better? What is possible if you have only Standard Edition on some servers? But first, let's debunk few myths.

Myth 1. We use SSD (or super duper storage), so we should not care about the fragmentation. False. Index rebuild compactifies a table, with compression it makes it sometimes several times smaller, improving the cache hits ratio and overall performance (this happens even without compression).

Myth 2. Index rebuild shorten SSD lifespan. False. One extra write cycle is nothing for the modern SSDs. If your tempdb is on SSD/NVMe, it is under much harder stress than data disks.

Myth 3. On Enterprise Edition there is a good option: ONLINE=ON, so I just create a script with all tables and go ahead. False. There are tons of potential problems created by INDEX REBUILD even with ONLINE and RESUMABLE ON - so never run index rebuilds without controlling the process.

Finally, we will tackle the REBUILD vs REORGANIZE subject and what is possible to achieve if you have only Standard Edition.

Read more

Queries in PostgreSQL. Sort and merge

Reading time19 min
Views2.2K


In the previous articles, we have covered query execution stages, statistics, sequential and index scan, and two of the three join methods: nested loop and hash join.


This last article of the series will cover the merge algorithm and sorting. I will also demonstrate how the three join methods compare against each other.

Read more →

Authors' contribution