kaze_no_saga 1 ноя 2022 в 12:32

PostgreSQL 16: Part 1 or CommitFest 2022-07

7 мин

1.1K

Блог компании Postgres ProfessionalSQL*PostgreSQL*

Перевод

August was a special month in PostgreSQL release cycle as the first CommitFest for the 16th PostgreSQL release was held.

Let's compile the server and check out the cool new stuff!

\dconfig server_version

List of configuration parameters
   Parameter    |  Value
----------------+---------
 server_version | 16devel

Notable features

There are 14 patches I want to talk about:

psql: \pset xheader_width
vacuumdb --schema and --exclude-schema options
New createuser features added
Checkpoint and redo LSN added to LogCheckpointEnd log message
pg_prepared_statements.result_types
New parameter log_parameter_max_length for auto_explain
Subquery alias in FROM clause became optional
REINDEX: syntax improvements and more
CREATE STATISTICS: statistic name is now optional
CREATE TABLE: the STORAGE attribute
The user created during cluster initialization now can't be stripped of superuser privileges
TRUNCATE triggers on foreign tables are now supported
New argument variation for pg_read_file/pg_read_binary_file
Extensible WAL resource managers

psql: \pset xheader_width
commit: a45388d6

Expanded output mode (\x or \pset expanded) is a convenient option whenever your column data won't fit into the terminal screen. However, even the expanded mode may not be enough for some content.

Let's run a query in a 72-character-wide terminal:

\pset expanded on
\pset pager off
SELECT version(),
length(version()) version_length;

-[ RECORD 1 ]--+--------------------------------------------------------
version        | PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
version_length | 104

The version value doesn't fit and gets wrapped. That's completely fine. But the output header is generated in the same size as the widest column and gets wrapped as well. The wider the widest column, the longer the header will be. This is especially annoying when displaying multiple wide records, because each header occupies multiple lines in the output. (Try running SELECT * FROM pg_proc; in the expanded mode.)

PostgreSQL 16 introduces a new record formatting option to handle these long headers:

\pset xheader_width

Expanded header width is 'full'.

The default option (full) is the familiar behavior from the past versions. Other options include:

column ― the header is trimmed to the width of the first output column.
page ― the header is trimmed to the width of the terminal window.
nnn ― the header is trimmed to the nnn-th character.

\pset xheader_width column

Expanded header width is 'column'.

SELECT version(),
length(version()) version_length;

-[ RECORD 1 ]--+
version        | PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
version_length | 104

The header no longer wastes valuable terminal space.

vacuumdb --schema and --exclude-schema
commit: 7781f4e3

vacuumdb gets new parameters for vacuuming specified schemas (--schema) and tables outside of specified schemas (--exclude-schema).

New createuser features added
commit: 08951a7c

createuser is a wrapper for the CREATE ROLE command, but with limited functionality. The patch adds new parameters to make use of previously unavailable functions, namely:

--valid-until for the VALID UNTIL clause
--bypassrls/--no-bypassrls for the BYPASSRLS/NOBYPASSRLS attributes
-m/--member to grant membership to the created role
-a/--admin to grant membership to the created role using the WITH ADMIN OPTION clause

Checkpoint and redo LSN added to LogCheckpointEnd log message
commit: 62c46eee

After a checkpoint is made, WAL takes a record with the LSN of the checkpoint and the LSN of the recovery point (redo LSN).

The two LSNs for the latest checkpoint are stored in the control file:

$ pg_controldata | egrep 'Latest.*location'

Latest checkpoint location:         1/1A3DEB20
Latest checkpoint's REDO location:  1/1A3DEAE8

In PostgreSQL 16, these records are now mentioned in the checkpoint completion message in the server log:

2022-08-02 12:15:17.961 MSK [198868] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.072 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=418462 kB; lsn=1/1A3DEB20, redo lsn=1/1A3DEAE8

Note that in PostgreSQL 15 and higher, the log_checkpoints parameter is on by default.

pg_prepared_statements.result_types
commit: 84ad713c, 6ffff0fd

The pg_prepared_statements view now has a column result_types.

PREPARE get_booking (text) AS
SELECT * FROM bookings WHERE book_ref = $1;
SELECT * FROM pg_prepared_statements WHERE name = 'get_booking'\gx

-[ RECORD 1 ]---+-----------------------------------------------
name            | get_booking
statement       | PREPARE get_booking (text) AS                 +
| SELECT * FROM bookings WHERE book_ref = $1;
prepare_time    | 2022-08-02 16:54:14.313221+03
parameter_types | {text}
result_types    | {character,"timestamp with time zone",numeric}
from_sql        | t
generic_plans   | 0
custom_plans    | 0

For prepared statements that return nothing the column will remain empty.

New parameter log_parameter_max_length for auto_explain
commit: d4bfe412

The new parameter auto_explain.log_parameter_max_length serves to log query parameters, just like log_parameter_max_length.

LOAD 'auto_explain';
SHOW auto_explain.log_parameter_max_length;

auto_explain.log_parameter_max_length

1
(1 row)

The default value is -1. This means that it will display parameter values in full. If set to 0, no parameter values will be displayed. A positive value limits the output length in bytes.

Let's set up logging and run a parameterized query EXECUTE... USING within an anonymous PL/pgSQL block.

SET auto_explain.log_min_duration = 0;
SET auto_explain.log_nested_statements = 'on';
SET auto_explain.log_level = 'NOTICE';
DO $$BEGIN EXECUTE 'SELECT $;

NOTICE:  duration: 0.003 ms  plan:
Query Text: SELECT $1
Query Parameters: $1 = '42'
Result  (cost=0.00..0.01 rows=1 width=4)
DO

Note the newly added Query Parameters string listing the parameters the query from Query Text was run wit.

Subquery alias in FROM clause became optional
commit: bcedd8f5

The SQL standard demands that any subquery within a FROM clause has an alias:

15=# SELECT * FROM (SELECT 42 AS a);

ERROR:  subquery in FROM must have an alias
LINE 1: SELECT * FROM (SELECT 42 AS a);
^
HINT:  For example, FROM (SELECT ...) [AS] foo.

This complicates migration from other DBMS that don't strongly demand an alias here.

PostgreSQL 16 makes these aliases optional.

16=# SELECT * FROM (SELECT 42 AS a);

a


42
(1 row)

REINDEX: syntax improvements and more
commit: 2cbc3c17

REINDEX DATABASE and REINDEX SYSTEM variations demand that a database is specified. The only valid option, however, is the current database, which makes the option redundant. This patch makes specifying the database optional.

Additionally, the REINDEX DATABASE command now reindexes only user-generated indexes and skips system catalog table indexes. You can still rebuild all database indexes by running both REINDEX DATABASE and REINDEX SYSTEM.

This patch addresses the demand for an ability to reindex only user-generated indexes across the whole database. Plus, rebuilding system indexes is straight up impossible in CONCURRENTLY mode and may result in deadlocks in high load environments.

CREATE STATISTICS: statistic name is now optional
commit: 624aa2a1

Extended statistics are separate database objects and therefore need names. But why not let PostgreSQL handle the naming without the user's involvement? We already do that with indexes and integrity constraints.

Done and done:

CREATE STATISTICS ON departure_airport, arrival_airport FROM flights;
\d flights

...
Statistics objects:
"bookings.flights_departure_airport_arrival_airport_stat" ON departure_airport, arrival_airport FROM flights

CREATE TABLE: the STORAGE attribute
commit: 784cedda

Suppose we want to store images in our database. Let's create a table:

CREATE TABLE images (filename text, data bytea);
ALTER TABLE images ALTER COLUMN data SET STORAGE external;

In the second command, we change the TOAST storage strategy to external to prohibit data compression. After all, the image files are already compressed.

The question is, why do this as a separate command, and not immediately specify the desired strategy as a part of the CREATE TABLE command? The answer is simple. Before PostgreSQL 16, CREATE TABLE didn't not allow specifying the STORAGE attribute for columns. Now it does:

CREATE TABLE images (filename text, data bytea STORAGE external);

The user created during cluster initialization now can't be stripped of superuser privileges
commit: e530be2c

Prior to PostgreSQL 16, the superuser created during cluster initialization (pg_authid.oid=10) could have taken the SUPERUSER attribute from himself:

15=# ALTER ROLE postgres NOSUPERUSER;

(If there were no other superusers, you had to restart the server in the single-user mode to restore the attribute.)

PostgreSQL 16 makes it impossible for the superuser to lose the attribute on accident:

16=# ALTER ROLE postgres NOSUPERUSER;

ERROR:  permission denied: bootstrap user must be superuser

By the way, it was impossible to delete the superuser created during cluster initialization even in previous versions:

CREATE ROLE root LOGIN SUPERUSER;
\c - root
DROP ROLE postgres;

ERROR:  cannot drop role postgres because it is required by the database system

In a nutshell, you need the initial superuser for the server to function properly and you shouldn't delete it or strip its superuser privileges. For extra security, you might consider making it a group role and stripping the LOGIN attribute from it.

TRUNCATE triggerd on foreign tables are now supported
commit: 3b00a944

Some foreign data wrappers support the TRUNCATE command. This patch adds the ability to create a trigger on TRUNCATE for foreign tables for such wrappers.

New argument variation for pg_read_file/pg_read_binary_file
commit: 283129e3

The specification of the pg_read_file and the similar pg_read_binary_file functions was as follows:

pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok boolean ]] ) → text

It is assumed that to read the entire file, you do not need to specify the offset and length parameters. But then you can't set missing_ok=true to ignore the file absence.

PostgreSQL 16 adds an overloaded function variant, resulting in a new specification:

pg_read_file ( filename text [, offset bigint, length bigint ] [, missing_ok boolean ] ) → text

Now you can specify a file name and ignore the file absence.

Extensible WAL resource managers
commit: 5c279a6d

Please welcome to the stage my colleague Egor Rogov, who will explain the highlight of this patch.

One of the problems facing developers of table and index access methods is the inability to generate log records of a specific type. There is the mechanism of generic WAL records that logs the “difference” between the old and new page states to WAL (it is used, for example, by the rum extension), but it is not very efficient and does not support logical replication.

In PostgreSQL 16, extensions will be able to create their own resource managers and, consequently, log entries in their own format. The flip side of the coin is that recovering an instance after a failure becomes dependent on a third-party extension.

From the user's point of view, the patch does not change anything, but it is another important step towards the emergence of new access methods.

Хабы:

PostgreSQL 16: Part 1 or CommitFest 2022-07

Notable features

Публикации

Информация