erogov Jan 29 2020 at 18:28

MVCC in PostgreSQL-7. Autovacuum

10 min

2.4K

Postgres Professional corporate blogPostgreSQL*SQL*

Translation

To remind you, we started with problems related to isolation, made a digression about low-level data structure, discussed row versions in detail and observed how data snapshots are obtained from row versions.

Then we explored in-page vacuum (and HOT updates) and vacuum. Now we'll look into autovacuum.

Autovacuum

We've already mentioned that normally (i. e., when nothing holds the transaction horizon for a long time) VACUUM usually does its job. The problem is how often to call it.

If we vacuum a changing table too rarely, its size will grow more than desired. Besides, a next vacuum operation may require several passes through indexes if too many changes were done.

If we vacuum the table too often, the server will constantly do maintenance rather than useful work — and this is no good either.

Note that launching VACUUM on schedule by no means resolves the issue because the workload can change with time. If the table starts to change more intensively, it must be vacuumed more often.

Autovacuum is exactly the technique that enables us to launch vacuuming depending on how intensive the table changes are.

When autovacuum is turned on (the autovacuum configuration parameter set), the autovacuum launcher daemon process is started, which plans the work. Vacuuming itself is done by autovacuum worker processes, several instances of which can run in parallel.

The autovacuum launcher process composes a list of databases where any activity takes place. The activity is determined from statistics, and to collect it, the track_counts parameter must be set. Never turn off autovacuum and track_counts, otherwise, the autovacuum feature won't work.

Once every autovacuum_naptime seconds, autovacuum launcher starts (using the postmaster process) a worker process for each database on the list. In other words, if there is some activity in a database, worker processes will be sent to it at an interval of autovacuum_naptime seconds. To this end, if a few (N) active databases are available, worker processes are launched N times as often as every autovacuum_naptime seconds. But the total number of simultaneously running worker processes is limited by the autovacuum_max_workers parameter.

When started, a worker process connects to the database assigned to it and starts with composing a list of:

All the tables, materialized views and TOAST tables that require vacuuming.
All tables and materialized views that require analysis (TOAST tables are not analyzed since they are always reached with index access).

Then the worker process vacuums and/or analyzes objects on the list one at a time and completes when vacuuming is finished.

If the process could not do all the work planned in autovacuum_naptime seconds, the autovacuum launcher process will send one more worker process to this database, and they will work together. «Together» just means that the second process will build its own list and work through it. So, only different tables will be processed in parallel, but there is no parallelism at the level of one table — if one of the worker processes is already handling a table, another process will skip it and proceed further.

Now let's clarify in more detail what is meant by «requires vacuuming» and «requires analysis».

Recently the patch was committed that allows the vacuum to process indexes in parallel with background workers.

What tables require vacuuming?

Vacuuming is considered to be required if the number of dead (i. e., outdated) tuples exceeds the specified threshold. The statistics collector is permanently keeping track of the number of dead tuples, which is stored in the pg_stat_all_tables table. And two parameters specify the threshold:

autovacuum_vacuum_threshold defines an absolute value (the number of tuples).
autovacuum_vacuum_scale_factor defines the share of rows in the table.

In summary: vacuuming is required if pg_stat_all_tables.n_dead_tup >= autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * pg_class.reltupes.

With the default settings, autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2. autovacuum_vacuum_scale_factor is, certainly, the most important here — it's this parameter that is critical for large tables (and it's them that possible issues are associated with). The value of 20% seems unduly high, and most likely it will need to be considerably reduced.

Optimal values of the parameters may vary for different tables and depend on the table sizes and specifics of the changes. It makes sense to set generally suitable values and, if the need arises, do special tweaking of the parameters at the level of certain tables by means of storage parameters:

autovacuum_vacuum_threshold and toast.autovacuum_vacuum_threshold.
autovacuum_vacuum_scale_factor and toast.autovacuum_vacuum_scale_factor.

To avoid getting confused, this is reasonable to do only for few tables that are distinguished among the rest by the amount and intensity of changes and only when the globally set values fail to work fine.

Besides, you can turn autovacuum off at the table level (although we can hardly think of a reason why it could be necessary):

autovacuum_enabled and toast.autovacuum_enabled.

For example, last time we created the vac table with autovacuum turned off in order to manually control vacuuming for demo purposes. The storage parameter can be changed as follows:

=> ALTER TABLE vac SET (autovacuum_enabled = off);

To formalize all the above, let's create a view that shows which tables need vacuuming at this point in time. It will use the function that returns the current value of the parameter and takes into account that the value can be redefined at the table level:

=> CREATE FUNCTION get_value(param text, reloptions text[], relkind "char")
RETURNS float
AS $$
  SELECT coalesce(
    -- if the storage parameter is set, we take its value
    (SELECT option_value
     FROM   pg_options_to_table(reloptions)
     WHERE  option_name = CASE
              -- for TOAST tables, the parameter name differs
              WHEN relkind = 't' THEN 'toast.' ELSE ''
            END || param
    ),
    -- otherwise, we take the value of the configuration parameter
    current_setting(param)
  )::float;
$$ LANGUAGE sql;

And this is the view:

=> CREATE VIEW need_vacuum AS
  SELECT st.schemaname || '.' || st.relname tablename,
         st.n_dead_tup dead_tup,
         get_value('autovacuum_vacuum_threshold', c.reloptions, c.relkind) +
         get_value('autovacuum_vacuum_scale_factor', c.reloptions, c.relkind) * c.reltuples
         max_dead_tup,
         st.last_autovacuum
  FROM   pg_stat_all_tables st,
         pg_class c
  WHERE  c.oid = st.relid
  AND    c.relkind IN ('r','m','t');

What tables require analysis?

The situation with automatic analysis is similar. Those tables are considered to require analysis whose number of updated (since the last analysis) tuples exceeds the threshold specified by two similar parameters: pg_stat_all_tables.n_mod_since_analyze >= autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * pg_class.reltupes.

The default settings of automatic analysis are somewhat different: autovacuum_analyze_threshold = 50 and autovacuum_analyze_scale_factor = 0.1. They can also be defined at the level of storage parameters of separate tables:

autovacuum_analyze_threshold
autovacuum_analyze_scale_factor

Since TOAST tables are not analyzed, they do not have such parameters.

Let's also create a view for analysis:

=> CREATE VIEW need_analyze AS
  SELECT st.schemaname || '.' || st.relname tablename,
         st.n_mod_since_analyze mod_tup,
         get_value('autovacuum_analyze_threshold', c.reloptions, c.relkind) +
         get_value('autovacuum_analyze_scale_factor', c.reloptions, c.relkind) * c.reltuples
         max_mod_tup,
         st.last_autoanalyze
  FROM   pg_stat_all_tables st,
         pg_class c
  WHERE  c.oid = st.relid
  AND    c.relkind IN ('r','m');

Example

Let's set the following parameter values for experiments:

=> ALTER SYSTEM SET autovacuum_naptime = '1s'; -- to aviod waiting long
=> ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.03;  -- 3%
=> ALTER SYSTEM SET autovacuum_vacuum_threshold = 0;
=> ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.02; -- 2%
=> ALTER SYSTEM SET autovacuum_analyze_threshold = 0;

=> SELECT pg_reload_conf();

 pg_reload_conf
----------------
 t
(1 row)

Now let's create a table similar to the one used last time and insert one thousand rows into it. Autovacuum is turned off at the table level, and we will be turning it on by ourselves. Without this, the examples will not be reproducible since autovacuuming can be triggered at a bad time.

=> CREATE TABLE autovac(
  id serial,
  s char(100)
) WITH (autovacuum_enabled = off);
=> INSERT INTO autovac SELECT g.id,'A' FROM generate_series(1,1000) g(id);

This is what our view for vacuuming will show:

=> SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';

   tablename    | dead_tup | max_dead_tup | last_autovacuum 
----------------+----------+--------------+-----------------
 public.autovac |        0 |            0 | 
(1 row)

Attention here should be given to two things. First, max_dead_tup = 0 although 3% of 1000 rows make 30 rows. The thing is that we do not have statistics on the table yet since INSERT does not update it on its own. Until the table gets analyzed, zeros will remain since pg_class.reltuples = 0. But let's look at the second view for analysis:

=> SELECT * FROM need_analyze WHERE tablename = 'public.autovac';

   tablename    | mod_tup | max_mod_tup | last_autoanalyze 
----------------+---------+-------------+------------------
 public.autovac |    1000 |           0 | 
(1 row)

Since 1000 rows have been changed (added) in the table, which is greater than zero, automatic analysis must be triggered. Let's check this:

=> ALTER TABLE autovac SET (autovacuum_enabled = on);

After a short pause we can see that the table has been analyzed and correct 20 rows are shown in max_dead_tup instead of zeros:

=> SELECT * FROM need_analyze WHERE tablename = 'public.autovac';

   tablename    | mod_tup | max_mod_tup |       last_autoanalyze        
----------------+---------+-------------+-------------------------------
 public.autovac |       0 |          20 | 2019-05-21 11:59:48.465987+03
(1 row)

=> SELECT reltuples, relpages FROM pg_class WHERE relname = 'autovac';

 reltuples | relpages 
-----------+----------
      1000 |       17
(1 row)

Let's get back to autovacuuming:

=> SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';

   tablename    | dead_tup | max_dead_tup | last_autovacuum
----------------+----------+--------------+-----------------
 public.autovac |        0 |           30 |
(1 row)

As we can see, max_dead_tup has already been fixed. Another thing to pay attention to is that dead_tup = 0. The statistics show that the table does not have dead tuples..., and this is true. There is nothing to vacuum in the table yet. Any table used exclusively in append-only mode will not be vacuumed and therefore, the visibility map won't be updated for it. But this makes use of index-only scan impossible.

(Next time we will see that vacuuming will sooner or later reach an append-only table, but this will happen too rarely.)

A lesson learned: if index-only scan is critical, it may be required to manually call a vacuum process.

Now let's turn autovacuum off again and update 31 lines, which is one line greater that the threshold.

=> ALTER TABLE autovac SET (autovacuum_enabled = off);
=> UPDATE autovac SET s = 'B' WHERE id <= 31;
=> SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';

   tablename    | dead_tup | max_dead_tup | last_autovacuum 
----------------+----------+--------------+-----------------
 public.autovac |       31 |           30 | 
(1 row)

Now the condition of vacuum triggering is met. Let's turn autovacuum on and after a short pause we will see that the table has been processed:

=> ALTER TABLE autovac SET (autovacuum_enabled = on);
=> SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';

   tablename    | dead_tup | max_dead_tup |        last_autovacuum        
----------------+----------+--------------+-------------------------------
 public.autovac |        0 |           30 | 2019-05-21 11:59:52.554571+03
(1 row)

Load throttling

VACUUM does not block other processes since it works page by page, but it does produce additional load on the system and can considerably affect the performance.

Throttling for VACUUM

To be able to control the vacuum intensity and therefore, its effect on the system, the process alternates work and waiting. The process will do about vacuum_cost_limit conventional units of work and then it will sleep for vacuum_cost_delay ms.

The default settings are vacuum_cost_limit = 200 and vacuum_cost_delay = 0. The last zero actually means that VACUUM does not sleep, so a specific value of vacuum_cost_limitdoes not matter at all. The reasoning behind this is that if an administrator did have to manually launch VACUUM, he is likely to wish vacuuming to be done as fast as possible.

Nevertheless, if we do set the sleeping time, the amount of work specified in vacuum_cost_limit will be composed of the costs of work with pages in the buffer cache. Each page access is estimated as follows:

If the page is found in the buffer cache, vacuum_cost_page_hit = 1.
If it was not found, vacuum_cost_page_miss = 10.
If it was not found and a dirty page had to be additionally evicted from the buffer cache, vacuum_cost_page_dirty = 20.

That is, with the default settings of vacuum_cost_limit, 200 cache pages or 20 disk pages or 10 pages with eviction can be processed in one go. It's clear that these figures are pretty tentative, but it does not make sense to select more accurate ones.

Throttling for autovacuuming

For vacuum processes, load throttling works the same way as for VACUUM. But for autovacuum processes and manually launched VACUUM to work with different intensity, autovacuum has its own parameters: autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay. If these parameters have the value of -1, the value of vacuum_cost_limit and/or vacuum_cost_delay is used.

By default autovacuum_vacuum_cost_limit = -1 (i. e., the value of vacuum_cost_limit = 200 is used) and autovacuum_vacuum_cost_delay = 20 ms. On modern hardware, autovacuum will be really slow.

In version 12, the value of autovacuum_vacuum_cost_delay is reduced to 2 ms, which can be taken for a more appropriate first approximation.

Besides, we should note that the limit specified by these settings is common for all worker processes. In other words, when the number of simultaneous worker processes is changed, the overall load remains unchanged. So, to increase the autovacuum performance, when adding worker processes, it makes sense to also increase autovacuum_vacuum_cost_limit.

Use of memory and monitoring

Last time we observed how VACUUM used RAM of size maintenance_work_mem to store tids to be vacuumed.

Autovacuum does absolutely the same. But there can be many simultaneous worker processes if autovacuum_max_workers is set to a large value. Moreover, all the memory is allocated at once rather than as the need arises. Therefore, for a worker process, its own limitation can be set by means of the autovacuum_work_mem parameter. The default value of this parameter is -1, i. e., it is not used.

As already mentioned, VACUUM can also work with a minimum memory size. But if indexes are created on the table, a small value of maintenance_work_mem can entail repeated index scans. The same is true for autovacuum. Ideally, autovacuum_work_mem should have a minimum value such that no repeated scans occur.

We've seen that to monitor VACUUM, the VERBOSE option can be used (which cannot be specified for autovacuum) or the pg_stat_progress_vacuum view (which, however, shows only the current information). Therefore, the main means to monitor autovacuuming is to use the log_autovacuum_min_duration parameter, which outputs the information to the server message log. It is turned off by default (set to -1). It is reasonable to turn this parameter on (with the value of 0, information on all autovacuum runs will be output) and watch the figures.

This is what the output information looks like:

=> ALTER SYSTEM SET log_autovacuum_min_duration = 0;
=> SELECT pg_reload_conf();

 pg_reload_conf 
----------------
 t
(1 row)

=> UPDATE autovac SET s = 'C' WHERE id <= 31;

student$ tail -n 7 /var/log/postgresql/postgresql-11-main.log

2019-05-21 11:59:55.675 MSK [9737] LOG:  automatic vacuum of table "test.public.autovac": index scans: 0
	pages: 0 removed, 18 remain, 0 skipped due to pins, 0 skipped frozen
	tuples: 31 removed, 1000 remain, 0 are dead but not yet removable, oldest xmin: 4040
	buffer usage: 78 hits, 0 misses, 0 dirtied
	avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
	system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2019-05-21 11:59:55.676 MSK [9737] LOG:  automatic analyze of table "test.public.autovac" system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

All the necessary information is available here.

To remind you, it often makes sense to lower the threshold for vacuum triggering in order to process less data at a time rather than increase the memory size.

It may also be reasonable to use the above views to monitor the length of the list of tables that require vacuuming. Increase of the list length will indicate that the autovacuum processes lack time to do their job and the settings need to be changed.

Read on.

Tags:

Hubs: