Постановка задачи
Для оптимизации запросов PostgreSQL, очень требуется возможность анализировать историю активности, в частности – ожидания, блокировки, статистика таблиц.
Имеющиеся возможности
Инструмент анализа исторической нагрузки или «AWR для Postgres»: очень интересное решение, но, нет истории pg_stat_activity и pg_locks.
Расширение pgsentinel :
"Вся накопленная информация хранится только в оперативной памяти, а потребляемый объём памяти регулируется количеством последних хранимых записей.
Добавляется поле queryid — тот самый queryid из расширения pg_stat_statements (требуется предварительная установка)."
Это конечно сильно бы помогло, но самая неприятность именно первый пункт “Вся накопленная информация хранится только в оперативной памяти ”, т.е. имеем место импакт на целевую базу. К тому, же нет истории блокировок и статистики таблиц. Т.е. решение вообще говоря неполное: “Готового пакета для установки пока нет. Предлагается скачать исходники и собрать библиотеку самостоятельно. Предварительно требуется установить «devel»-пакет для своего сервера и в переменную PATH прописать путь до pg_config.”.
В общем – возни много, а в случае серьезных продакшн баз, может быть, и не будет возможности что-то делать с сервером. Нужно опять, придумывать, что-то свое.
Предупреждение.
В связи с новизной темы и незавершением периода тестирования, статья носит в основном ознакомительный характер, скорее как набор тезисов и промежуточных результатов.
Более подробный материал, будет подготовлен позже, по частям
Эскизные требования к решению
Необходимо разработать инструмент позволяющий хранить:
Историю представления pg_stat_activity
Историю блокировок сессий, используя представление pg_locks
Требование к решению–минимизировать влияние на целевую базу данных.
Общая идея–агент сбора данных запускается не в целевой базе, а в базе данных мониторинга как сервис systemd. Да, возможны некоторые потери данных, но это не критично, для отчетности, зато нет импакта на целевую базу по памяти и месту на диске. А в случае использования пула соединений, импакт по пользовательским процессам минимален.
Этапы реализации
1.Сервисные таблицы
Для хранения таблиц используется отдельная схема, что бы не затруднять анализ основных использованных таблиц.
DROP SCHEMA IF EXISTS activity_hist ;
CREATE SCHEMA activity_hist AUTHORIZATION monitor ;
Важно: схема создается не в целевой базе данных, а в базе данных мониторинга.
История представления pg_stat_activity
Для хранения текущих снимков представления pg_stat_activity используется таблица
activity_hist.history_pg_stat_activity :
--ACTIVITY_HIST.HISTORY_PG_STAT_ACTIVITY
DROP TABLE IF EXISTS activity_hist.history_pg_stat_activity;
CREATE TABLE activity_hist.history_pg_stat_activity
(
timepoint timestamp without time zone ,
datid oid ,
datname name ,
pid integer,
usesysid oid ,
usename name ,
application_name text ,
client_addr inet ,
client_hostname text ,
client_port integer,
backend_start timestamp with time zone ,
xact_start timestamp with time zone ,
query_start timestamp with time zone ,
state_change timestamp with time zone ,
wait_event_type text ,
wait_event text ,
state text ,
backend_xid xid ,
backend_xmin xid ,
query text ,
backend_type text ,
queryid bigint
);
Для ускорения вставки – никаких индексов и ограничений.
Для хранения непосредственно истории используется секционированная таблица:
activity_hist.archive_pg_stat_activity :
DROP TABLE IF EXISTS activity_hist.archive_pg_stat_activity;
CREATE TABLE activity_hist.archive_pg_stat_activity
(
timepoint timestamp without time zone ,
datid oid ,
datname name ,
pid integer,
usesysid oid ,
usename name ,
application_name text ,
client_addr inet ,
client_hostname text ,
client_port integer,
backend_start timestamp with time zone ,
xact_start timestamp with time zone ,
query_start timestamp with time zone ,
state_change timestamp with time zone ,
wait_event_type text ,
wait_event text ,
state text ,
backend_xid xid ,
backend_xmin xid ,
query text ,
backend_type text ,
queryid bigint
)
PARTITION BY RANGE (timepoint);
Поскольку в данном случае нет требований по скорости вставки, созданы некоторые индексы, для ускорения создания отчетов.
История блокировок сессий
Для хранения текущих снимков блокировок сессий используется таблица:
activity_hist.history_locking :
--ACTIVITY_HIST.HISTORY_LOCKING
DROP TABLE IF EXISTS activity_hist.history_locking;
CREATE TABLE activity_hist.history_locking
(
timepoint timestamp without time zone ,
locktype text ,
relation oid ,
mode text ,
tid xid ,
vtid text ,
pid integer ,
blocking_pids integer[] ,
granted boolean
);
Также, для ускорения вставки – никаких индексов и ограничений.
Для хранения непосредственно истории используется секционированная таблица:
activity_hist.archive_locking:
DROP TABLE IF EXISTS activity_hist.archive_locking;
CREATE TABLE activity_hist.archive_locking
(
timepoint timestamp without time zone ,
locktype text ,
relation oid ,
mode text ,
tid xid ,
vtid text ,
pid integer ,
blocking_pids integer[] ,
granted boolean
)
PARTITION BY RANGE (timepoint);
Поскольку в данном случае нет требований по скорости вставки, созданы некоторые индексы, для ускорения создания отчетов.
2.Заполнение текущей истории
Для непосредственного сбора снимков представления используется bash скрипт, запускающий plpgsql функцию.
get_current_activity.sh
#!/bin/bash
#########################################################
#get_current_activity.sh
ERROR_FILE='/home/demon/get_current_activity'$(date +%Y%m%d-)'T'$(date +%H)$(date +%M)$(date +%S)
host=$1
s_name=$2
s_pass=$3
psql -A -t -q -v ON_ERROR_STOP=1 -c "SELECT activity_hist.get_current_activity( '$host' , '$s_name' , '$s_pass' )" >/dev/null 2>$ERROR_FILE
line_count=`cat $ERROR_FILE | wc -l`
if [[ $line_count != '0' ]];
then
rm -f /home/demon/*.err >/dev/null 2>/dev/null
cp $ERROR_FILE $ERROR_FILE'.err' >/dev/null 2>/dev/null
fi
rm $ERROR_FILE >/dev/null 2>/dev/null
exit 0
plpgsql функция по dblink обращается к представлениям в целевой базе данных и вставляет строки в сервисные таблицы в базе данных мониторинга.
get_current_activity.sql
CREATE OR REPLACE FUNCTION activity_hist.get_current_activity( current_host text , current_s_name text , current_s_pass text ) RETURNS BOOLEAN AS $$
DECLARE
database_rec record;
dblink_str text ;
BEGIN
EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||current_host||' port=5432 dbname=postgres'||
' user='||current_s_name||' password='||current_s_pass|| ' '')';
--------------------------------------------------------------------
--GET pg_stat_activity stats
INSERT INTO activity_hist.history_pg_stat_activity
(
SELECT * FROM dblink('LINK1',
'SELECT
now() ,
datid ,
datname ,
pid ,
usesysid ,
usename ,
application_name ,
client_addr ,
client_hostname ,
client_port ,
backend_start ,
xact_start ,
query_start ,
state_change ,
wait_event_type ,
wait_event ,
state ,
backend_xid ,
backend_xmin ,
query ,
backend_type
FROM pg_stat_activity
')
AS t (
timepoint timestamp without time zone ,
datid oid ,
datname name ,
pid integer,
usesysid oid ,
usename name ,
application_name text ,
client_addr inet ,
client_hostname text ,
client_port integer,
backend_start timestamp with time zone ,
xact_start timestamp with time zone ,
query_start timestamp with time zone ,
state_change timestamp with time zone ,
wait_event_type text ,
wait_event text ,
state text ,
backend_xid xid ,
backend_xmin xid ,
query text ,
backend_type text
)
);
---------------------------------------
--ACTIVITY_HIST.HISTORY_LOCKING
INSERT INTO activity_hist.history_locking
(
SELECT * FROM dblink('LINK1',
'SELECT
now() ,
lock.locktype,
lock.relation,
lock.mode,
lock.transactionid as tid,
lock.virtualtransaction as vtid,
lock.pid,
pg_blocking_pids(lock.pid),
lock.granted
FROM pg_catalog.pg_locks lock LEFT JOIN pg_catalog.pg_database db ON db.oid = lock.database
WHERE NOT lock.pid = pg_backend_pid()
')
AS t (
timepoint timestamp without time zone ,
locktype text ,
relation oid ,
mode text ,
tid xid ,
vtid text ,
pid integer ,
blocking_pids integer[] ,
granted boolean
)
);
PERFORM dblink_disconnect('LINK1');
RETURN TRUE ;
END
$$ LANGUAGE plpgsql;
Для сбора снимков представления используется systemd service, и два скрипта:
pg_current_activity.service
# /etc/systemd/system/pg_current_activity.service
[Unit]
Description=Collect history of pg_stat_activity , pg_locks
Wants=pg_current_activity.timer
[Service]
Type=forking
StartLimitIntervalSec=0
ExecStart=/home/postgres/pgutils/demon/get_current_activity.sh X.X.X.X postgres postgres
[Install]
WantedBy=multi-user.target
pg_current_activity.timer
# /etc/systemd/system/pg_current_activity.timer
[Unit]
Description=Run pg_current_activity.sh every 1 second
Requires=pg_current_activity.service
[Timer]
Unit=pg_current_activity.service
OnCalendar=*:*:0/1
AccuracySec=1
[Install]
WantedBy=timers.target
Назначаем права скриптам:
# chmod 755 pg_current_activity.timer
# chmod 755 pg_current_activity.service
Запускаем сервис:
# systemctl daemon-reload
# systemctl start pg_current_activity.service
Таким образом, история представлений собирается в виде ежесекундных снимков. Конечно, если все оставить как есть, таблицы очень быстро увеличатся в размерах и более менее продуктивная работа станет невозможна.
Нужно организовать архивирование данных.
3.Архивирование истории
Для архивация используется секционированные таблицы archive*.
Новые секции создаются каждый час, при этом старые данные из таблиц history* удаляются, таким образом, размер таблиц history* сильно не меняется и скорость вставки не деградируется со временем.
Создание новых секций выполняется plpgsql функцией activity_hist.archive_current_activity. Алгоритм работы очень прост( на примере секции для таблицы archive_pg_stat_activity).
Создаём и заполняем новую секцию
EXECUTE format(
'CREATE TABLE ' || partition_name ||
' PARTITION OF activity_hist.archive_pg_stat_activity FOR VALUES FROM ( %L ) TO ( %L ) ' ,
to_char(date_trunc('year', partition_min_range ),'YYYY')||'-'||
to_char(date_trunc('month', partition_min_range ),'MM')||'-'||
to_char(date_trunc('day', partition_min_range ),'DD')||' '||
to_char(date_trunc('hour', partition_min_range ),'HH24')||':00',
to_char(date_trunc('year', partition_max_range ),'YYYY')||'-'||
to_char(date_trunc('month', partition_max_range ),'MM')||'-'||
to_char(date_trunc('day', partition_max_range ),'DD')||' '||
to_char(date_trunc('hour', partition_max_range ),'HH24')||':00'
);
INSERT INTO activity_hist.archive_pg_stat_activity
(
SELECT *
FROM activity_hist.history_pg_stat_activity
WHERE timepoint BETWEEN partition_min_range AND partition_max_range
);
Создаем индексы
EXECUTE format (
'CREATE INDEX '||index_name||
' ON '||partition_name||' ( wait_event_type , backend_type , timepoint )'
);
EXECUTE format ('CREATE INDEX '||index_name||
' ON '||partition_name||' ( wait_event_type , backend_type , timepoint , queryid )'
);
Удаляем старые данные из таблицы history_pg_stat_activity
DELETE
FROM activity_hist.history_pg_stat_activity
WHERE timepoint < partition_max_range;
Разумеется периодически, старые секции, за ненадобностью удаляются.
Базовые отчеты
Собственно для чего все это делается. Для получения отчетов очень отдаленно, приближенно напоминающих AWR Oracle.
Важно добавить, для получения отчетов необходимо построить связь между представлениями pg_stat_activity и pg_stat_statements. Таблицы связываются путем добавления в таблицы 'history_pg_stat_activity', 'archive_pg_stat_activity' столбца 'queryid'. Способ добавления значения столбца, выходит за рамки данной статьи и описан здесь — pg_stat_statements + pg_stat_activity + loq_query = pg_ash? .
TOTAL CPU TIME FOR QUERIES
Запрос :
WITH hist AS
(
SELECT
aa.query ,aa.queryid ,
count(*) * interval '1 second' AS duration
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND backend_type = 'client backend' AND datname != 'postgres' AND ( aa.wait_event_type IS NULL ) ANDaa.state = 'active'
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid
UNION
SELECT
ha.query ,ha.queryid,
count(*) * interval '1 second' AS duration
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND backend_type = 'client backend' AND datname != 'postgres' AND ( ha.wait_event_type IS NULL )AND ha.state = 'active'
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid
)
SELECT query , queryid , SUM( duration ) as duration
FROM hist
GROUP BY query , queryid
ORDER BY 3 DESC
Пример:
-------------------------------------------------------------------
| TOTAL CPU TIME FOR QUERIES : 07:47:36
+----+----------------------------------------+--------------------
| #| queryid| duration
+----+----------------------------------------+--------------------
| 1| 389015618226997618| 04:28:58
| 2| | 01:07:29
| 3| 1237430309438971376| 00:59:38
| 4| 4710212362688288619| 00:50:48
| 5| 28942442626229688| 00:15:50
| 6| 9150846928388977274| 00:04:46
| 7| -6572922443698419129| 00:00:06
| 8| | 00:00:01
+----+----------------------------------------+--------------------
TOTAL WAITINGS TIME FOR QUERIES
Запрос :
WITH hist AS
(
SELECT
aa.query ,aa.queryid ,
count(*) * interval '1 second' AS duration
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( aa.wait_event_type IS NOT NULL )
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid
UNION
SELECT
ha.query ,ha.queryid,
count(*) * interval '1 second' AS duration
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( ha.wait_event_type IS NOT NULL )
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid
)
SELECT query , queryid , SUM( duration ) as duration
FROM hist
GROUP BY query , queryid
ORDER BY 3 DESC
Пример :
-------------------------------------------------------------------
| TOTAL WAITINGS TIME FOR QUERIES : 21:55:04
+----+----------------------------------------+--------------------
| #| queryid| duration
+----+----------------------------------------+--------------------
| 1| 389015618226997618| 16:19:05
| 2| | 03:47:04
| 3| 8085340880788646241| 00:40:20
| 4| 4710212362688288619| 00:13:35
| 5| 9150846928388977274| 00:12:25
| 6| 28942442626229688| 00:11:32
| 7| 1237430309438971376| 00:09:45
| 8| 2649515222348904837| 00:09:37
| 9| | 00:03:45
| 10| 3167065002719415275| 00:02:20
| 11| 5731212217001535134| 00:02:13
| 12| 8304755792398128062| 00:01:31
| 13| 2649515222348904837| 00:00:59
| 14| 2649515222348904837| 00:00:22
| 15| | 00:00:12
| 16| 3422818749220588372| 00:00:08
| 17| -5730801771815999400| 00:00:03
| 18| -1473395109729441239| 00:00:02
| 19| 2404820632950544954| 00:00:02
| 20| -6572922443698419129| 00:00:02
| 21| 2369289265278398647| 00:00:01
| 22| 180077086776069052| 00:00:01
+----+----------------------------------------+--------------------
WAITINGS FOR QUERIES
Запросы:
WITH hist AS
(
SELECT
aa.wait_event_type , aa.wait_event
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
aa.wait_event IS NOT NULL
GROUP BY aa.wait_event_type , aa.wait_event
UNION
SELECT
ha.wait_event_type , ha.wait_event
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
ha.wait_event IS NOT NULL
GROUP BY ha.wait_event_type , ha.wait_event
)
SELECT wait_event_type , wait_event
FROM hist
GROUP BY wait_event_type , wait_event
ORDER BY 1 ASC,2 ASC
----------------------------------------------------------------------
WITH hist AS
(
SELECT
aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid ,
count(*) * interval '1 second' AS duration
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( aa.wait_event_type = waitings_stat_rec.wait_event_type AND aa.wait_event = waitings_stat_rec.wait_event )
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid
UNION
SELECT
ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid,
count(*) * interval '1 second' AS duration
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( ha.wait_event_type = waitings_stat_rec.wait_event_type AND ha.wait_event = waitings_stat_rec.wait_event )
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid
)
SELECT query , queryid , SUM( duration ) as duration
FROM hist
GROUP BY query , queryid
ORDER BY 3 DESC
Пример:
------------------------------------------------
| WAITINGS FOR QUERIES
+-----------------------------------------------
| wait_event_type = Client|
| wait_event = ClientRead|
| Total time = 00:46:56|
------------------------------------------------
| #| queryid| duration
+-----+--------------------+--------------------
| 1| 8085340880788646241| 00:40:20
| 2| | 00:03:45
| 3| 5731212217001535134| 00:01:53
| 4| | 00:00:12
| 5| 9150846928388977274| 00:00:09
| 6| 3422818749220588372| 00:00:08
| 7| 1237430309438971376| 00:00:06
| 8| 28942442626229688| 00:00:05
| 9| 4710212362688288619| 00:00:05
| 10|-5730801771815999400| 00:00:03
| 11| 8304755792398128062| 00:00:02
| 12|-6572922443698419129| 00:00:02
| 13|-1473395109729441239| 00:00:02
| 14| 2404820632950544954| 00:00:02
| 15| 180077086776069052| 00:00:01
| 16| 2369289265278398647| 00:00:01
+-----------------------------------------------
| wait_event_type = IO|
| wait_event = BufFileRead|
| Total time = 00:00:38|
------------------------------------------------
| #| queryid| duration
+-----+--------------------+--------------------
| 1| 28942442626229688| 00:00:38
+-----------------------------------------------
LOCKED PROCESSES HISTORY
Запрос:
SELECT
MIN(date_trunc('second',timepoint)) AS started ,
count(*) * interval '1 second' as duration ,
pid , blocking_pids , relation , mode , locktype
FROM
activity_hist.archive_locking al
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY pid , blocking_pids , relation , mode , locktype
UNION
SELECT
MIN(date_trunc('second',timepoint)) AS started ,
count(*) * interval '1 second' as duration ,
pid , blocking_pids , relation , mode , locktype
FROM
activity_hist.history_locking
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY pid , blocking_pids , relation , mode , locktype
ORDER BY 1
Пример:
------------------------------------------------------------------------------------------------------------------------------------- | LOCKED PROCESSES HISTORY +-----+----------+--------------------+----------+--------------------+--------------------+--------------------+-------------------- | #| pid| started| duration| blocking_pids| relation| mode| locktype +-----+----------+--------------------+----------+--------------------+--------------------+--------------------+-------------------- | 1| 26224| 2019-09-02 19:32:16| 00:01:45| {26211}| 16541| AccessShareLock| relation | 2| 26390| 2019-09-02 19:34:03| 00:00:53| {26211}| 16541| AccessShareLock| relation | 3| 26391| 2019-09-02 19:34:03| 00:00:53| {26211}| 16541| AccessShareLock| relation | 4| 26531| 2019-09-02 19:35:27| 00:00:12| {26211}| 16541| AccessShareLock| relation | 5| 27284| 2019-09-02 19:44:02| 00:00:19| {27276}| 16541| AccessShareLock| relation | 6| 27283| 2019-09-02 19:44:02| 00:00:19| {27276}| 16541| AccessShareLock| relation | 7| 27286| 2019-09-02 19:44:02| 00:00:19| {27276}| 16541| AccessShareLock| relation | 8| 27423| 2019-09-02 19:45:24| 00:00:12| {27394}| 16541| AccessShareLock| relation | 9| 27648| 2019-09-02 19:48:06| 00:00:20| {27647}| 16541| AccessShareLock| relation | 10| 27650| 2019-09-02 19:48:06| 00:00:20| {27647}| 16541| AccessShareLock| relation | 11| 27735| 2019-09-02 19:49:08| 00:00:06| {27650}| 16541| AccessExclusiveLock| relation | 12| 28380| 2019-09-02 19:56:03| 00:01:56| {28379}| 16541| AccessShareLock| relation | 13| 28379| 2019-09-02 19:56:03| 00:00:01| 28377| 16541| AccessExclusiveLock| relation | | | | | 28376| |
BLOCKING PROCESSES HISTORY
Запросы:
SELECT
blocking_pids
FROM
activity_hist.archive_locking al
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY blocking_pids
UNION
SELECT
blocking_pids
FROM
activity_hist.history_locking
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY blocking_pids
ORDER BY 1
---------------------------------------------------------------
SELECT
pid , usename , application_name , datname ,
MIN(date_trunc('second',timepoint)) as started ,
count(*) * interval '1 second' as duration ,
state ,
query
FROM activity_hist.archive_pg_stat_activity
WHERE pid= current_pid AND
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')
GROUP BY pid , usename , application_name ,
datname ,
state_change,
state ,
query
UNION
SELECT
pid , usename , application_name , datname ,
MIN(date_trunc('second',timepoint)) as started ,
count(*) * interval '1 second' as duration ,
state ,
query
FROM activity_hist.history_pg_stat_activity_for_reports
WHERE pid= current_pid AND
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')
GROUP BY pid , usename , application_name ,
datname ,
state_change,
state ,
query
ORDER BY 5 , 1
Пример:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- BLOCKING PROCESSES HISTORY +----+----------+----------+--------------------+----------+--------------------+--------------------+------------------------------+---------------------------------------- | #| pid| usename| application_name| datname| started| duration| state| query +----+----------+----------+--------------------+----------+--------------------+--------------------+------------------------------+---------------------------------------- | 1| 26211| tuser| psql| tdb1| 2019-09-02 19:31:54| 00:00:04| idle| | 2| 26211| tuser| psql| tdb1| 2019-09-02 19:31:58| 00:00:06| idle in transaction| begin; | 3| 26211| tuser| psql| tdb1| 2019-09-02 19:32:16| 00:01:45| idle in transaction| lock table wafer_data; | 4| 26211| tuser| psql| tdb1| 2019-09-02 19:35:54| 00:01:23| idle| commit; | 5| 26211| tuser| psql| tdb1| 2019-09-02 19:38:46| 00:00:02| idle in transaction| begin; | 6| 26211| tuser| psql| tdb1| 2019-09-02 19:38:54| 00:00:08| idle in transaction| lock table wafer_data; | 7| 26211| tuser| psql| tdb1| 2019-09-02 19:39:08| 00:42:42| idle| commit; | 8| 26211| tuser| psql| tdb1| 2019-09-03 07:12:07| 00:00:52| active| select test_del ();
Развитие.
Показанные базовые запросы и получаемые отчеты, уже сильно облегчают жизнь при анализе инцидентов производительности.
На основе базовых запросов, можно получить отчет, отдаленно приближенно напоминающий AWR Oracle.
Пример сводного отчета
+------------------------------------------------------------------------------------ | CONSOLIDATED REPORT FOR ACTIVITY AND WAITINGS . DATETIME : 03.09.2019 14:08 |------------------------------------------------------------------------------------ | HOST :X.X.X.X | BEGIN_SNAPSHOT :02.09.2019 14:08 END_SNAPSHOT : 03.09.2019 14:00 |------------------------------------------------------------------------------------ | CURRENT DATABASES SIZE : | DATABASE :monitor | SIZE (MB) :1370.00 |------------------------------------------------------------------------------------ | CLUSTER CPU TIME : 19:44:22 | CLUSTER WAITINGS TIME : 78:49:16 | | SQL DBTIME : 65:53:09 | SQL CPU TIME : 19:05:21 | SQL WAITINGS TIME : 21:50:46 | SQL IOTIME : 20:53:00 | SQL READ TIME : 20:52:55 | SQL WRITE TIME : 00:00:05 | | SQL CALLS : 311293 ------------------------------------------------------------- | SQL SHARED BLOCKS READS : 13351563334 | SQL SHARED BLOCKS HITS : 2775427045 | SQL SHARED BLOCKS HITS/READS % : 20.79 | SQL SHARED BLOCKS DIRTED : 21105 | SQL SHARED BLOCKS WRITTEN : 3656 | | SQL TEMPORARY BLOCKS READS : 7464932 | SQL TEMPORARY BLOCKS WRITTEN : 10176024 ------------------------------------------------------------- | | WAITINGS STATICTICS | +------------------------------------------------------------------------------------ | TOP 10 WAITINGS BY TOTAL WAIT TIME FOR SYSTEM PROCESSES +-----+------------------------------+--------------------+-------------------- | #| wait_event_type| wait_event| duration +-----+------------------------------+--------------------+-------------------- | 1| Activity| LogicalLauncherMain| 11:21:01 | 2| Activity| CheckpointerMain| 11:20:35 | 3| Activity| AutoVacuumMain| 11:20:31 | 4| Activity| WalWriterMain| 11:19:35 | 5| Activity| BgWriterMain| 10:14:19 | 6| Activity| BgWriterHibernate| 01:06:04 | 7| Activity| WalSenderMain| 00:04:05 | 8| Client| ClientWrite| 00:04:00 | 9| IO| BufFileWrite| 00:02:45 | 10| LWLock| buffer_mapping| 00:02:14 +-----+------------------------------+--------------------+-------------------- | TOP 10 WAITINGS BY TOTAL WAIT TIME FOR CLIENTS PROCESSES +-----+------------------------------+--------------------+--------------------+---------- | #| wait_event_type| wait_event| duration| % dbtime +-----+------------------------------+--------------------+--------------------+---------- | 1| Lock| transactionid| 11:55:37| 18.1 | 2| IO| DataFileRead| 07:19:43| 11.12 | 3| Client| ClientRead| 00:46:54| 1.19 | 4| Lock| relation| 00:40:37| 1.03 | 5| LWLock| buffer_mapping| 00:31:08| 0.79 | 6| LWLock| buffer_io| 00:22:12| 0.56 | 7| Timeout| PgSleep| 00:10:58| 0.28 | 8| Lock| tuple| 00:01:30| 0.04 | 9| IO| BufFileWrite| 00:01:16| 0.03 | 10| IO| BufFileRead| 00:00:37| 0.02 +-----+------------------------------+--------------------+--------------------+---------- | WAITINGS TYPES BY TOTAL WAIT TIME, FOR SYSTEM PROCESSES +-----+------------------------------+-------------------- | #| wait_event_type| duration +-----+------------------------------+-------------------- | 1| Activity| 56:46:10 | 2| IO| 00:05:13 | 3| Client| 00:04:00 | 4| LWLock| 00:03:07 +-----+------------------------------+-------------------- | WAITINGS TYPES BY TOTAL WAIT TIME, FOR CLIENTS PROCESSES +-----+------------------------------+--------------------+-------------------- | #| wait_event_type| duration| % dbtime +-----+------------------------------+--------------------+-------------------- | 1| Lock| 12:37:44| 19.17 | 2| IO| 07:21:40| 11.17 | 3| LWLock| 00:53:26| 1.35 | 4| Client| 00:46:54| 1.19 | 5| Timeout| 00:10:58| 0.28 | 6| IPC| 00:00:04| 0 +-----+------------------------------+--------------------+-------------------- | WAITINGS FOR SYSTEM PROCESSES +-----+-----------------------------+----------+--------------------+----------------------+-------------------- | #| backend_type| dbname| wait_event_type| wait_event| duration +-----+-----------------------------+----------+--------------------+----------------------+-------------------- | 1| logical replication launcher| | Activity| LogicalLauncherMain| 11:21:01 | 2| checkpointer| | Activity| CheckpointerMain| 11:20:35 | 3| autovacuum launcher| | Activity| AutoVacuumMain| 11:20:31 | 4| walwriter| | Activity| WalWriterMain| 11:19:35 | 5| background writer| | Activity| BgWriterMain| 10:14:19 | 6| background writer| | Activity| BgWriterHibernate| 01:06:04 | 7| walsender| | Activity| WalSenderMain| 00:04:05 | 8| walsender| | Client| ClientWrite| 00:04:00 | 9| parallel worker| tdb1| IO| BufFileWrite| 00:02:45 | 10| parallel worker| tdb1| LWLock| buffer_mapping| 00:02:05 | 11| parallel worker| tdb1| IO| DataFileRead| 00:01:10 | 12| parallel worker| tdb1| IO| BufFileRead| 00:01:05 | 13| parallel worker| tdb1| LWLock| buffer_io| 00:00:45 | 14| autovacuum worker| tdb1| LWLock| buffer_mapping| 00:00:09 | 15| walwriter| | IO| WALWrite| 00:00:08 | 16| walwriter| | LWLock| WALWriteLock| 00:00:04 | 17| background writer| | LWLock| WALWriteLock| 00:00:03 | 18| background writer| | IO| WALWrite| 00:00:02 | 19| background writer| | IO| DataFileWrite| 00:00:02 | 20| checkpointer| | IO| ControlFileSyncUpdate| 00:00:01 | 21| autovacuum worker| tdb1| LWLock| buffer_io| 00:00:01 +-----+-----------------------------+----------+--------------------+----------------------+-------------------- | WAITINGS FOR SQL +-----+-------------------------+----------+--------------------+--------------------+--------------------+---------- | #| queryid| dbname| wait_event_type| wait_event| duration| % dbtime +-----+-------------------------+----------+--------------------+--------------------+--------------------+---------- | 1| 389015618226997618| tdb1| Lock| transactionid| 09:47:43| 14.87 | 2| 389015618226997618| tdb1| IO| DataFileRead| 05:47:07| 8.78 | 3| | tdb1| Lock| transactionid| 02:07:54| 3.24 | 4| | tdb1| IO| DataFileRead| 01:30:24| 2.29 | 5| 8085340880788646241| tdb1| Client| ClientRead| 00:40:20| 1.02 | 6| 389015618226997618| tdb1| LWLock| buffer_mapping| 00:20:41| 0.52 | 7| 389015618226997618| tdb1| LWLock| buffer_io| 00:17:30| 0.44 | 8| 2649515222348904837| tdb1| Timeout| PgSleep| 00:10:58| 0.28 | 9| 4710212362688288619| tdb1| Lock| relation| 00:10:44| 0.27 | 10| 9150846928388977274| tdb1| Lock| relation| 00:10:24| 0.26 | 11| 28942442626229688| tdb1| Lock| relation| 00:07:48| 0.2 | 12| 1237430309438971376| tdb1| Lock| relation| 00:07:32| 0.19 | 13| | tdb1| LWLock| buffer_mapping| 00:04:32| 0.11 | 14| | tdb1| LWLock| buffer_io| 00:04:13| 0.11 | 15| | tdb1| Client| ClientRead| 00:03:57| 0.1 | 16| 4710212362688288619| tdb1| LWLock| buffer_mapping| 00:02:26| 0.06 | 17| 3167065002719415275| tdb1| Lock| relation| 00:02:20| 0.06 | 18| 5731212217001535134| tdb1| Client| ClientRead| 00:01:53| 0.05 | 19| 1237430309438971376| tdb1| LWLock| buffer_mapping| 00:01:42| 0.04 | 20| 389015618226997618| tdb1| Lock| tuple| 00:01:30| 0.04 | 21| 8304755792398128062| tdb1| Lock| relation| 00:01:29| 0.04 | 22| 28942442626229688| tdb1| IO| BufFileWrite| 00:01:16| 0.03 | 23| 9150846928388977274| tdb1| IO| DataFileRead| 00:01:07| 0.03 | 24| 28942442626229688| tdb1| LWLock| buffer_mapping| 00:01:03| 0.03 | 25| 9150846928388977274| tdb1| LWLock| buffer_mapping| 00:00:44| 0.02 | 26| 28942442626229688| tdb1| IO| BufFileRead| 00:00:37| 0.02 | 27| 28942442626229688| tdb1| LWLock| buffer_io| 00:00:25| 0.01 | 28| 1237430309438971376| tdb1| IO| DataFileRead| 00:00:24| 0.01 | 29| 28942442626229688| tdb1| IO| DataFileRead| 00:00:22| 0.01 | 30| 5731212217001535134| tdb1| Lock| relation| 00:00:20| 0.01 | 31| 4710212362688288619| tdb1| IO| DataFileRead| 00:00:19| 0.01 | 32| 9150846928388977274| tdb1| Client| ClientRead| 00:00:09| 0 | 33| 3422818749220588372| tdb1| Client| ClientRead| 00:00:08| 0 | 34| 1237430309438971376| tdb1| Client| ClientRead| 00:00:06| 0 | 35| 389015618226997618| tdb1| LWLock| buffer_content| 00:00:05| 0 | 36| 4710212362688288619| tdb1| Client| ClientRead| 00:00:05| 0 | 37| 4710212362688288619| tdb1| LWLock| buffer_io| 00:00:04| 0 | 38| 28942442626229688| tdb1| Client| ClientRead| 00:00:04| 0 | 39| 28942442626229688| tdb1| IPC| ParallelFinish| 00:00:03| 0 | 40| 389015618226997618| tdb1| IO| DataFileWrite| 00:00:02| 0 | 41| -5730801771815999400| tdb1| Client| ClientRead| 00:00:02| 0 | 42| 2404820632950544954| tdb1| Client| ClientRead| 00:00:02| 0 | 43| -6572922443698419129| tdb1| Client| ClientRead| 00:00:02| 0 | 44| 8304755792398128062| tdb1| Client| ClientRead| 00:00:02| 0 | 45| -1473395109729441239| tdb1| Client| ClientRead| 00:00:02| 0 | 46| | tdb1| LWLock| buffer_content| 00:00:01| 0 | 47| 180077086776069052| tdb1| Client| ClientRead| 00:00:01| 0 | 48| | tdb1| IO| DataFileWrite| 00:00:01| 0 | 49| 28942442626229688| tdb1| IPC| MessageQueueReceive| 00:00:01| 0 | 50| 2369289265278398647| tdb1| Client| ClientRead| 00:00:01| 0 | 51| 9150846928388977274| tdb1| IO| DataFileWrite| 00:00:01| 0 +-----+-------------------------+----------+--------------------+--------------------+--------------------+---------- | | CLIENT SQL STATICTICS | +------------------------------------------------------------------------------------ | CLIENT SQL ordered by Elapsed Time +--------------------+----------+----------+----------+----------+----------+-------------------- | elapsed time| calls| % dbtime| % CPU| % IO| dbname| queryid +--------------------+----------+----------+----------+----------+----------+-------------------- | 06:43:19| 36| 10.2| 85.09| 17.38| tdb1| 389015618226997618 | 02:06:53| 715| 3.21| 0.85| 0.06| tdb1| 1237430309438971376 | 01:52:07| 720| 2.84| 1.19| 0.08| tdb1| 4710212362688288619 | 00:39:03| 357| 0.99| 1.02| 0.33| tdb1| 28942442626229688 | 00:22:00| 8| 0.56| 0.96| 0| tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by CPU Time +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | cpu time| calls| % dbtime|total_time| % CPU| % IO| dbname| queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | 16:14:38| 36| 10.2| 06:43:19| 85.09| 17.38| tdb1| 389015618226997618 | 00:13:38| 720| 2.84| 01:52:07| 1.19| 0.08| tdb1| 4710212362688288619 | 00:11:39| 357| 0.99| 00:39:03| 1.02| 0.33| tdb1| 28942442626229688 | 00:10:58| 8| 0.56| 00:22:00| 0.96| 0| tdb1| 2649515222348904837 | 00:09:44| 715| 3.21| 02:06:53| 0.85| 0.06| tdb1| 1237430309438971376 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by User I/O Wait Time +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | io_wait time| calls| % dbtime|total_time| % CPU| % IO| dbname| queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | 05:47:09| 36| 10.2| 06:43:19| 85.09| 17.38| tdb1| 389015618226997618 | 00:02:15| 357| 0.99| 00:39:03| 1.02| 0.33| tdb1| 28942442626229688 | 00:00:24| 715| 3.21| 02:06:53| 0.85| 0.06| tdb1| 1237430309438971376 | 00:00:19| 720| 2.84| 01:52:07| 1.19| 0.08| tdb1| 4710212362688288619 | 00:00:00| 8| 0.56| 00:22:00| 0.96| 0| tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by Shared Buffers Reads +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | buffers reads| calls| % dbtime|total_time| % CPU| % IO| dbname| queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | 2562353244| 36| 10.2| 06:43:19| 85.09| 17.38| tdb1| 389015618226997618 | 11041689| 357| 0.99| 00:39:03| 1.02| 0.33| tdb1| 28942442626229688 | 3303551| 715| 3.21| 02:06:53| 0.85| 0.06| tdb1| 1237430309438971376 | 3242892| 720| 2.84| 01:52:07| 1.19| 0.08| tdb1| 4710212362688288619 | 0| 8| 0.56| 00:22:00| 0.96| 0| tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by Disk Reads Time +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | read time| calls| % dbtime|total_time| % CPU| % IO| dbname| queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | 03:37:46| 36| 10.2| 06:43:19| 85.09| 17.38| tdb1| 389015618226997618 | 00:04:07| 357| 0.99| 00:39:03| 1.02| 0.33| tdb1| 28942442626229688 | 00:00:59| 720| 2.84| 01:52:07| 1.19| 0.08| tdb1| 4710212362688288619 | 00:00:42| 715| 3.21| 02:06:53| 0.85| 0.06| tdb1| 1237430309438971376 | 00:00:00| 8| 0.56| 00:22:00| 0.96| 0| tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by Executions +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | calls| rows| % dbtime|total_time| % CPU| % IO| dbname| queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | 720| 720| 2.84| 01:52:07| 1.19| 0.08| tdb1| 4710212362688288619 | 715| 715| 3.21| 02:06:53| 0.85| 0.06| tdb1| 1237430309438971376 | 357| 0| 0.99| 00:39:03| 1.02| 0.33| tdb1| 28942442626229688 | 36| 36| 10.2| 06:43:19| 85.09| 17.38| tdb1| 389015618226997618 | 8| 8| 0.56| 00:22:00| 0.96| 0| tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | Complete List of SQL Text ---------------------------------------------- …
Продолжение – следует. На очереди формирование истории блокировок (pg_stat_locks), более подробное описание процесса заполнения таблиц.