melanny20 Mar 20 at 13:05

Mastering Data Lifecycle Management: ILM in Postgres Pro Enterprise 17

Medium

6 min

307

Postgres Professional corporate blogPostgreSQL*

Tutorial

Translation

Original author: https://habr.com/ru/users/LesnoyChelovek/

The Growing Pains of Large Databases

Data is one of the most valuable assets of any company, but as businesses grow, so does their data footprint. Initially, a small, easy-to-manage database may suffice, but as the data scales up, companies typically face several pain points:

Storage costs increase as access frequency decreases
High-performance storage is expensive and limited
Older data sees less usage over time
Most data operations occur within a short window (weeks or months)
Historical data accumulates, consuming valuable high-speed storage

A one-size-fits-all storage approach is inefficient. Companies have to balance speed, cost, and data volume, while automating this process at scale — without manual intervention.

A Simple Analogy

Think of your database like a wardrobe. At first, you have plenty of space, but over time, old and rarely used items pile up. Expanding your closet is an option, but a smarter approach is moving seasonal clothes to storage — your garage, attic, or a country house. ILM does the same for your data.

What is Information Lifecycle Management (ILM)?

ILM is a data management strategy that categorizes information based on its business value and automates storage decisions according to company policies. It answers critical questions:

Which data needs instant access?
Which data can be moved to slower, cheaper storage?
Which data can be archived or deleted?

Example: ILM in an E-Commerce Store

An online store processes thousands of transactions daily:

Active orders & recent purchases are stored on fast NVMe drives for quick access
Older orders from years ago are moved to low-cost HDD storage for analytics
Expired data (e.g., outdated logs) are archived or deleted per company policies

By classifying data into four lifecycle stages, ILM ensures optimal performance and cost-efficiency:

Active Data → Frequently updated & randomly accessed (e.g., current transactions)
Less Active Data → Rarely modified, column-scanned (e.g., quarterly transaction history)
Historical Data → Never changes, rarely accessed (e.g., multi-year financial records)
Archived Data → Retained only for compliance

Each stage requires a different storage tier:

NVMe SSDs for high-speed, frequently accessed data
Standard SSDs for moderately active data
HDDs (compressed storage) for historical data
External storage for long-term archives

Note: The current pgpro_ilm extension does not yet support external storage like tape drives.

How ILM Works in Postgres Pro Enterprise 17

ILM automation in pgpro_ilm is based on three key components:

Object — the target data (a table or partition)
Condition — the trigger (e.g., data hasn't been accessed in 6 months)
Action — the operation (e.g., move data to a different storage tier)

With these components, you can set up rules tailored to your company's ILM strategy. For example:

First, less active data is moved to a tablespace on more affordable disks.

Then, once the data becomes purely historical, it gets transferred to a tablespace on high-capacity, cost-effective storage.

Note: Rules are processed in descending order of time period. This is intentional — starting with the "oldest" rule makes sense because if it applies, there's no need to check the others.

Processing Conditions

To use ILM effectively, you need data access statistics — without them, there's no way to tell which data is "new" and which is "old." However, Postgres Pro didn’t have such statistics, so we started exploring ways to collect the necessary data. Our solution had to meet a few key requirements:

It shouldn’t track system users (like postgres) or system processes (such as vacuum).
It should allow additional exclusions for specific users—useful if certain actions need to be ignored when generating statistics.
The statistics must be persistent. They shouldn’t disappear after a server restart, meaning they must be stored on disk, not just in memory.

While working on collecting DML operation statistics, we were also tackling the issue of unused privilege detection within the pgpro_usage extension. Eventually, we merged both efforts, integrating all required statistics collection into pgpro_usage. Now, everything is gathered together but remains separate — resetting statistics for unused privilege tracking doesn’t affect ILM-related statistics.

Note: Think of it like a car's odometer. It tracks both the total mileage and trip mileage separately. Resetting the trip counter doesn’t erase the overall mileage.

The pgpro_ilm extension retrieves last-access timestamps for tables, categorized by users, using functions and views from pgpro_usage. For ILM purposes, only specific actions matter, depending on the chosen rule:

NO_MODIFICATION: Tracks UPDATE, INSERT, DELETE, and TRUNCATE.
NO_ACCESS: Includes the above actions plus SELECT operations.

With this data, you can determine when, who, and how database objects were accessed.

From Theory to Practice

Now that we’ve covered how ILM works, let’s get hands-on with the pgpro_ilm extension in Postgres Pro Enterprise 17.

Installation

Before setting up pgpro_ilm, you need to install pgpro_usage. Add it to shared_preload_libraries in postgresql.conf and restart the database server:

CREATE EXTENSION pgpro_usage;
CREATE EXTENSION pgpro_ilm;

That’s it — ILM is ready to go.

Managing User Exclusions

You can exclude certain users from ILM tracking to avoid skewed statistics. This is useful for:

System users performing maintenance tasks.
Internal/external auditors who only access data occasionally.
Users running rare reports (e.g., a one-time historical data request).
Batch operations involving old data.

To return the list of excluded users:

SELECT user_name, exclude_access, exclude_modification FROM pgpro_ilm.get_exclude_users();

To set the list of users whose actions are ignored when checking NO_ACCESS and NO_MODIFICATION rules.

SELECT pgpro_ilm.set_exclude_users(array['exclude_access_user'], array['exclude_write_user']);

Defining Rules

You can set up ILM rules using the add_rule function:

add_rule(iv_object_name text, iv_rule_type text, iv_period interval, iv_action text, iv_parameter text) returns void.

Each rule has:

iv_object_name: Target table.
iv_rule_type: Rule type (NO_ACCESS or NO_MODIFICATION).
iv_period: Time before the rule triggers.
iv_action: Action to take (e.g., move data).
iv_parameter: Action parameter (e.g., target tablespace).

Here is an example:

SELECT pgpro_ilm.add_rule('sales_table_section_q1_2021', 'NO_ACCESS', interval '12 mons', 'ALTER_TS', 'low_cost_sales_tablespace');

Example policies:

Data untouched for 3+ months → Move to SSD.
Data not queried for 6+ months → Move to HDD.
Data unread for 12+ months → Compress.

To remove a rule:

remove_rule(iv_object_name text, iv_rule_type text, iv_period interval, iv_action text) returns void

Each rule has the same parameters as above excluding iv_parameter.

To list existing rules:

get_rules([iv_object_name text]) returns table

Each rule has:

iv_object_name — target table.

Processing Rules

Rules can be applied manually or automated via job scheduling (recommended). Postgres Pro provides two functions for rule processing:

process_rules(iv_object_name text) RETURNS void — applies rules to a specific table, where iv_object_name is the table name.
process_all_rules() RETURNS void — applies all rules to all tables.

For example, to process rules for a specific table section:

SELECT pgpro_ilm.process_rules('sales_table_section_q1_2021');

Note: If a table is partitioned, parent partitions are processed first, starting from the lowest level. Rules are applied in reverse order of their periods, and execution stops once a match is found.

Key Points About ILM

Rules can be set for an entire partitioned table or individual partitions. If applied to a partitioned table, they are automatically inherited by all partitions.
Partition-specific rules take priority over table-wide rules, but parent table rules still apply.

Running process_rules on a partitioned table does nothing since data is stored in partitions, not the table itself.
Currently, indexes move with the target table/partition. Future releases will allow separate rules for indexes.

If a table has already been moved, the rule won’t be applied again.
ALTER TABLE … SET TABLESPACE … uses AccessExclusiveLock, so calling process_rules again while a previous operation is still running is safe.
If ALTER TABLE fails, re-running process_rules will retry the operation.

Additional ILM Tools in Postgres Pro Enterprise 17

For even greater control over large datasets, Postgres Pro Enterprise 17 offers:

SPLIT PARTITION / MERGE PARTITIONS → Manage partitions dynamically
CFS (Compression Feature Set) → Reduce storage costs by compressing tables & indexes
pgpro_autopart → Automatic partitioning based on data volume
pgpro_bfile → Store unstructured data outside the database
pgpro_scheduler → Automate ILM tasks with scheduled jobs
Shardman → Create scalable, distributed PostgreSQL databases

Final Thoughts

With Postgres Pro Enterprise 17, managing data at scale has never been easier. The pgpro_ilm extension gives you full control over data lifecycle automation, ensuring:

Faster databases by keeping only the most relevant data in high-performance storage
Lower costs by offloading historical data to cheaper storage tiers
Simplified operations with fully automated ILM policies

And the best part: pgpro_ilm is built-in—no extra licensing, no complex setup. Just pure Postgres-powered efficiency

Hubs: