We have already discussed PostgreSQL indexing engine
, the interface of access methods, and main access methods, such as: hash indexes
, and GIN
. In this article, we will watch how gin turns into rum.
Although the authors claim that gin is a powerful genie, the theme of drinks has eventually won: next-generation GIN has been called RUM.
This access method expands the concept that underlies GIN and enables us to perform full-text search even faster. In this series of articles, this is the only method that is not included in a standard PostgreSQL delivery and is an external extension. Several installation options are available for it:
- Take «yum» or «apt» package from the PGDG repository. For example, if you installed PostgreSQL from «postgresql-10» package, also install «postgresql-10-rum».
- Build from source code on github and install on your own (the instruction is there as well).
- Use as a part of Postgres Pro Enterprise (or at least read the documentation from there).
Limitations of GIN
What limitations of GIN does RUM enable us to transcend?
First, «tsvector» data type contains not only lexemes, but also information on their positions inside the document. As we observed last time
, GIN index does not store this information. For this reason, operations to search for phrases, which appeared in version 9.6, are supported by GIN index inefficiently and have to access the original data for recheck.
Second, search systems usually return the results sorted by relevance (whatever that means). We can use ranking functions «ts_rank» and «ts_rank_cd» to this end, but they have to be computed for each row of the result, which is certainly slow.
To a first approximation, RUM access method can be considered as GIN that additionally stores position information and can return the results in a needed order (like GiST
can return nearest neighbors). Let's move step by step.