The first article described
PostgreSQL indexing engine, the second one dealt with
the interface of access methods, and now we are ready to discuss specific types of indexes. Let's start with hash index.
Hash
Structure
General theory
Plenty of modern programming languages include hash tables as the base data type. On the outside, a hash table looks like a regular array that is indexed with any data type (for example, string) rather than with an integer number. Hash index in PostgreSQL is structured in a similar way. How does this work?
As a rule, data types have very large ranges of permissible values: how many different strings can we potentially envisage in a column of type «text»? At the same time, how many different values are actually stored in a text column of some table? Usually, not so many of them.
The idea of hashing is to associate a small number (from 0 to
N−1,
N values in total) with a value of any data type. Association like this is called
a hash function. The number obtained can be used as an index of a regular array where references to table rows (TIDs) will be stored. Elements of this array are called
hash table buckets — one bucket can store several TIDs if the same indexed value appears in different rows.
The more uniformly a hash function distributes source values by buckets, the better it is. But even a good hash function will sometimes produce equal results for different source values — this is called
a collision. So, one bucket can store TIDs corresponding to different keys, and therefore, TIDs obtained from the index need to be rechecked.