PostgreSQL supports hstore to store data such as Key->value, which is in fact similar to array or JSON type. Efficient indexing is essential to the efficient use of this type of data. Let's take a look at the performance issues of two different types of indexes for the same retrieval request today.
If we had such an original table. There is a Btree index based on the Str1 field.
t_girl = # \ d status_check;
Table "ytt.status_check"
Column | Type | Modifiers
-------- + ----------------------- + -----------
is_yes | boolean | not null
str1 | character varying (20) | not null
str2 | character varying (20) | not null
Indexes:
"index_status_check_str1" btree (str1)
There are 10W records. The data is probably as follows.
t_girl = # select * from status_check limit 2;
is_yes | str1 | str2
-------- + ------ + ----------------------
f | 0 | cfcd208495d565ef66e7
t | 1 | c4ca4238a0b923820dcc
(2 rows)
Time: 0.617 ms
t_girl = #
Store the status_check_hstore table structure of hstore type, based on the str1_str2 field has a GIST index.
Table "ytt.status_check_hstore"
Column | Type | Modifiers
----------- + --------- + -----------
is_yes | boolean |
str1_str2 | hstore |
Indexes:
"idx_str_str2_gist" gist (str1_str2)
t_girl = # select * from status_check_hstore limit 2;
is_yes | str1_str2
-------- + -----------------------------
f | "0" => "cfcd208495d565ef66e7"
t | "1" => "c4ca4238a0b923820dcc"
(2 rows)
Time: 39.874 ms
Next we have to get the same result as querying the original table, of course the query of the original table is very efficient. Table statements and results are as follows,
t_girl = # select * from status_check where str1 in (‘10‘, ‘23‘, ‘33‘);
is_yes | str1 | str2
-------- + ------ + ----------------------
t | 10 | d3d9446802a44259755d
t | 23 | 37693cfc748049e45d87
f | 33 | 182be0c5cdcd5072bb18
(3 rows)
Time: 0.690 ms
The above statement took less than 1 millisecond.
Next we query the hstore table.
t_girl = # select is_yes, skeys (str1_str2), svals (str1_str2) from status_check_hstore where str1_str2? | array [‘10 ‘,‘ 23 ‘,‘ 33 ‘];
is_yes | skeys | svals
-------- + ------- + ----------------------
t | 10 | d3d9446802a44259755d
t | 23 | 37693cfc748049e45d87
f | 33 | 182be0c5cdcd5072bb18
(3 rows)
Time: 40.256 ms
Oh my God. Queries that are dozens of times slower than the original table.
Look at the query plan and scan all the rows.
QUERY PLAN
-------------------------------------------------- ---------------------------------
Bitmap Heap Scan on status_check_hstore (cost = 5.06..790.12 rows = 100000 width = 38)
Recheck Cond: (str1_str2?
| ‘{10,23,33}’ :: text [])
-> Bitmap Index Scan on idx_str_str2_gist (cost = 0.00..5.03 rows = 100 width = 0)
Index Cond: (str1_str2? | ‘{10,23,33}‘ :: text [])
(4 rows)
Time: 0.688 ms
We think of ways to optimize this statement, assuming that this statement becomes the same as the original statement. Is it possible to use the BTREE index?
Next, create a function index based on BTREE,
t_girl = # create index idx_str1_str2_akeys on status_check_hstore using btree (array_to_string (akeys (str1_str2), ‘,‘));
CREATE INDEX
Time: 394.123 ms
OK, change the statement to run the same search,
t_girl = # select is_yes, skeys (str1_str2), svals (str1_str2) from status_check_hstore where array_to_string (akeys (str1_str2), ‘,‘) in (‘10‘, ‘23‘, ‘33’);
is_yes | skeys | svals
-------- + ------- + ----------------------
t | 10 | d3d9446802a44259755d
t | 23 | 37693cfc748049e45d87
f | 33 | 182be0c5cdcd5072bb18
(3 rows)
Time: 0.727 ms
This time as fast as the original query.
An example of PostgreSQL hstore column performance improvement