PostgreSQL provides the hstore type for storing sets of KEY => VALUE pairs, similar in purpose to ARRAY or JSON. To query this kind of data efficiently, you need an appropriate index. Let's compare the performance of two different index types for the same lookup.
Suppose we have an original table with a BTREE index based on the str1 field.
t_girl=# \d status_check;
          Table "ytt.status_check"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 is_yes | boolean               | not null
 str1   | character varying(20) | not null
 str2   | character varying(20) | not null
Indexes:
    "index_status_check_str1" btree (str1)
The table holds 10 million records. The data looks roughly like this:
t_girl=# select * from status_check limit 2;
 is_yes | str1 |         str2
--------+------+----------------------
 f      | 0    | cfcd208495d565ef66e7
 t      | 1    | c4ca4238a0b923820dcc
(2 rows)

Time: 0.617 ms
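The source does not show how the table was populated. The str2 values above match the first 20 characters of md5(str1), so a comparable data set could be generated roughly as follows. This is only a sketch: the random is_yes flag and the 0..9999999 key range are assumptions, not the original setup.

```sql
-- Hypothetical setup: recreate a table shaped like status_check.
-- (The "ytt" schema is omitted; the table lands in the current search_path.)
CREATE TABLE status_check (
    is_yes boolean     NOT NULL,
    str1   varchar(20) NOT NULL,
    str2   varchar(20) NOT NULL
);

-- Assumptions: keys are the integers 0..9999999 stored as text,
-- str2 is the first 20 hex characters of md5(str1),
-- and is_yes is filled at random.
INSERT INTO status_check (is_yes, str1, str2)
SELECT random() < 0.5,
       i::text,
       left(md5(i::text), 20)
FROM generate_series(0, 9999999) AS i;

CREATE INDEX index_status_check_str1 ON status_check USING btree (str1);
ANALYZE status_check;
```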
The status_check_hstore table stores the same data in a single hstore column. It has a GiST index on the str1_str2 column.
     Table "ytt.status_check_hstore"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 is_yes    | boolean |
 str1_str2 | hstore  |
Indexes:
    "idx_str_str2_gist" gist (str1_str2)
t_girl=# select * from status_check_hstore limit 2;
 is_yes |          str1_str2
--------+-----------------------------
 f      | "0"=>"cfcd208495d565ef66e7"
 t      | "1"=>"c4ca4238a0b923820dcc"
(2 rows)

Time: 39.874 ms
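The source does not show how the hstore table was built. Judging from the rows above, each record holds one pair with str1 as the key and str2 as the value, so it could plausibly be derived from the original table like this (a sketch under that assumption):

```sql
-- hstore ships as a contrib extension and must be enabled once per database.
CREATE EXTENSION IF NOT EXISTS hstore;

-- Assumption: exactly one key/value pair per row, built from str1 and str2.
CREATE TABLE status_check_hstore AS
SELECT is_yes,
       hstore(str1::text, str2::text) AS str1_str2
FROM status_check;

-- GiST index on the hstore column, named as in the table description above.
CREATE INDEX idx_str_str2_gist ON status_check_hstore USING gist (str1_str2);
ANALYZE status_check_hstore;
```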
Next, we retrieve the same rows from both tables. The query against the original table is, of course, very efficient. The statement and its result are as follows:
t_girl=# select * from status_check where str1 in ('10','23','33');
 is_yes | str1 |         str2
--------+------+----------------------
 t      | 10   | d3d9446802a44259755d
 t      | 23   | 37693cfc748049e45d87
 f      | 33   | 182be0c5cdcd5072bb18
(3 rows)

Time: 0.690 ms
The preceding statement takes less than 1 ms.
Next, we run the equivalent query against the hstore table:
t_girl=# select is_yes,skeys(str1_str2),svals(str1_str2) from status_check_hstore where str1_str2 ?| array['10','23','33'];
 is_yes | skeys |        svals
--------+-------+----------------------
 t      | 10    | d3d9446802a44259755d
 t      | 23    | 37693cfc748049e45d87
 f      | 33    | 182be0c5cdcd5072bb18
(3 rows)

Time: 40.256 ms
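For readers new to hstore, the operator and functions used in this query can be tried on literal values. The literals below are made up for illustration and assume the hstore extension is installed:

```sql
-- ?| : does the hstore contain ANY of the listed keys?
SELECT 'a=>1, b=>2'::hstore ?| ARRAY['b', 'c'];   -- true  ('b' is present)
SELECT 'a=>1, b=>2'::hstore ?| ARRAY['x', 'y'];   -- false (neither key is present)

-- skeys() and svals() return the keys and values as sets,
-- producing one output row per key/value pair.
SELECT skeys('a=>1, b=>2'::hstore) AS k;
SELECT svals('a=>1, b=>2'::hstore) AS v;
```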
This is dozens of times slower than the query against the original table.
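Check the query plan. The planner goes through the GiST index with a bitmap scan and then rechecks the condition against the heap rows, for which it estimates 100000 rows.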
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Bitmap Heap Scan on status_check_hstore  (cost=5.06..790.12 rows=100000 width=38)
   Recheck Cond: (str1_str2 ?| '{10,23,33}'::text[])
   ->  Bitmap Index Scan on idx_str_str2_gist  (cost=0.00..5.03 rows=100 width=0)
         Index Cond: (str1_str2 ?| '{10,23,33}'::text[])
(4 rows)

Time: 0.688 ms
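The command that produced this plan is not shown in the source. A plan of this shape is what EXPLAIN prints, so it was presumably obtained along these lines:

```sql
-- Planner estimate only; the query itself is not executed.
EXPLAIN
SELECT is_yes, skeys(str1_str2), svals(str1_str2)
FROM status_check_hstore
WHERE str1_str2 ?| ARRAY['10', '23', '33'];

-- Executes the query and reports actual row counts, timings and buffer usage,
-- which helps to see where the time is really spent.
EXPLAIN (ANALYZE, BUFFERS)
SELECT is_yes, skeys(str1_str2), svals(str1_str2)
FROM status_check_hstore
WHERE str1_str2 ?| ARRAY['10', '23', '33'];
```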
We want to optimize this statement. If we rewrite it to look like the query on the original table, can it use a B-tree index?
Next, create a B-tree expression (functional) index on the hstore keys:
t_girl=# create index idx_str1_str2_akeys on status_check_hstore using btree (array_to_string(akeys(str1_str2),','));
CREATE INDEX
Time: 394.123 ms
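To see what the indexed expression evaluates to, it can be run on literal values. The second literal is invented for illustration:

```sql
-- Single key: the expression yields just that key.
SELECT array_to_string(akeys('10=>d3d9446802a44259755d'::hstore), ',');  -- 10

-- Two keys: the keys are joined with a comma,
-- so the result equals neither key on its own.
SELECT array_to_string(akeys('a=>1, b=>2'::hstore), ',');                -- a,b
```

The index therefore stores one string per row, which equals the key itself only when the row holds a single key.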
OK. Rewrite the statement so that its WHERE clause uses exactly the indexed expression, then run the same search:
t_girl=# select is_yes,skeys(str1_str2),svals(str1_str2) from status_check_hstore where array_to_string(akeys(str1_str2),',') in ('10','23','33');
 is_yes | skeys | svals
--------+-------+----------------------
 t      | 10    | d3d9446802a44259755d
 t      | 23    | 37693cfc748049e45d87
 f      | 33    | 182be0c5cdcd5072bb18
(3 rows)

Time: 0.727 ms
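This is as fast as the query on the original table. Note that the rewrite returns the same rows as the ?| query only because every row in this table holds exactly one key.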
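The B-tree rewrite matches a row only when its comma-joined key string equals one of the listed values, which means the row must hold a single key. Before relying on it elsewhere, a check like the following may be worthwhile, together with an EXPLAIN to confirm that the planner picks the new index. Both are sketches and were not part of the original test:

```sql
-- Count rows the rewritten predicate could never match by key:
-- rows with no keys or with more than one key.
-- COALESCE covers empty hstores, for which array_length() returns NULL.
-- Note: this scans the whole table.
SELECT count(*)
FROM status_check_hstore
WHERE COALESCE(array_length(akeys(str1_str2), 1), 0) <> 1;

-- The WHERE clause must use the same expression as the index definition
-- for the expression index to be considered.
EXPLAIN
SELECT is_yes, skeys(str1_str2), svals(str1_str2)
FROM status_check_hstore
WHERE array_to_string(akeys(str1_str2), ',') IN ('10', '23', '33');
```

If the first query returns anything other than 0, the two forms of the query are not interchangeable for that data.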