標籤:
剛熟悉PG的開發人員接觸PG時,或多或少會碰到一些問題.
常見的表現是碰到一些慢SQL.這時候別急著吐槽.絕大多數並不是問題,稍微的文法調整或進行一些簡單的最佳化就能解決問題.下面具體分析幾個案例.
一: 中文字串的索引掃描慢
test =# \d testidx
Table"test.testidx"
Column | Type | Modifiers
----------------+-----------------------------+-----------
id | numeric |
table_id | numeric |
description | character varying(4000) |
user_comment | character varying(4000) |
encoding | character varying(64) |
這是一個很常見的表,它所在database的 Encoding Collate Ctype 都是zh_CN.UTF-8
為了檢索description的資訊,很常見的(www.neitui.me),我們在該列上建立一個btree索引.
test=# create index idx_testidx on testidx(description);
CREATE INDEX
檢索列中的資訊使用like語句進行匹配,發現查詢計劃居然沒有使用索引.
test=# explain select description from testidxwhere description like ‘test%‘;
QUERY PLAN
-------------------------------------------------------------
SeqScan on testidx (cost=0.00..30151.00rows=64 width=28)
Filter: ((description)::text ~~ ‘test%‘::text)
(2 rows)
有點不甘心,禁掉索引看個究竟,索引是用上了,但是條件過濾放到了索引外.資料用索引掃描擷取,產生了位元影像,然後走Bitmap Heap Scan,很明顯這是有問題的.
test=# set enable_seqscan=off;
SET
test=# explain select description from testidxwhere description like ‘test%‘;
QUERYPLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on testidx (cost=29756.57..59907.57 rows=64 width=28)
Filter: ((description)::text ~~ ‘test%‘::text)
-> Bitmap Index Scan on idx_testidx (cost=0.00..29756.55 rows=1000000 width=0)
(3 rows)
實際的執行一遍,看看執行情況,也不樂觀,1000000行資料都取了出來,堆掃描的過濾器過濾掉所有的行
執行該SQL用了接近半秒,太慢了.
test=# explain analyze select description fromtestidx where description like ‘test%‘;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testidx (cost=29756.57..59907.57 rows=64 width=28)(actual time=407.548..407.548 rows=0 loops=1)
Filter: ((description)::text ~~ ‘test%‘::text)
Rows Removed by Filter: 1000000
-> Bitmap Index Scan onidx_testidx (cost=0.00..29756.55rows=1000000 width=0) (actual time=166.581..166.581 rows=1000000 loops=1)
Total runtime: 407.590 ms
(5 rows)
原因很簡單
1 在utf8編碼下表中的列需要按照utf8的規則來操作(各種操作符 > =< (~~)like 等)
2 建立索引時沒有指定比較方式,預設採用standard"C"的字串比較方法在UTF8上只支援 = 操作符.
3 建立索引時指定特定的比較方式能支援索引的like.
4 btree gin hash索引的用法相同.
5 pg中char varchar text資料類型適用上述規則.
The operator classes text_pattern_ops,varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the typestext, varchar, and char respectively. The difference from the default operatorclasses is that the values are compared strictly character by character ratherthan according to the locale-specific collation rules. This makes theseoperator classes suitable for use by queries involving pattern matchingexpressions (LIKE or POSIX regular expressions) when the database does not usethe standard "C" locale. As an example, you might index a varcharcolumn like this:
來看看create index的文法:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [name ] ON table_name [ USING method ]
({ column_name | ( expression ) } [ ] [opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[WITH ( storage_parameter = value [, ... ] ) ]
[TABLESPACE tablespace_name ]
[WHERE predicate ]
collation
The name of the collation to use for theindex. By default, the index uses the collation declared for the column to beindexed or the result collation of the expression to be indexed. Indexes withnon-default collations can be useful for queries that involve expressions usingnon-default collations.
於是,可以用該文法建立對應可用的索引.
test=# create index idx_testidx2 on testidx(description varchar_pattern_ops);
CREATE INDEX
讓我們來看看效果,Index Cond出現了.使用索引掃描,並且做了條件過濾,可能有64行有效資料.
test=# explain select description from testidxwhere description like ‘test%‘;
QUERYPLAN
------------------------------------------------------------------------------------
Index Only Scan using idx_testidx2 ontestidx (cost=0.55..8.57 rows=64width=28)
Index Cond: ((description ~>=~ ‘test‘::text) AND (description ~<~‘tesu‘::text))
Filter: ((description)::text ~~ ‘test%‘::text)
(3 rows)
實際執行結果驗證了索引的有效性,SQL效率執行效果明顯.
test=# explain analyze select description fromtestidx where description like ‘test%‘;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_testidx2 ontestidx (cost=0.55..8.57 rows=64width=28) (actual time=0.081..0.081 rows=0 loops=1)
Index Cond: ((description ~>=~ ‘test‘::text) AND (description ~<~‘tesu‘::text))
Filter: ((description)::text ~~ ‘test%‘::text)
Heap Fetches: 0
Total runtime: 0.105 ms
(5 rows)
PostgreSQL 常見慢SQL問題解決方案