PostgreSQL locale settings
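For Chinese-language users, the PostgreSQL encoding should always be set to UTF8. To keep things simple and uniform, it is also advisable to set the locale to C wherever possible. Collate and Ctype, however, affect both performance and functionality and need some care.
Environment
- RHEL 6.3 x64 virtual machine (4 cores / 8 GB RAM / 300 GB HDD)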
- PostgreSQL 9.6.2
Databases
en_US=# \l+
                                                        List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   |  Size   | Tablespace |                Description
-----------+----------+----------+------------+------------+-----------------------+---------+------------+--------------------------------------------
 en_US     | postgres | UTF8     | en_US.UTF8 | en_US.UTF8 |                       | 7343 kB | pg_default |
 postgres  | postgres | UTF8     | C          | C          |                       | 414 MB  | pg_default | default administrative connection database
 template0 | postgres | UTF8     | C          | C          | =c/postgres          +| 7225 kB | pg_default | unmodifiable empty database
           |          |          |            |            | postgres=CTc/postgres |         |            |
 template1 | postgres | UTF8     | C          | C          | =c/postgres          +| 7225 kB | pg_default | default template for new databases
           |          |          |            |            | postgres=CTc/postgres |         |            |
 zh_CN     | postgres | UTF8     | zh_CN.UTF8 | zh_CN.UTF8 |                       | 7225 kB | pg_default |
(5 rows)
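The two non-default databases in the listing could be created along the following lines. This is a minimal sketch, not necessarily the commands the author ran. It assumes the en_US.UTF8 and zh_CN.UTF8 locales are installed on the operating system, and it copies from template0 because PostgreSQL only accepts locale settings that differ from the template's when template0 is the template.

```sql
-- Names are quoted so the mixed case shown in the \l+ listing is preserved;
-- unquoted identifiers would be folded to lower case.
CREATE DATABASE "en_US" TEMPLATE template0 ENCODING 'UTF8'
    LC_COLLATE 'en_US.UTF8' LC_CTYPE 'en_US.UTF8';

CREATE DATABASE "zh_CN" TEMPLATE template0 ENCODING 'UTF8'
    LC_COLLATE 'zh_CN.UTF8' LC_CTYPE 'zh_CN.UTF8';

-- Confirm the per-database settings without the full \l+ output.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```

Collate and Ctype are fixed when a database is created, so choosing them up front avoids a dump and reload later.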
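Effect of Collate on functionality
Collate affects how Chinese text is sorted. In the zh_CN locale Chinese sorts by pinyin; in the other locales tested here (C and en_US.UTF8) it sorts by character code.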
postgres=# select * from (values ('王'),('貂'),('西'),('楊')) a order by a;
 column1
---------
 楊
 王
 西
 貂
(4 rows)

postgres=# \c en_US
You are now connected to database "en_US" as user "postgres".
en_US=# select * from (values ('王'),('貂'),('西'),('楊')) a order by a;
 column1
---------
 楊
 王
 西
 貂
(4 rows)

en_US=# \c zh_CN
You are now connected to database "zh_CN" as user "postgres".
zh_CN=# select * from (values ('王'),('貂'),('西'),('楊')) a order by a;
 column1
---------
 貂
 王
 西
 楊
(4 rows)
Effect of Collate on performance
Test method
postgres=# create table tb1(c1 text);
CREATE TABLE
Time: 5.653 ms
postgres=# insert into tb1 select md5(generate_series(1,1000000)::text);
INSERT 0 1000000
Time: 2671.929 ms
postgres=# vacuum ANALYZE tb1;
VACUUM
Time: 398.817 ms
postgres=# select * from tb1 order by c1 limit 1;
                c1
----------------------------------
 0000104cd168386a335ba6bf6e32219d
(1 row)

Time: 176.779 ms
postgres=# create index idx1 on tb1(c1);
CREATE INDEX
Time: 1549.436 ms
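The Time lines in the session come from psql's \timing setting. The session does not show how the "insert(with index)" figure was measured. One plausible procedure, offered as an assumption rather than the author's exact steps, is to repeat the bulk insert once idx1 exists. The two collate queries are an optional cross-check that compares collations inside a single database; they assume a collation named "en_US" is listed in pg_collation.

```sql
-- psql meta-command: print the elapsed time after every statement.
\timing on

-- Optional cross-check: the same top-1 sort under two collations.
-- Run these before idx1 is created so that both statements really sort.
select * from tb1 order by c1 collate "C" limit 1;
select * from tb1 order by c1 collate "en_US" limit 1;

-- Assumed procedure for the "insert(with index)" measurement:
-- the same bulk insert as before, run after idx1 has been created,
-- so every new row also has to be placed in the index.
insert into tb1 select md5(generate_series(1,1000000)::text);
```

Comparing collations within one database keeps the storage and cache state the same for both runs, which makes the difference easier to attribute to the comparison function alone.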
Test results
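Times are in milliseconds.
 Collate/Ctype      |   C   | en_US.UTF8 | zh_CN.UTF8
--------------------+-------+------------+------------
 insert             |  2671 |       2613 |       2670
 vacuum ANALYZE     |   398 |        250 |        396
 order by           |   176 |        388 |        401
 create index       |  1549 |       7492 |       7904
 insert(with index) | 11199 |      15621 |      16128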
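Effect of Ctype
Ctype affects the results of pg_trgm and of some regular-expression matching. For example, when Ctype is 'C', pg_trgm cannot handle Chinese: the Chinese characters are dropped and produce no trigrams.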
postgres=# select show_trgm('aaabbbc到的x');
                      show_trgm
-----------------------------------------------------
 {"  a","  x"," aa"," x ",aaa,aab,abb,bbb,bbc,"bc "}
(1 row)

en_US=# select show_trgm('aaabbbc到的x');
                               show_trgm
-----------------------------------------------------------------------
 {"  a"," aa",0x27bdf1,0x30bd19,0x4624bc,aaa,aab,abb,bbb,bbc,0x6a2ad5}
(1 row)

zh_CN=# select show_trgm('aaabbbc到的x');
                               show_trgm
-----------------------------------------------------------------------
 {"  a"," aa",0x27bdf1,0x30bd19,0x4624bc,aaa,aab,abb,bbb,bbc,0x6a2ad5}
(1 row)
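To see why this matters in practice, the following sketch sets up a trigram index for substring search. The table docs, its column body, and the index name are hypothetical and exist only for illustration. The regular-expression query at the end is a quick probe whose result may differ between a Ctype 'C' database and a UTF-8 Ctype database.

```sql
-- pg_trgm ships with PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Which Ctype is the current database using?
SELECT datctype FROM pg_database WHERE datname = current_database();

-- Hypothetical table and trigram index for substring search.
CREATE TABLE docs (body text);
CREATE INDEX docs_body_trgm ON docs USING gin (body gin_trgm_ops);

-- With a UTF-8 Ctype the Chinese characters contribute trigrams, so the
-- index can help narrow this search. With Ctype 'C' they contribute none.
SELECT * FROM docs WHERE body LIKE '%到的x%';

-- Probe: is a Chinese character treated as a word character here?
SELECT '到' ~ '^\w$' AS is_word_char;
```

Running show_trgm on a sample string, as in the session above, remains the most direct check of what the index will actually contain.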
Conclusion
Where performance requirements are modest, set both Collate and Ctype to zh_CN.UTF8 and leave the other locale categories at C.
initdb -E UTF8 --locale=C --lc-collate=zh_CN.UTF8 --lc-ctype=zh_CN.UTF8 ...
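Where performance requirements are higher, set only Ctype to zh_CN.UTF8 and leave the other locale categories, Collate included, at C. If some queries need pinyin ordering, specify the zh_CN collation in the column definition or in the SQL expression.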
initdb -E UTF8 --locale=C --lc-ctype=zh_CN.UTF8 ...
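A minimal sketch of both options follows, for a database whose default Collate is C. It assumes a collation named "zh_CN" is present in pg_collation for the UTF8 encoding, which the first query checks. The table people and its index are hypothetical.

```sql
-- Check which zh_CN collations this cluster knows about.
SELECT collname FROM pg_collation WHERE collname LIKE 'zh_CN%';

-- Per-expression collation: pinyin order for this query only.
SELECT name
FROM (VALUES ('王'),('貂'),('西'),('楊')) AS a(name)
ORDER BY name COLLATE "zh_CN";

-- Per-column collation: sorts and comparisons on name default to zh_CN.
CREATE TABLE people (
    id   integer PRIMARY KEY,
    name text COLLATE "zh_CN"
);

-- An index on the column uses the column's collation unless told otherwise,
-- so it can serve ORDER BY name in pinyin order.
CREATE INDEX people_name_idx ON people (name);
```

This keeps the cheaper C comparison as the default and pays the locale-aware sorting cost only on the columns and queries that ask for it.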
References