PostgreSQL region settings

Source: Internet
Author: User

PostgreSQL region settings
PostgreSQL region settings

For Chinese users, we recommend that you set the encoding to UTF8 unconditionally in PostgreSQL. for simplified and unified regions (loacle), we recommend that you set the encoding to C, however, Collate and Ctype may affect the performance or functions.

Environment
  • Rhel 6.3 x64 VM (4C/8G/300g hdd)
  • PostgreSQL 9.6.2
Database
en_US=# \l+                                                                   List of databases   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 -----------+----------+----------+------------+------------+-----------------------+---------+------------+-------------------------------------------- en_US     | postgres | UTF8     | en_US.UTF8 | en_US.UTF8 |                       | 7343 kB | pg_default |  postgres  | postgres | UTF8     | C          | C          |                       | 414 MB  | pg_default | default administrative connection database template0 | postgres | UTF8     | C          | C          | =c/postgres          +| 7225 kB | pg_default | unmodifiable empty database           |          |          |            |            | postgres=CTc/postgres |         |            |  template1 | postgres | UTF8     | C          | C          | =c/postgres          +| 7225 kB | pg_default | default template for new databases           |          |          |            |            | postgres=CTc/postgres |         |            |  zh_CN     | postgres | UTF8     | zh_CN.UTF8 | zh_CN.UTF8 |                       | 7225 kB | pg_default | (5 rows) 
Impact of Collate on Functions

Collate will affect the sorting of Chinese characters. In the zh_CN region, Chinese characters are sorted by pinyin, and other regions are sorted by character encoding.

Postgres = # select * from (values ('King'), ('marw'), ('west'), ('yang') a order by; column1 --------- Yang wangxi mink (4 rows) postgres = # \ c en_USYou are now connected to database "en_US" as user "postgres ". en_US = # select * from (values ('King'), ('marw'), ('west'), ('yang') a order by; column1 --------- Yang wangxi mink (4 rows) en_US = # \ c zh_CNYou are now connected to database "zh_CN" as user "postgres ". zh_CN = # select * from (values ('King'), ('marw'), ('west'), ('yang') a order by; column1 --------- mink Wang Xiyang (4 rows)
Test Method for the impact of Collate on Performance
postgres=# create table tb1(c1 text);CREATE TABLETime: 5.653 mspostgres=# insert into tb1 select md5(generate_series(1,1000000)::text);INSERT 0 1000000Time: 2671.929 mspostgres=# vacuum ANALYZE tb1;VACUUMTime: 398.817 mspostgres=# select * from tb1 order by c1 limit 1;                c1                ---------------------------------- 0000104cd168386a335ba6bf6e32219d(1 row)Time: 176.779 mspostgres=# create index idx1 on tb1(c1);CREATE INDEXTime: 1549.436 ms 
Test Results
Collate/Ctype       C       en_US.UTF8  zh_CN.UTF8insert              2671    2613        2670vacuum ANALYZE      398     250         396order by            176     388         401create index        1549    7492        7904insert(with index)  11199   15621       16128 
Impact of Ctype

Ctype affects the results of pg_trgm and some regular expression matching. For example, if the Ctype is 'C', pg_trgm cannot support Chinese characters.

Postgres = # select show_trgm ('aaabbbc to x'); show_trgm then {"a", "x", "aa", "x", aaa, aab, abb, bbb, bbc, "bc"} (1 row) en_US = # select show_trgm ('aaabbbc to x'); show_trgm done {"a", "aa", 0x27bdf1, 0x30bd19, 0x4624bc, aaa, aab, abb, bbb, bbc, 0x6a2ad5} (1 row) zh_CN = # select show_trgm ('aaabbbc to x '); show_trgm resume {"a", "aa", 0x27bdf1, 0x30bd19, 0x4624bc, aaa, aab, abb, bbb, bbc, 0x6a2ad5} (1 row)
Conclusion
  1. For scenarios with low performance requirements, we recommend that you set both Collate and Ctype to zh_CN.UTF8 and other regions to C.

    initdb -E UTF8 --locale=C --lc-collate=zh_CN.UTF8 --lc-ctype=zh_CN.UTF8 ... 
  2. For scenarios with high performance requirements, we recommend that you set Ctype to zh_CN.UTF8 and other regions to C. If some queries need to be sorted by pinyin, you can specify Collate as zh_CN in the column definition and SQL expression.

    initdb -E UTF8 --locale=C --lc-ctype=zh_CN.UTF8 ... 
Reference
  • Regions and codes in PostgreSQL

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.