PostgreSQL region settings

Last Update:2017-03-30 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

PostgreSQL region settings
PostgreSQL region settings

For Chinese users, we recommend that you set the encoding to UTF8 unconditionally in PostgreSQL. for simplified and unified regions (loacle), we recommend that you set the encoding to C, however, Collate and Ctype may affect the performance or functions.

Environment

Rhel 6.3 x64 VM (4C/8G/300g hdd)
PostgreSQL 9.6.2

Database

en_US=# \l+                                                                   List of databases   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 -----------+----------+----------+------------+------------+-----------------------+---------+------------+-------------------------------------------- en_US     | postgres | UTF8     | en_US.UTF8 | en_US.UTF8 |                       | 7343 kB | pg_default |  postgres  | postgres | UTF8     | C          | C          |                       | 414 MB  | pg_default | default administrative connection database template0 | postgres | UTF8     | C          | C          | =c/postgres          +| 7225 kB | pg_default | unmodifiable empty database           |          |          |            |            | postgres=CTc/postgres |         |            |  template1 | postgres | UTF8     | C          | C          | =c/postgres          +| 7225 kB | pg_default | default template for new databases           |          |          |            |            | postgres=CTc/postgres |         |            |  zh_CN     | postgres | UTF8     | zh_CN.UTF8 | zh_CN.UTF8 |                       | 7225 kB | pg_default | (5 rows)

Impact of Collate on Functions

Collate will affect the sorting of Chinese characters. In the zh_CN region, Chinese characters are sorted by pinyin, and other regions are sorted by character encoding.

Postgres = # select * from (values ('King'), ('marw'), ('west'), ('yang') a order by; column1 --------- Yang wangxi mink (4 rows) postgres = # \ c en_USYou are now connected to database "en_US" as user "postgres ". en_US = # select * from (values ('King'), ('marw'), ('west'), ('yang') a order by; column1 --------- Yang wangxi mink (4 rows) en_US = # \ c zh_CNYou are now connected to database "zh_CN" as user "postgres ". zh_CN = # select * from (values ('King'), ('marw'), ('west'), ('yang') a order by; column1 --------- mink Wang Xiyang (4 rows)

Test Method for the impact of Collate on Performance

postgres=# create table tb1(c1 text);CREATE TABLETime: 5.653 mspostgres=# insert into tb1 select md5(generate_series(1,1000000)::text);INSERT 0 1000000Time: 2671.929 mspostgres=# vacuum ANALYZE tb1;VACUUMTime: 398.817 mspostgres=# select * from tb1 order by c1 limit 1;                c1                ---------------------------------- 0000104cd168386a335ba6bf6e32219d(1 row)Time: 176.779 mspostgres=# create index idx1 on tb1(c1);CREATE INDEXTime: 1549.436 ms

Test Results

Collate/Ctype       C       en_US.UTF8  zh_CN.UTF8insert              2671    2613        2670vacuum ANALYZE      398     250         396order by            176     388         401create index        1549    7492        7904insert(with index)  11199   15621       16128

Impact of Ctype

Ctype affects the results of pg_trgm and some regular expression matching. For example, if the Ctype is 'C', pg_trgm cannot support Chinese characters.

Postgres = # select show_trgm ('aaabbbc to x'); show_trgm then {"a", "x", "aa", "x", aaa, aab, abb, bbb, bbc, "bc"} (1 row) en_US = # select show_trgm ('aaabbbc to x'); show_trgm done {"a", "aa", 0x27bdf1, 0x30bd19, 0x4624bc, aaa, aab, abb, bbb, bbc, 0x6a2ad5} (1 row) zh_CN = # select show_trgm ('aaabbbc to x '); show_trgm resume {"a", "aa", 0x27bdf1, 0x30bd19, 0x4624bc, aaa, aab, abb, bbb, bbc, 0x6a2ad5} (1 row)

Conclusion

For scenarios with low performance requirements, we recommend that you set both Collate and Ctype to zh_CN.UTF8 and other regions to C.
```
initdb -E UTF8 --locale=C --lc-collate=zh_CN.UTF8 --lc-ctype=zh_CN.UTF8 ... 
```
For scenarios with high performance requirements, we recommend that you set Ctype to zh_CN.UTF8 and other regions to C. If some queries need to be sorted by pinyin, you can specify Collate as zh_CN in the column definition and SQL expression.
```
initdb -E UTF8 --locale=C --lc-ctype=zh_CN.UTF8 ... 
```

Reference

Regions and codes in PostgreSQL

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

PostgreSQL region settings

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

PostgreSQL region settings

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support