Tips for improving PostgreSQL Performance

Source: Internet
Author: User
Tags: datadog

A single small change to a (poorly performing) PostgreSQL query, from ANY (ARRAY[...]) to ANY (VALUES (...)), reduced the query time from 20 s to 0.2 s. From the simplest use of EXPLAIN ANALYZE to asking the Postgres community for help, the small investment in learning time paid off a hundredfold.

Monitoring slow Postgres queries

Earlier this week, a performance problem surfaced in one of our (many) databases: a primary-key lookup against a moderately sized table (10 GB, 15 million rows) used by our graph editor.

99.9% of the time the query was fast and smooth, but in cases where a large number of enumerated key values were passed in, it could take 20 seconds. Spending that much time in the database means the user is left waiting in front of the browser for the graph editor to respond. Obviously, even that unlucky 0.1% has a very bad impact.

The query and its plan

Here is the problematic query.
 

SELECT c.key,
       c.x_key,
       c.tags,
       x.name
FROM context c
JOIN x
  ON c.x_key = x.key
WHERE c.key = ANY (ARRAY[15368196, -- 11,000 other keys --])
  AND c.x_key = 1
  AND c.tags @> ARRAY[E'blah'];

Table X contains thousands of rows and table C contains 15 million rows. Both tables have a proper index on their primary key column "key", so this is a simple, clear primary-key lookup. Things get interesting, however, when the number of key values grows: with 11,000 values in the list, adding EXPLAIN (ANALYZE, BUFFERS) to the statement yields the query plan below.
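For reference, the plan below was produced by running roughly the following (a sketch; the 11,000-key list is elided, and a block comment is used here so the statement stays valid SQL):

EXPLAIN (ANALYZE, BUFFERS)
SELECT c.key, c.x_key, c.tags, x.name
FROM context c
JOIN x ON c.x_key = x.key
WHERE c.key = ANY (ARRAY[15368196 /* , 11,000 other keys */])
  AND c.x_key = 1
  AND c.tags @> ARRAY[E'blah'];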
 

Nested Loop  (cost=6923.33..11770.59 rows=1 width=362) (actual time=17128.188..22109.283 rows=10858 loops=1)
  Buffers: shared hit=83494
  ->  Bitmap Heap Scan on context c  (cost=6923.33..11762.31 rows=1 width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)
        Recheck Cond: ((tags @> '{blah}'::text[]) AND (x_key = 1))
        Filter: (key = ANY ('{15368196,(a lot more keys here)}'::integer[]))
        Buffers: shared hit=50919
        ->  BitmapAnd  (cost=6923.33..6923.33 rows=269 width=0) (actual time=132.910..132.910 rows=0 loops=1)
              Buffers: shared hit=1342
              ->  Bitmap Index Scan on context_tags_idx  (cost=0.00..1149.61 rows=15891 width=0) (actual time=64.614..64.614 rows=264777 loops=1)
                    Index Cond: (tags @> '{blah}'::text[])
                    Buffers: shared hit=401
              ->  Bitmap Index Scan on context_x_id_source_type_id_idx  (cost=0.00..5773.47 rows=268667 width=0) (actual time=54.648..54.648 rows=267659 loops=1)
                    Index Cond: (x_id = 1)
                    Buffers: shared hit=941
  ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.003..0.004 rows=1 loops=10858)
        Index Cond: (x.key = 1)
        Buffers: shared hit=32575
Total runtime: 22117.417 ms

At the bottom of the output you can see that the query takes 22 seconds in total. We could see the cost of those 22 seconds in our CPU utilization graphs: almost all of the time was spent inside Postgres and the OS, with very little spent on I/O.

At the lowest level, these queries showed up as spikes in CPU utilization. CPU graphs are rarely that useful, but here they confirmed a key point: the database was not waiting for the disk to read data. It was busy sorting, hashing, and comparing rows.

The second interesting metric was the trail tracking right alongside those spikes: the number of rows "fetched" by Postgres (not returned in this case; ignore that for now).

Clearly, something was methodically churning through many, many rows: our query.
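The same "rows fetched" signal can also be read straight out of Postgres's own statistics views, independent of any monitoring tool. A minimal sketch (pg_stat_user_tables is a standard view; the table name context comes from the query above):

SELECT relname,
       seq_tup_read,    -- rows read by sequential scans
       idx_tup_fetch    -- live rows fetched via index scans
FROM pg_stat_user_tables
WHERE relname = 'context';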
 
The Postgres problem: bitmap heap scan

Here is the part of the query plan that matches the rows.

 

Buffers: shared hit=83494
  ->  Bitmap Heap Scan on context c  (cost=6923.33..11762.31 rows=1 width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)
        Recheck Cond: ((tags @> '{blah}'::text[]) AND (x_key = 1))
        Filter: (key = ANY ('{15368196,(a lot more keys here)}'::integer[]))
        Buffers: shared hit=50919

Postgres bitmap-scans table C. When the set of primary-key values is small, it can efficiently use the index to build the bitmap in memory. If the bitmap gets too large, though, the planner changes the way it looks up rows. In our case, because the key list is so large, the planner uses the (supposedly) optimal method to fetch the candidate rows and then checks every one of them against the full key list. Loading all that data into memory and re-checking it (the Recheck Cond in the plan) is where the time goes.

Luckily for us, about 30% of the table was already in memory, so it was not as bad as reading everything from disk. But the effect on performance was still very noticeable. Remember, the query is very simple: it is a primary-key lookup, so there was no obvious fix short of a dramatic re-architecture of the database or the application. The pgsql-performance mailing list gave us great help.
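The "30% in memory" figure is about how much of the table is cached. One way to approximate this for Postgres's own shared buffers (a sketch; it assumes the pg_buffercache extension is available and does not account for the OS page cache):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT count(*) * current_setting('block_size')::bigint AS cached_bytes,
       pg_relation_size('context')                      AS table_bytes
FROM pg_buffercache b
WHERE b.relfilenode = pg_relation_filenode('context')
  AND b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database());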
 
Solution

This is yet another reason we love open source and its helpful users. Tom Lane, one of the most prolific open-source programmers, suggested the following:
 

SELECT c.key,
       c.x_key,
       c.tags,
       x.name
FROM context c
JOIN x
  ON c.x_key = x.key
WHERE c.key = ANY (VALUES (15368196), -- 11,000 other keys --)
  AND c.x_key = 1
  AND c.tags @> ARRAY[E'blah'];

ARRAY has been changed to VALUES. Can you spot the difference?

We were using ARRAY[...] to list all the keys to look up, but that fools the query planner. VALUES (...), on the other hand, lets the planner make full use of the index on the key column. It is a one-line change with no semantic difference.
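The general shape of the rewrite, shown on a hypothetical table items (the name is illustrative, not from the article):

-- Before: the key list is one opaque array literal.
SELECT * FROM items WHERE id = ANY (ARRAY[1, 2, 3]);

-- After: the key list is a VALUES row source, which the planner can
-- de-duplicate (HashAggregate) and use to drive index lookups.
SELECT * FROM items WHERE id = ANY (VALUES (1), (2), (3));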

Here is the plan for the new query. The crucial difference from the old plan is how the key list is handled: a Values Scan feeding a HashAggregate, which then drives an index scan on context_pkey instead of a bitmap heap scan.
 

Nested Loop  (cost=168.22..2116.29 rows=148 width=362) (actual time=22.134..256.531 rows=10858 loops=1)
  Buffers: shared hit=44967
  ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.071..0.073 rows=1 loops=1)
        Index Cond: (id = 1)
        Buffers: shared hit=4
  ->  Nested Loop  (cost=168.22..2106.54 rows=148 width=329) (actual time=22.060..242.406 rows=10858 loops=1)
        Buffers: shared hit=44963
        ->  HashAggregate  (cost=168.22..170.22 rows=200 width=4) (actual time=21.529..32.820 rows=11215 loops=1)
              ->  Values Scan on "*VALUES*"  (cost=0.00..140.19 rows=11215 width=4) (actual time=0.005..9.527 rows=11215 loops=1)
        ->  Index Scan using context_pkey on context c  (cost=0.00..9.67 rows=1 width=329) (actual time=0.015..0.016 rows=1 loops=11215)
              Index Cond: (c.key = "*VALUES*".column1)
              Filter: ((c.tags @> '{blah}'::text[]) AND (c.x_id = 1))
              Buffers: shared hit=44963
Total runtime: 263.639 ms

Query time dropped from roughly 22,000 ms to about 200 ms: a 100x speedup from changing a single line of code.

The new query in production

Once the code was released, the database looked a lot healthier and more responsive.

Third-party tools

Slow Postgres queries are rare, but nobody wants to be tortured by the unlucky 0.1%. To verify the impact of the query change right away, we needed Datadog to tell us whether the modification was correct.

If you want to see the impact of your own Postgres query changes, registering a free Datadog account takes only a few minutes.
