時序商務（所有VALUE NEW值7秒到7毫秒的優化之旅）- 阿裡雲RDS PostgreSQL最佳實踐

最後更新：2018-05-24 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

摘要： 標籤PostgreSQL , 時序資料庫, 時序, 滑動視窗, 遞迴查詢, subquery , 視窗查詢, 求最新值幕後在很多場景中，都會有資料合併、清洗的需求。例如：1、記錄了表的變更明細(insert,update,delete)，需要合併明細，從明細中快速取到每個PK的最新值。

幕後

在很多場景中，都會有資料合併、清洗的需求。

例如：

1、記錄了表的變更明細(insert,update,delete)，需要合併明細，從明細中快速取到每個PK的最新值。

2、有很多感應器，不斷的在上報資料，要快速的取出每個感應器的最新狀態。

對於這類需求，可以使用視窗查詢，但是如何加速，如何快速的取出批量資料呢？

PostgreSQL是最進階的開來源資料庫，優化方法之多，超乎你的想像。

時序資料取值優化

1、唯一值較少時，並且唯一值範圍未知時，使用遞迴。

方法如下：

《PostgrSQL遞迴SQL的幾個套用 - 極客與正常人的思維》

《時序資料合併場景加速剖析和實現 - 複合索引，視窗分組查詢加速，變態遞迴加速》

2、唯一值較少時，並且唯一值範圍確定時，使用subquery。

方法如下：

《PostgreSQL海量時序資料(任意滑動視窗即時統計剖析) - 感應器、人群、物體等物件追蹤》

3、唯一值較多時，1 使用視窗查詢，比前面的方法更加適合。

4、唯一值較多時，2 使用流式計算，比方法3更加優秀。

方法如下：

《(流式、lambda、觸發程序)即時處理大比拼 - 物聯網(IoT)金融,時序處理最佳實踐》

本文將對前三種方法做一個比較。

方法4 流計算就不用比了，因為它什麼時候都是最強大的。通殺所有場景。等pipelineDB 外掛程式化吧，阿裡雲RDS PG 10會整合pipelineDB的功能。

遞迴 vs subquery vs 視窗

以500萬資料為例，對比這幾種方法的調整場景。

1 遞迴

場景一、唯一值較多(100萬唯一值)

1、建表

	iming

drop table test;

create unlogged table test(id int , info text, crt_time timestamp);

2、建構資料

insert into test select ceil(random()*1000000), md5(random()::text), clock_timestamp() from generate_series(1,5000000);

3、建立索引

create index idx_test_1 on test (id, crt_time desc);

4、遞迴查詢效率

explain (analyze,verbose,timing,costs,buffers) with recursive skip as (

select test as v from test where id in (select id from test where id is not null order by id,crt_time desc limit 1) limit 1

union all

select (

select t as v from test t where t.id>(s.v).id and t.id is not null order by id,crt_time desc limit 1

) from skip s where (s.v).id is not null

)--這裡的where (s.v).id is not null一定要加,否則就閉環了.

select (t.v).id, (t.v).info, (t.v).crt_time from skip t where t.* is not null;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

CTE Scan on skip t(cost=54.35..56.37 rows=100 width=44) (actual time=0.042..6626.084 rows=993288 loops=1)

Output: (t.v).id, (t.v).info, (t.v).crt_time

Filter: (t.* IS NOT NULL)

Rows Removed by Filter: 1

Buffers: shared hit=3976934

CTE skip

->Recursive Union(cost=0.91..54.35 rows=101 width=69) (actual time=0.034..6006.615 rows=993289 loops=1)

Buffers: shared hit=3976934

->Limit(cost=0.91..0.93 rows=1 width=69) (actual time=0.033..0.033 rows=1 loops=1)

Output: test.*

Buffers: shared hit=8

->Nested Loop(cost=0.91..10.19 rows=500 width=69) (actual time=0.032..0.032 rows=1 loops=1)

Output: test.*

Buffers: shared hit=8

->HashAggregate(cost=0.48..0.49 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=1)

Output: test_1.id

Group Key: test_1.id

Buffers: shared hit=4

->Limit(cost=0.43..0.47 rows=1 width=12) (actual time=0.016..0.016 rows=1 loops=1)

Output: test_1.id, test_1.crt_time

Buffers: shared hit=4

->Index Only Scan using idx_test_1 on public.test test_1(cost=0.43..173279.36 rows=5000002 width=12) (actual time=0.015..0.015 rows=1 loops=1)

Output: test_1.id, test_1.crt_time

Index Cond: (test_1.id IS NOT NULL)

Heap Fetches: 1

Buffers: shared hit=4

->Index Scan using idx_test_1 on public.test(cost=0.43..9.64 rows=6 width=73) (actual time=0.009..0.009 rows=1 loops=1)

Output: test.*, test.id

Index Cond: (test.id = test_1.id)

Buffers: shared hit=4

->WorkTable Scan on skip s(cost=0.00..5.14 rows=10 width=32) (actual time=0.006..0.006 rows=1 loops=993289)

Output: (SubPlan 1)

Filter: ((s.v).id IS NOT NULL)

Rows Removed by Filter: 0

Buffers: shared hit=3976926

SubPlan 1

->Limit(cost=0.43..0.49 rows=1 width=81) (actual time=0.005..0.005 rows=1 loops=993288)

Output: t_1.*, t_1.id, t_1.crt_time

Buffers: shared hit=3976926

->Index Scan using idx_test_1 on public.test t_1(cost=0.43..102425.17 rows=1666667 width=81) (actual time=0.005..0.005 rows=1 loops=993288)

Output: t_1.*, t_1.id, t_1.crt_time

Index Cond: ((t_1.id > (s.v).id) AND (t_1.id IS NOT NULL))

Buffers: shared hit=3976926

Planning time: 0.354 ms

Execution time: 6706.105 ms

(45 rows)

場景二、唯一值較少(1000唯一值)

1、建表

	iming

drop table test;

create unlogged table test(id int , info text, crt_time timestamp);

2、建構資料

insert into test select ceil(random()*1000), md5(random()::text), clock_timestamp() from generate_series(1,5000000);

3、建立索引

create index idx_test_1 on test (id, crt_time desc);

4、遞迴查詢效率

查詢語句不變

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

CTE Scan on skip t(cost=55.09..57.11 rows=100 width=44) (actual time=0.046..8.859 rows=1000 loops=1)

Output: (t.v).id, (t.v).info, (t.v).crt_time

Filter: (t.* IS NOT NULL)

Rows Removed by Filter: 1

Buffers: shared hit=4007

CTE skip

->Recursive Union(cost=0.91..55.09 rows=101 width=69) (actual time=0.039..8.203 rows=1001 loops=1)

Buffers: shared hit=4007

->Limit(cost=0.91..1.67 rows=1 width=69) (actual time=0.038..0.038 rows=1 loops=1)

Output: test.*

Buffers: shared hit=8

->Nested Loop(cost=0.91..6335.47 rows=8333 width=69) (actual time=0.038..0.038 rows=1 loops=1)

Output: test.*

Buffers: shared hit=8

->HashAggregate(cost=0.48..0.49 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=1)

Output: test_1.id

Group Key: test_1.id

Buffers: shared hit=4

->Limit(cost=0.43..0.47 rows=1 width=12) (actual time=0.016..0.017 rows=1 loops=1)

Output: test_1.id, test_1.crt_time

Buffers: shared hit=4

->Index Only Scan using idx_test_1 on public.test test_1(cost=0.43..173279.55 rows=5000002 width=12) (actual time=0.015..0.015 rows=1 loops=1)

Output: test_1.id, test_1.crt_time

Index Cond: (test_1.id IS NOT NULL)

Heap Fetches: 1

Buffers: shared hit=4

->Index Scan using idx_test_1 on public.test(cost=0.43..6284.98 rows=5000 width=73) (actual time=0.015..0.015 rows=1 loops=1)

Output: test.*, test.id

Index Cond: (test.id = test_1.id)

Buffers: shared hit=4

->WorkTable Scan on skip s(cost=0.00..5.14 rows=10 width=32) (actual time=0.008..0.008 rows=1 loops=1001)

Output: (SubPlan 1)

Filter: ((s.v).id IS NOT NULL)

Rows Removed by Filter: 0

Buffers: shared hit=3999

SubPlan 1

->Limit(cost=0.43..0.49 rows=1 width=81) (actual time=0.007..0.007 rows=1 loops=1000)

Output: t_1.*, t_1.id, t_1.crt_time

Buffers: shared hit=3999

->Index Scan using idx_test_1 on public.test t_1(cost=0.43..102425.80 rows=1666667 width=81) (actual time=0.007..0.007 rows=1 loops=1000)

Output: t_1.*, t_1.id, t_1.crt_time

Index Cond: ((t_1.id > (s.v).id) AND (t_1.id IS NOT NULL))

Buffers: shared hit=3999

Planning time: 0.353 ms

Execution time: 8.980 ms

(45 rows)

2 subquery

場景一、唯一值較多(100萬唯一值)

1、subquery查詢效率

如果ID的取值範圍特別廣，SUBQUERY就很不划算。

需要維修一張唯一ID表，這裡使用generate_series來代替這張表，用於測試。

explain (analyze,verbose,timing,costs,buffers) select (select test from test where id=t.id order by crt_time desc limit 1) from generate_series(1,1000000) t(id);

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------

Function Scan on pg_catalog.generate_series t(cost=0.00..1976.65 rows=1000 width=32) (actual time=70.682..2835.109 rows=1000000 loops=1)

Output: (SubPlan 1)

Function Call: generate_series(1, 1000000)

Buffers: shared hit=3997082

SubPlan 1

->Limit(cost=0.43..1.97 rows=1 width=77) (actual time=0.002..0.002 rows=1 loops=1000000)

Output: test.*, test.crt_time

Buffers: shared hit=3997082

->Index Scan using idx_test_1 on public.test(cost=0.43..9.64 rows=6 width=77) (actual time=0.002..0.002 rows=1 loops=1000000)

Output: test.*, test.crt_time

Index Cond: (test.id = t.id)

Buffers: shared hit=3997082

Planning time: 0.119 ms

Execution time: 2892.712 ms

(14 rows)

場景二、唯一值較少(1000唯一值)

1、subquery查詢效率

查詢語句有變

explain (analyze,verbose,timing,costs,buffers) select (select test from test where id=t.id order by crt_time desc limit 1) from generate_series(1,1000) t(id);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------

Function Scan on pg_catalog.generate_series t(cost=0.00..1699.41 rows=1000 width=32) (actual time=0.107..7.041 rows=1000 loops=1)

Output: (SubPlan 1)

Function Call: generate_series(1, 1000)

Buffers: shared hit=4000

SubPlan 1

->Limit(cost=0.43..1.69 rows=1 width=77) (actual time=0.006..0.007 rows=1 loops=1000)

Output: test.*, test.crt_time

Buffers: shared hit=4000

->Index Scan using idx_test_1 on public.test(cost=0.43..6284.98 rows=5000 width=77) (actual time=0.006..0.006 rows=1 loops=1000)

Output: test.*, test.crt_time

Index Cond: (test.id = t.id)

Buffers: shared hit=4000

Planning time: 0.131 ms

Execution time: 7.126 ms

(14 rows)

3 視窗查詢

場景一、唯一值較多(100萬唯一值)

1、視窗查詢效率

explain (analyze,verbose,timing,costs,buffers) select id,info,crt_time from (select row_number() over (partition by id order by crt_time desc) as rn, * from test) t where rn=1;

postgres=# explain (analyze,verbose,timing,costs,buffers) select id,info,crt_time from (select row_number() over (partition by id order by crt_time desc) as rn, * from test) t where rn=1;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------

Subquery Scan on t(cost=0.43..310779.41 rows=25000 width=45) (actual time=0.027..6398.308 rows=993288 loops=1)

Output: t.id, t.info, t.crt_time

Filter: (t.rn = 1)

Rows Removed by Filter: 4006712

Buffers: shared hit=5018864

->WindowAgg(cost=0.43..248279.39 rows=5000002 width=53) (actual time=0.026..5973.497 rows=5000000 loops=1)

Output: row_number() OVER (?), test.id, test.info, test.crt_time

Buffers: shared hit=5018864

->Index Scan using idx_test_1 on public.test(cost=0.43..160779.35 rows=5000002 width=45) (actual time=0.019..4058.476 rows=5000000 loops=1)

Output: test.id, test.info, test.crt_time

Buffers: shared hit=5018864

Planning time: 0.121 ms

Execution time: 6446.901 ms

(13 rows)

場景二、唯一值較少(1000唯一值)

1、視窗查詢效率

查詢語句不變

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------

Subquery Scan on t(cost=0.43..310779.61 rows=25000 width=45) (actual time=0.027..6176.801 rows=1000 loops=1)

Output: t.id, t.info, t.crt_time

Filter: (t.rn = 1)

Rows Removed by Filter: 4999000

Buffers: shared hit=4744850 read=18157

->WindowAgg(cost=0.43..248279.58 rows=5000002 width=53) (actual time=0.026..5822.576 rows=5000000 loops=1)

Output: row_number() OVER (?), test.id, test.info, test.crt_time

Buffers: shared hit=4744850 read=18157

->Index Scan using idx_test_1 on public.test(cost=0.43..160779.55 rows=5000002 width=45) (actual time=0.020..4175.082 rows=5000000 loops=1)

Output: test.id, test.info, test.crt_time

Buffers: shared hit=4744850 read=18157

Planning time: 0.108 ms

Execution time: 6176.924 ms

(13 rows)

橫向效率對比圖

資料量	唯一值個數	視窗查詢(ms)	subquery(ms)	遞迴查詢(ms)
500萬	100萬	6446	2892	6706
500萬	100萬	6176	7	9

雲端產品

阿裡雲 RDSPostgreSQL

阿裡雲 HybridDB for PostgreSQL

類似案例

《PostgrSQL遞迴SQL的幾個套用 - 極客與正常人的思維》

《時序資料合併場景加速剖析和實現 - 複合索引，視窗分組查詢加速，變態遞迴加速》

《(流式、lambda、觸發程序)即時處理大比拼 - 物聯網(IoT)金融,時序處理最佳實踐》

《PostgreSQL海量時序資料(任意滑動視窗即時統計剖析) - 感應器、人群、物體等物件追蹤》

《車連網案例，曲目清洗 - 阿裡雲RDSPostgreSQL最佳實踐 - 視窗函數》

《PostgreSQL時序最佳實踐 - 證券交易系統資料庫設計 - 阿裡雲RDSPostgreSQL最佳實踐》

http://www.timescale.com/

小結

隨著物聯網的發展，越來越多的時序資料產生，求時序資料的最新值，滑動視窗內的最新值，已經成為時序商務裡面非常場景的需求。

PostgreSQL是最先進的開來源資料庫，它不像很多資料庫解決問題往往只有一種方法。PostgreSQL解決一個問題有多種方法，什麼方法最優，就看你對它的瞭解了。

1、唯一值較少時，並且唯一值範圍未知時，使用遞迴。

2、唯一值較少時，並且唯一值範圍確定時(例如範圍是100萬，但是此批資料只出現了50萬個，那麼如果你有這50萬個的ID，效能是最好的，否則需要掃100萬。例如用戶數是1億，一個區間的活躍用戶可能只有幾萬。)，使用subquery。

3、唯一值較多時，1 使用視窗查詢，比前面的方法更加適合。

4、唯一值較多時，2 使用流式計算，比方法3更加優秀。

相關產品：

1. 安全管家

2. 雲資料庫RDS

3. 資料管理

4. 雲端服務器ECS

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

時序商務（所有VALUE NEW值7秒到7毫秒的優化之旅）- 阿裡雲RDS PostgreSQL最佳實踐

幕後

時序資料取值優化

遞迴 vs subquery vs 視窗

1 遞迴

場景一、唯一值較多(100萬唯一值)

場景二、唯一值較少(1000唯一值)

2 subquery

場景一、唯一值較多(100萬唯一值)

場景二、唯一值較少(1000唯一值)

3 視窗查詢

場景一、唯一值較多(100萬唯一值)

場景二、唯一值較少(1000唯一值)

橫向效率對比圖

雲端產品

類似案例

小結

聯繫我們

熱門文章

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

時序商務（所有VALUE NEW值7秒到7毫秒的優化之旅）- 阿裡雲RDS PostgreSQL最佳實踐

幕後

時序資料取值優化

遞迴 vs subquery vs 視窗

1 遞迴

場景一、唯一值較多(100萬唯一值)

場景二、唯一值較少(1000唯一值)

2 subquery

場景一、唯一值較多(100萬唯一值)

場景二、唯一值較少(1000唯一值)

3 視窗查詢

場景一、唯一值較多(100萬唯一值)

場景二、唯一值較少(1000唯一值)

橫向效率對比圖

雲端產品

類似案例

小結

聯繫我們

熱門文章

推薦話題

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support