基於阿裡雲 RDS PostgreSQL打造即時用戶畫像推薦系統

最後更新：2018-05-24 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

幕後

用戶畫像在市場行銷的套用重建中非常常見，已經不是什麼新鮮的東西，比較流行的解決方案是給使用者加標籤，根據標籤的群組，圈出需要的使用者。

通常畫像系統會用到寬表，以及分散式的系統。

寬表的作用是隱藏標籤，例如每列代表一個標籤。

但實際上這種設計不一定是最優或唯一的設計，本文將以PostgreSQL資料庫為基礎，給大家講解一下更加另類的設計思路，並且看看效率如何。

商務場景模擬

假設有一個2B的即時用戶推薦系統，每個APPID代表一個B。

商務資料包括APPID，USERIDs，TAGs。（2B的用戶ID，終端使用者ID，標籤）

商務沒有跨APPID的資料交換動作需求，也就是說僅限於APPID內的用戶推薦。

查詢局限在某個選定的APPID，以及TAG群組，搜尋合格USERID，並將USERID發送給用戶。

資料總量約10億，單個APPID的用戶數最大約1億。

TAG總數設計容量為1萬個。

查詢需求: 包含，不包含，或，與。

並行幾百，RT 毫秒級。

接下來我會列舉4個方案，並剖析每種方案的優缺點。

一、寬表方案剖析

通常表的寬度是有節流的，以PostgreSQL為例，一條記錄是無法跨PAGE的（變長欄位隱藏到TOAST隱藏，以隱藏超過1頁大小的列，頁內只隱藏指標），這就使得表的寬度受到了節流。

例如8KB的資料區塊，可能能存下接近2000個列。

如果要為每個TAG設計一個列，則需要1萬個列的寬表。

相信其它資料庫也有類似的節流，1萬個列的寬表，除非改造資料庫核心，否則無法滿足需求。

那麼可以使用APPID+USERID作為PK，隱藏為多個表來實現無限個TAG的需求。以單表1000個列為例，10個表就能滿足1萬個TAG的需求。

create table t_tags_1(appid int, userid int8, tag1 boolean, tag2 boolean, ...... tag1000 boolean);

.....

create table t_tags_10(appid int, userid int8, tag9001 boolean, tag9002 boolean, ...... tag10000 boolean);

為了升階效率，要為每個tag欄位建立索引，也就是說需要1萬個索引。

如果TAG的群組跨表了，還有JOIN動作。

1. 優點

沒有用什麼特殊的優化，幾乎所有的資料庫都支援。

2. 缺點

效能不一定好，特別是查詢群組條件多的話，效能會下降比較明顯，例如(tag1 and tag2 and (tag4 ortag5) or not tag6) 。

二、陣列方案剖析

使用陣列代替TAG列，要求資料庫有陣列類型，同時有陣列的高效取出能力，這一點PostgreSQL可以很好的滿足需求。

1. 資料結構

APPID， USERID， TAG[]陣列

單個陣列最大長度1GB(約支援2.6億個TAG)

2. 按APPID分區，推測分區

3. query文法

3.1 包含array2指定的所有TAG

陣列1包含陣列2的所有元素

array1 @> array2

支援索引取出

3.2 包含array2指定的TAG之一

陣列1與陣列2有重疊元素

array1 &&array2

支援索引取出

3.3 不包含array2指定的所有tag

陣列1與陣列2沒有重疊元素

not array1 && array2

不支援索引取出

4. 例子

create table t_arr(appid int, userid int8, tags int2[]) with(parallel_workers=128);

create index idx_t_array_tags on t_arr using gin (tags) with (fastupdate=on, gin_pending_list_limit= 1024000000);

create index idx_t_arr_uid on t_arr(userid);

819200KB約緩衝10000條80K的陣列記錄，可以自行調整.

1.每個USERID包含10000個TAG(極限)。

insert into t_arr select 1, 2000000000*random(),(select array_agg(10000*random()) from generate_series(1,10000));

nohup pgbench -M prepared -n -r -f ./test.sql -P 1 -c 50 -j 50 -t 2000000 > ./arr.log 2>&1 &

5. 優點

可以隱藏很多TAG，幾億個足夠用啦(產業內有1萬個TAG的已經是非常多的啦)。

支援陣列的索引查詢，但是not不支援索引。

6. 缺點

資料量還是有點大，一條記錄1萬個TAG，約80KB。

1億記錄約8TB，索引還需要約8TB。

不是所有的資料庫都支援數群組類型。

三、位元位方案1剖析

使用BIT隱藏TAG，0和1表示有或者沒有這個TAG。

1. 資料結構

APPID， USERID， TAG位元流

單個BIT欄位最大支援1GB長度BIT流（支援85億個TAG）

每個BIT代表一個TAG

2. 按APPID分區，推測分區

3. query文法

3.1 包含bit2指定的所有TAG(需要包含的TAG對應的BIT設定為1，其他為0)

bitand(bit1,bit2) = bit2

3.2 包含bit2指定的TAG之一(需要包含的TAG對應的BIT設定為1，其他為0)

bitand(bit1,bit2) > 0

3.3 不包含bit2指定的所有tag (需要包含的TAG對應的BIT設定為1，其他為0)

bitand(bit1,bit2) = zerobit(10000)

4. 例子

create table t_bit(appid int, userid int8, tags varbit) ;

create index idx_t_bit_uid on t_bit(userid);

每個USERID對應10000位推測的位元值

date;for ((i=1;i<=50;i++)); do psql -c "insert into t_bit select 1, 2000000000*random(),

(select (string_agg(mod((2*random())::int,2)::text,''))::varbit from generate_series(1,10000)) tags

from generate_series(1,2000000)" ; done; date

127GB,每秒插入24.5萬, 326MB/s

插入速度

(批量)每秒插入24.5萬, 326MB/s

更新、移除 tag速度

create or replace function randbit(int) returns varbit as $$

select (string_agg(mod((2*random())::int,2)::text,''))::varbit from generate_series(1,$1);

$$ language sql strict volatile;

create or replace function zerobit(int) returns varbit as $$

select (string_agg('0',''))::varbit from generate_series(1,$1);

$$ language sql strict immutable;

update t_bit set tags=randbit(10000) where userid=:id;

每秒更新、移除1萬記錄，回應時間約4毫秒

查詢速度

do language plpgsql $$

declare

sql text;

bit1 varbit := randbit(10000);

bit2 varbit := randbit(10000);

bit3 varbit := randbit(10000);

zbit varbit := zerobit(10000);

begin

set max_parallel_workers_per_gather =27;

sql := 'select * from t_bit where bitand(tags,'''||bit1::text||''')='''||bit1::text||''' and bitand(tags,'''||bit2::text||''')>bit''0'' and bitand(tags,'''||bit3::text||''')='''||zbit::text||'''';

raise notice '%', sql;

-- execute sql;

end;

$$;

開27個並行, 17秒。

5. 優點

可以隱藏很多TAG，85億個TAG足夠用啦吧(產業內有1萬個TAG的已經是非常多的啦)。

1萬個TAG，佔用1萬個BIT，約1.25KB。

1億記錄約120GB，無索引。

6. 缺點

沒有索引方法，查詢是只能通過平行計算升階效能。

PostgreSQL 9.6 支援CPU平行計算，1億用戶時，可以滿足20秒內返回，但是會消耗很多的CPU資源，因此查詢的並行度不能做到很高。

四、位元位方案2剖析

有沒有又高效，又節省資源的方法呢？

答案是有的。

因為查詢通常是以TAG為群組條件，取出複合有條件的USERID的查詢。

所以反過來設計，查詢效果就會很好，以TAG為維度，USERID為位元位的設計。

我們需要維修的是每個tag下有哪些用戶，所以這塊的資料更新量會很大，需要考慮增量合併與讀時合併的設計。

資料流如下，資料可以快速的寫入

data ->明細表 ->增量彙總 -> appid, tagid, userid_bits

讀取時，使用兩部份資料進行合併，一部分是tag的計算結果，另一部分是未合併的明細表的結果，兩者MERGE。

當然，如果可以做到分鐘內的合併延遲，商務也能夠忍受分鐘的延遲的話，那麼查詢是就沒有MERGE的必要了，直接查結果，那會非常非常快。

1. query

1.1 包含這些tags的用戶

userids (bitand) userids

結果為bit位為1的用戶

1.2 不包含這些tags的用戶

userids (bitor) userids

結果為bit位為0的用戶

1.3 包含這些tags之一的用戶

userids (bitor) userids

結果為bit位為1的用戶

2. 優點

因為資料存放區的維度發生了變化，採用以查詢為目標的設計，資料的查詢效率非常高。

3. 缺點

由於使用了位元位表示USERID，所以必須有置放與USERID的對應關聯。

需要維修使用者ID字典表，需要使用增量合併的手段減少資料的更新頻率。

會有一定的延遲，通常可以控制在分鐘內，如果商務允許這樣的延遲，則非常棒。

通常商務的USERID會週期性的失效（例如僵屍USERID，隨著時間可以逐漸失效），那麼需要週期性的維修使用者ID字典，同時也要更新USERID位元資訊。

架構如圖

五、位元位方案2實施細節

阿裡雲RDSPostgreSQL 新增的bit動作函數

本文會用到幾個新增的FUNCTION，這幾個function很有用，同時會上線阿裡雲的RDS PostgreSQL中。

get_bit (varbit, int, int) returns varbit

從指定置放開始追蹤N個BIT位，返回varbit

例如 get_bit('111110000011', 3, 5)返回11000

set_bit_array (varbit, int, int, int[]) returns varbit

將指定置放的BIT設定為0|1，超出原始長度的部份填滿0|1

例如 set_bit_array('111100001111', 0, 1, array[1,15])返回 1011000011111110

bit_count (varbit, int, int, int) returns int

從第n位開始，統計N個BIT位中有多少個0|1，如果N超出長度，則只計算已經存在的。

例如 bit_count('1111000011110000', 1, 5, 4)返回 1(0001)

bit_count (varbit, int) returns int

統計整個bit string中1|0的個數。

例如 bit_count('1111000011110000', 1)返回 8

bit_fill (int, int) returns varbit

填滿指定長度的0或 1

例如 bit_fill(0,10)返回 '0000000000'

bit_rand (int, int, float) returns varbit

填滿指定長度的推測BIT，並指定1或0的推測比例

例如 bit_rand(10, 1, 0.3)可能返回 '0101000001'

bit_posite (varbit, int, boolean) returns int[]

返回 1|0的置放資訊，下標從0開始計數, true時正向返回，false時反向返回

例如 bit_posite ('11110010011', 1, true)返回 [0,1,2,3,6,9,10]

bit_posite ('11110010011', 1, false)返回 [10,9,6,3,2,1,0]

bit_posite (varbit, int, int, boolean) returns int[]

返回 1|0的置放資訊，下標從0開始計數，true時正向返回，false時反向返回，返回N個為止

例如 bit_posite ('11110010011', 1, 3, true)返回 [0,1,2]

bit_posite ('11110010011', 1, 3, false)返回 [10,9,6]

get_bit_2 (varbit, int, int) returns int

返回指定置放的bit,下標從0開始，如果超出BIT置放，返回指定的0或1.

例如 get_bit_2('111110000011', 100, 0)返回 0(100已經超出長度，返回用戶指定的0)

資料庫內建的BIT動作函數請參考源碼

src/backend/utils/adt/varbit.c

表結構設計

使用bit隱藏用戶

userid int8表示，可以超過40億。

rowid int表示，也就是說單個APPID不能允許超過20億的用戶，從0開始自增，配合BIT下標的表示。

appid int表示，不會超過40億個。

1. 字典表,rowid決定MAP順序，使用視窗查詢返回順序。

drop table IF EXISTS t_userid_dic;

create table IF NOT EXISTS t_userid_dic(appid int not null, rowid int not null, userid int8 not null, unique (appid,userid), unique (appid,rowid));

插入使用者字典表的函數，可以產生無縫的連續ROWID。

create or replace function f_uniq(i_appid int, i_userid int8) returns int as $$

declare

newid int;

i int := 0;

res int;

stack1 text;

stack2 text;

stack3 text;

stack4 text;

stack5 text;

stack6 text;

stack7 text;

stack8 text;

stack9 text;

stack10 text;

begin

loop

if i>0 then

perform pg_sleep(random());

else

i := i+1;

end if;

--追蹤已有的最大ID+1 (即將插入的ID)

select max(rowid)+1 into newid from t_userid_dic where appid=i_appid;

if newid is not null then

--追蹤AD LOCK (乘以appid,為了避免不同ID的衝突,演算法可以再改進)

if pg_try_advisory_xact_lock(i_appid::int8 * newid) then

--插入

insert into t_userid_dic (appid, rowid, userid) values (i_appid, newid, i_userid);

--返回此次追蹤到的UID

return newid;

else

--沒有追蹤到AD LOCK則繼續循環

continue;

end if;

else

--表示這是第一條記錄，追蹤AD=0的LOCK

if pg_try_advisory_xact_lock(0 * i_appid::int8) then

insert into t_userid_dic (appid, rowid, userid) values (i_appid, 0, i_userid);

return 0;

else

continue;

end if;

end if;

end loop;

exception

--只忽略缺口唯一約束的錯誤，其他錯誤繼續報

when SQLSTATE '23505' then

return null;

end;

$$ language plpgsql strict;

如果以上叫用返回NULL，說明插入失敗，可能缺口了唯一約束，套用端重試即可。

壓測以上函數是否能無縫插入，壓測時raise notice可以去掉。

$ vi test.sql

set appid random(1,1000)

set userid random(1,2000000000)

select f_uniq(:appid, :userid);

164個並行

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 164 -j 164 -T 10

progress: 10.0 s, 85720.5 tps, lat 1.979 ms stddev 34.808

transaction type: ./test.sql

scaling factor: 1

query mode: prepared

number of clients: 164

number of threads: 164

duration: 10 s

number of transactions actually processed: 827654

latency average = 2.039 ms

latency stddev = 35.674 ms

tps = 75435.422933 (including connections establishing)

tps = 75483.813182 (excluding connections establishing)

script statistics:

 - statement latencies in milliseconds:

0.001set appid random(1,1000)

0.000set userid random(1,2000000000)

2.034select f_uniq(:appid, :userid);

驗證

postgres=# select count(*) from t_userid_dic ;

 count

--------

(1 row)

postgres=# select appid,count(*),max(rowid) from t_userid_dic group by 1;

 appid | count | max

-------+-------+-----

251 |857 | 857

106 |854 | 854

681 |816 | 816

120 |826 | 826

285 |815 | 815

866 |778 | 778

264 |873 | 873

......

postgres=# select * from (select appid,count(*),max(rowid) from t_userid_dic group by 1) t where max<>count;

 appid | count | max

-------+-------+-----

(0 rows)

插入速度，無縫需求，完全符合要求。

生成1億測試用戶,APPID=1, 用於後面的測試

truncate t_userid_dic;

insert into t_userid_dic select 1, generate_series(1,100000000), generate_series(1,100000000);

2. 即時變更表

為了提高寫入效能，資料將即時的寫入這張表，後臺增量的將這個表的資料合併到TAG表。

drop table IF EXISTS t_user_tags;

create table IF NOT EXISTS t_user_tags(id serial8 primary key, appid int, userid int8, tag int, ins boolean, dic boolean default false);

create index idx_t_user_tags_id on t_user_tags(id) where dic is false;

-- ins = true表示新增tag, =false表示移除tag。

-- dic = true表示該記錄設計的USERID已經合併到使用者字典表

--這張表的設計有優化餘地，例如最好切換使用，以清理資料，比如每天一張，儲存31天。

生成1.5千萬測試資料(APPID=1 , USERID 總量20億，推測產生, 新增tagid 範圍1-10000,移除tagid 範圍1-1000)

insert into t_user_tags (appid,userid,tag,ins) select 1, 2000000000*random(),10000*random(),true from generate_series(1,10000000);

insert into t_user_tags (appid,userid,tag,ins) select 1, 2000000000*random(),5000*random(),false from generate_series(1,5000000);

3. tag + userids bitmap 表，這個是最關鍵的表，查詢量很大，從t_user_tags增量合併進這個表。

drop table IF EXISTS t_tags;

create table IF NOT EXISTS t_tags(id serial primary key, appid int, tag int, userids varbit);

-- id可以用於advisory lock,並行更新

create unique index idx_t_tags_uk on t_tags(tag,appid);

生成1萬個TAG的測試資料，每個TAG包含1億個用戶的BIT。方便下麵的測試

--生成推測bit

CREATE OR REPLACE FUNCTION public.randbit(integer)

 RETURNS bit varying

 LANGUAGE plpgsql

 STRICT

AS $function$

declare

res varbit;

begin

select (string_agg(mod((2*random())::int,2)::text,''))::varbit into res from generate_series(1,$1);

if res is not null then

return res;

else

return ''::varbit;

end if;

end;

$function$

create sequence seq;

--並行插入10000條記錄

$ vi test.sql

insert into t_tags(appid,tag,userids) select 1,nextval('seq'::regclass),randbit(100000000);

$ pgbench -M simple -n -r -f ./test.sql -c 50 -j 50 -t 200

測試TAG群組查詢效能

這個指標顯示了用戶勾選一些TAG群組後，圈定並返回用戶群體的效能。

測試方法很簡單：包含所有，不包含，包含任意。

1. 包含以下TAG的用戶ID

userids (bitand) userids

結果為bit位為1的用戶

測試SQL如下

--追蹤最大BIT的長度

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (?,?,...) )

select appid,userid from t_userid_dic, tmp

where appid = ?

and rowid = any

--正向取出1萬個bit=1的用戶,如果要反向取，使用false(表示取最近的用戶資料)

-- rowid從0開始計數，剛好與bit下標起始置放配對

select bit_posite(res, 1, 10000, true) from

select t1.userids & t2.userids & t3.userids & t4.userids AS res -- & ......

from

--根據最大長度, BIT補齊

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t1 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t2 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t3 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t4

-- ......

) t

)::int[]

效能資料

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (226833, 226830, 226836, 226834) )

select appid,userid from t_userid_dic, tmp

where appid = 1

and rowid = any

select bit_posite(res, 1, 10000, true) from

select t1.userids & t2.userids & t3.userids & t4.userids as res

from

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226833 ) t1 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226830 ) t2 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226836 ) t3 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226834 ) t4

) t

)::int[]

2. 不包含以下TAG的用戶

userids (bitor) userids

結果為bit位為0的用戶

測試SQL如下

--追蹤最大BIT的長度

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (?,?,...) )

select appid,userid from t_userid_dic, tmp

where appid = ?

and rowid = any

--正向取出1萬個bit=0的用戶,如果要反向取，使用false(表示取最近的用戶資料)

select bit_posite(res, 0, 10000, true) from

select t1.userids | t2.userids | t3.userids | t4.userids AS res -- | ......

from

--根據最大長度, BIT補齊

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t1 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t2 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t3 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t4

-- ......

) t

)::int[]

效能資料

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (226833, 226830, 226836, 226834) )

select appid,userid from t_userid_dic, tmp

where appid = 1

and rowid = any

select bit_posite(res, 0, 10000, true) from

select t1.userids | t2.userids | t3.userids | t4.userids as res

from

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226833 ) t1 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226830 ) t2 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226836 ) t3 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226834 ) t4

) t

)::int[]

3. 包含以下任意TAG

userids (bitor) userids

結果為bit位為1的用戶

測試SQL如下

--追蹤最大BIT的長度

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (?,?,...) )

select appid,userid from t_userid_dic, tmp

where appid = ?

and rowid = any

--正向取出1萬個bit=1的用戶,如果要反向取，使用false(表示取最近的用戶資料)

select bit_posite(res, 1, 10000, true) from

select t1.userids | t2.userids | t3.userids | t4.userids AS res -- | ......

from

--根據最大長度, BIT補齊

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t1 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t2 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t3 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags where tag = ? ) t4

-- ......

) t

)::int[]

效能資料

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (226833, 226830, 226836, 226834) )

select appid,userid from t_userid_dic, tmp

where appid = 1

and rowid = any

select bit_posite(res, 1, 10000, true) from

select t1.userids | t2.userids | t3.userids | t4.userids as res

from

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226833 ) t1 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226830 ) t2 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226836 ) t3 ,

( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226834 ) t4

) t

)::int[]

進階功能

1. 結合bit_posite，可以實現正向取若干用戶，反向取若干用戶(例如有100萬個結果，本次推廣只要1萬個用戶，而且要最近新增的1萬個用戶，則反向取1萬個用戶即可)。

2. 結合get_bit則可以實現截取某一段BIT，再取得結果，很好用哦。

測試新增資料的效能

新增資料即往t_user_tags表插入資料的效能。

postgres=# d+ t_user_tags

Table "public.t_user_tags"

 Column |Type|Modifiers| Storage | Stats target | Description

--------+---------+----------------------------------------------------------+---------+--------------+-------------

 id| bigint| not null default nextval('t_user_tags_id_seq'::regclass) | plain||

appid| integer || plain||

userid | bigint|| plain||

tag| integer || plain||

ins| boolean || plain||

dic| boolean | default false| plain||

Indexes:

"t_user_tags_pkey" PRIMARY KEY, btree (id)

"idx_t_user_tags_id" btree (id) WHERE dic IS FALSE

測試如下

$ vi test.sql

set appid random(1,1000)

set userid random(1,2000000000)

set new_tag random(1,10000)

set old_tag random(8001,10000)

insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :new_tag, true);

insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :old_tag, false);

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 128 -j 128 -T 120

postgres=# select count(*) from t_user_tags;

count

----------

14721724

(1 row)

效能資料(單步作業的QPS約12.2萬，包括新增，移除TAG)

更新的動作需要拆成兩個部份，新增和移除，不要合併到一條記錄中。

transaction type: ./test.sql

scaling factor: 1

query mode: prepared

number of clients: 128

number of threads: 128

duration: 120 s

number of transactions actually processed: 7360862

latency average = 2.085 ms

latency stddev = 1.678 ms

tps = 61326.338528 (including connections establishing)

tps = 61329.196790 (excluding connections establishing)

script statistics:

 - statement latencies in milliseconds:

0.001set appid random(1,1000)

0.000set userid random(1,2000000000)

0.000set new_tag random(1,10000)

0.000set old_tag random(8001,10000)

1.042insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :new_tag, true);

1.037insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :old_tag, false);

測試資料的合併效能

資料的合併包括3個部份，

1. 更新使用者字典表t_userid_dic，

2. 批量追蹤並移除t_user_tags的記錄，

3. 合併標籤資料到t_tags。

以上三個動作應該在一個交易中完成。

考慮到t_tags表userids欄位1億BIT約12.5MB，更新單條記錄耗時會比較長，所以建議採用並行的圖樣，每個TAG都可以並行。

更新字典封裝為函數如下

從t_user_tags表取出資料並更新資料字典，被取出的資料標記為允許合併。

此動作沒有必要並行，串列即可，搞個後臺流程無限迴圈。

create or replace function update_dict(v_rows int) returns void as $$

declare

min_id int;--邊界id

f_uniq_res int8;

begin

if v_rows<1 then

raise notice 'v_rows must >=1';

return;

end if;

--尋找APPID, tag對應最小邊界,沒有記錄直接返回

select min(id) into min_id from t_user_tags where dic=false;

if not found then

raise notice 'no data';

return;

end if;

--插入本次需要處理的臨時資料到陣列

-- f_uniq失敗也不會報錯，這裡需要修改一下f_uniq僅對UK衝突不處理，其他錯誤還是需要處理的，否則t_user_tags改了，但是USER可能沒有進入字典。

with tmp as (update t_user_tags t set dic=true where id>=min_id and id<=min_id+v_rows returning *)

select count(*) into f_uniq_res from (select f_uniq(appid,userid) from (select appid,userid from tmp group by 1,2) t) t;

end;

$$ language plpgsql;

由於批量動作，可能申請大量的ad lock, 所以需要增加max_locks_per_transaction, 資料庫參數調整

max_locks_per_transaction=40960

驗證

postgres=# select update_dict(200000);

 update_dict

-------------

(1 row)

Time: 8986.175 ms

多執行幾次

Time: 9395.991 ms

Time: 10798.631 ms

Time: 10726.547 ms

Time: 10620.055 ms

每秒約處理2萬

驗證字典更新是否準確

postgres=# select count(*) from t_userid_dic ;

count

---------

(1 row)

Time: 110.656 ms

postgres=# select count(*) from (select appid,userid from t_user_tags where dic=true group by 1,2) t;

count

---------

(1 row)

Time: 2721.264 ms

postgres=# select * from t_userid_dic order by appid,rowid limit 10;

 appid | rowid |userid

-------+-------+------------

1 |0 | 1802787010

1 |1 | 1342147584

1 |2 | 1560458710

1 |3 | 1478701081

1 |4 | 1826138023

1 |5 |182295180

1 |6 | 1736227913

1 |7 |512247294

1 |8 |686842950

1 |9 | 1940486738

(10 rows)

postgres=# select min(rowid),max(rowid),count(*),appid from t_userid_dic group by appid;

 min |max|count| appid

-----+---------+---------+-------

1 | 1399501 | 1399501 |1

(1 row)

Time: 369.562 ms

雖然沒有必要並行，但是這個函數需要保護其並行的安全性，所以接下來驗證並行安全性

$ vi test.sql

select update_dict(1000);

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100

驗證並行結果的安全性，結果可靠

postgres=# select count(*) from t_userid_dic ;

count

---------

(1 row)

postgres=# select count(*) from (select appid,userid from t_user_tags where dic=true group by 1,2) t;

count

---------

(1 row)

postgres=# select * from t_userid_dic order by appid,rowid limit 10;

 appid | rowid |userid

-------+-------+------------

1 |0 | 1802787010

1 |1 | 1342147584

1 |2 | 1560458710

1 |3 | 1478701081

1 |4 | 1826138023

1 |5 |182295180

1 |6 | 1736227913

1 |7 |512247294

1 |8 |686842950

1 |9 | 1940486738

(10 rows)

postgres=# select min(rowid),max(rowid),count(*),appid from t_userid_dic group by appid;

 min |max|count| appid

-----+---------+---------+-------

1 | 1533534 | 1533534 |1

(1 row)

資料合併封裝為函數如下

寫在前面: 如果你要在一個函數中處理使用者字典的更新和合併的動作，務必使用repeatable read隔離等級，許諾處理字典的資料與合併的資料一致。

前面已經處理了字典的更新，接下來就可以將t_user_tags.dic=true的資料，合併到t_tags中。

考慮更新T_TAGS可能較慢，儘量提高並行度，不同TAG並行。

-- 不要對同一個APPID並行使用APPID與APPID+tag的圖樣.

create or replace function merge_tags(

v_appid int,--輸入本次要處理的appid

v_tag int,--輸入本次要處理的tags ,輸入null則處理該APPID的所有tag,注意不要使用strict function,否則遇到NULL參數會直接返回NULL

v_rows int--輸入要本次處理多少條記錄，即合併量

) returns void as $$

declare

min_id int;--邊界id

ad_lockid int8;--可以有效避免不同的APPID在平行作業時衝突

--循環

i_tag int;

i_userid_del int8[];

i_userid_add int8[];

--將userid[]轉換為ROWID[]

i_rowid_del int[];

i_rowid_add int[];

--臨時隱藏,用於debug

i_userids_del varbit := '0'::varbit;

i_userids_add varbit := '0'::varbit;

begin

--計算ad_lockid

if v_tag is not null then

ad_lockid := (v_appid+1)^2 + (v_tag+1)^2;

else

ad_lockid := (v_appid+1)^2;

end if;

--追蹤並行保護鎖

if not pg_try_advisory_xact_lock(ad_lockid) then

raise notice 'cann''t parallel merge same tag with ad_lockid: %', ad_lockid;

return;

end if;

--處理行數

if v_rows<1 then

raise notice 'v_rows must >=1';

return;

end if;

--判斷是否處理單個tag

if v_tag is not null then

-- raise notice 'v_tag: %', v_tag;

--處理單個tag

--尋找APPID, tag對應最小邊界,沒有記錄直接返回

select min(id) into min_id from t_user_tags where dic=true and appid=v_appid and tag=v_tag;

if not found then

raise notice 'no data for appid:% , tag:% ', v_appid, v_tag;

return;

end if;

--追蹤即時資料，移除即時資料，彙總即時資料，合併到TAGS

--建議RDS PostgreSQL新增delete|update limit文法，拋棄with id>=min_id for update

--這裡有個BUG,合並沒有考慮到TAG的新增或移除的先後順序，例如某APPID的某用戶新增了一個TAG，然後又把這個TAG移除了。最後合併的結果會變成有這個TAG。

--使用window函數可以解決這個問題,將array_agg這個QUERY改掉,在彙總前，先合併資料，使用ins, row_number() over (partition by appid,userid,tag order by id desc) rn .... where rn=1,每一個appid,userid,tag取最後一個ins的值。

for i_tag, i_rowid_del, i_rowid_add in

with tmp as (select * from t_user_tags t where dic=true and appid=v_appid and tag=v_tag and id>=min_id order by id limit v_rows for update),

tmp0 as (select * from ( select *, row_number() over (partition by appid,userid,tag order by id desc) as rn from tmp ) as ss where ss.rn=1),--同一個appid,userid用視窗取最後一條狀態

tmp1 as (delete from t_user_tags t where exists (select 1 from tmp where tmp.id=t.id))

select t1.tag, array_remove(array_agg(case when not t1.ins then t2.rowid else null end), null) code_del, array_remove(array_agg(case when t1.ins then t2.rowid else null end), null) code_add

from tmp3 t1 join t_userid_dic t2 on (t1.type=t2.type and t1.code=t2.code and t2.appid=v_appid) group by t1.tag_name

-- select tag, array_agg(case when not ins then userid else null end) userid_del, array_agg(case when ins then userid else null end) userid_add from tmp group by tag

loop

--判斷TAG是否存在,存在則更新，否則插入

perform 1 from t_tags where appid=v_appid and tag=i_tag;

if found then

update t_tags set userids = set_bit_array( set_bit_array(userids, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit where appid=v_appid and tag=i_tag;

else

insert into t_tags(appid, tag, userids) values (v_appid, i_tag, set_bit_array( set_bit_array('0'::varbit, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit);

end if;

end loop;

else

--處理所有tag

--尋找APPID最小邊界,沒有記錄直接返回

select min(id) into min_id from t_user_tags where dic=true and appid=v_appid;

if not found then

raise notice 'no data for appid:%', v_appid;

return;

end if;

--追蹤即時資料，移除即時資料，彙總即時資料，合併到TAGS

--這裡有個BUG,合並沒有考慮到TAG的新增或移除的先後順序，例如某APPID的某用戶新增了一個TAG，然後又把這個TAG移除了。最後合併的結果會變成有這個TAG。

--使用window函數可以解決這個問題,將array_agg這個QUERY改掉,在彙總前,先合併資料,使用ins, row_number() over (partition by appid,userid,tag order by id desc) rn .... where rn=1,每一個appid,userid,tag取最後一個ins的值。

for i_tag, i_rowid_del, i_rowid_add in

with tmp as (select * from t_user_tags t where dic=true and appid=v_appid and id>=min_id order by id limit v_rows for update),

tmp0 as (select * from ( select *, row_number() over (partition by appid,userid,tag order by id desc) as rn from tmp ) as ss where ss.rn=1),--同一個appid,userid用視窗取最後一條狀態

tmp1 as (delete from t_user_tags t where exists (select 1 from tmp where tmp.id=t.id))

select t1.tag, array_remove(array_agg(case when not t1.ins then t2.rowid else null end), null) code_del, array_remove(array_agg(case when t1.ins then t2.rowid else null end), null) code_add

from tmp3 t1 join t_userid_dic t2 on (t1.type=t2.type and t1.code=t2.code and t2.appid=v_appid) group by t1.tag_name

-- select tag, array_agg(case when not ins then userid else null end) userid_del, array_agg(case when ins then userid else null end) userid_add from tmp group by tag

loop

-- execute format('select coalesce(array_agg(rowid), array[]::int[]) from t_userid_dic where appid=%L and userid = any (%L)', v_appid, array_remove(i_userid_del, null) ) into i_rowid_del;

-- execute format('select coalesce(array_agg(rowid), array[]::int[]) from t_userid_dic where appid=%L and userid = any (%L)', v_appid, array_remove(i_userid_add, null) ) into i_rowid_add;

--判斷TAG是否存在,存在則更新，否則插入

perform 1 from t_tags where appid=v_appid and tag=i_tag;

if found then

update t_tags set userids = set_bit_array( set_bit_array(userids, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit where appid=v_appid and tag=i_tag;

else

insert into t_tags(appid, tag, userids) values (v_appid, i_tag, set_bit_array( set_bit_array('0'::varbit, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit);

end if;

end loop;

end if;

end;

$$ language plpgsql;

--不要使用strict

速度測試

$ vi test.sql

set tag random(1,10000)

select merge_tags(1,:tag,10000);

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 128 -j 128 -T 100

每秒約合併15萬記錄。

驗證方法，merge的結果資料與被merge的資料一致即可。

符合要求

truncate t_tags;

select update_dict(100000);

create table check_merge1 as select tag,count(*) cnt from (select tag,userid from t_user_tags where dic=true order by id limit 10000) t group by tag;

select merge_tags(1,null,10000);

。。。。。。

select merge_tags(1,null,10000);

create table check_merge2 as select tag, count_bit(userids,1) cnt from t_tags;

postgres=# select t1.*,t2.* from check_merge1 t1 full outer join check_merge2 t2 on (t1.tag=t2.tag and t1.cnt=t2.cnt) where t1.* is null or t2.* is null;

 tag | cnt | tag | cnt

-----+-----+-----+-----

(0 rows)

Time: 5.133 ms

如果有結果說明合併有問題。

查詢單個用戶有哪些TAG

找到userid對應的rowid,根據userids rowid置放的bit，判斷是否有該tag.

select tag from (select tag, get_bit(t1.userids, t2.rowid-1) bt from t_tags t1, (select rowid from t_userid_dic where userid=?) t2) t where bt=1;

例子，注意對齊（或者改造get_bit函數，支援沒有BIT位的動作）

postgres=# set FETCH_COUNT 1

postgres=# select tag from (select tag, get_bit(t1.userids, t2.rowid-1) bt from t_tags t1, (select rowid from t_userid_dic where userid=100000) t2) t where bt=1;

tag

--------

(10 rows)

Time: 152.636 ms

查詢單個用戶有哪些TAG是一個比較重的動作，如果碰到有很多TAG並且用戶數非常龐大時，建議使用並行。

並行相關參數設定

postgres=# show parallel_tuple_cost;

 parallel_tuple_cost

---------------------

postgres=# show parallel_setup_cost ;

 parallel_setup_cost

---------------------

postgres=# show max_parallel_workers_per_gather ;

 max_parallel_workers_per_gather

---------------------------------

postgres=# show max_worker_processes ;

 max_worker_processes

----------------------

postgres=# show force_parallel_mode ;

 force_parallel_mode

---------------------

on

postgres=# alter table t_tags set (parallel_workers=27);

ALTER TABLE

返回用戶陣列，並行化之後，每個tag處理約耗時0.76毫秒。

如果用游標返回，可以快速得到第一個用戶。

postgres=# explain (analyze,verbose,costs,buffers,timing) select array_agg(tag) from t_tags where get_bit(userids,10000)=1;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

 Aggregate(cost=1047.68..1047.69 rows=1 width=32) (actual time=7176.745..7176.745 rows=1 loops=1)

Output: array_agg(tag)

Buffers: shared hit=15912565

->Gather(cost=0.00..1047.55 rows=50 width=4) (actual time=8.940..7175.486 rows=4957 loops=1)

Output: tag

Workers Planned: 27

Workers Launched: 27

Buffers: shared hit=15912565

->Parallel Seq Scan on public.t_tags(cost=0.00..1047.55 rows=2 width=4) (actual time=51.974..6023.333 rows=177 loops=28)

Output: tag

Filter: (get_bit((t_tags.userids)::"bit", 10000) = 1)

Rows Removed by Filter: 180

Buffers: shared hit=15909973

Worker 0: actual time=24.376..5791.799 rows=158 loops=1

Buffers: shared hit=528366

Worker 1: actual time=23.143..6749.264 rows=198 loops=1

Buffers: shared hit=632954

Worker 2: actual time=55.201..6400.872 rows=186 loops=1

Buffers: shared hit=604388

Worker 3: actual time=23.456..5351.070 rows=152 loops=1

Buffers: shared hit=482151

Worker 4: actual time=71.248..6179.161 rows=181 loops=1

Buffers: shared hit=580237

Worker 5: actual time=124.402..5395.424 rows=140 loops=1

Buffers: shared hit=493010

Worker 6: actual time=111.926..6013.077 rows=168 loops=1

Buffers: shared hit=553851

Worker 7: actual time=24.483..7170.148 rows=202 loops=1

Buffers: shared hit=677578

Worker 8: actual time=23.284..5448.081 rows=139 loops=1

Buffers: shared hit=487985

Worker 9: actual time=54.677..7057.927 rows=233 loops=1

Buffers: shared hit=666715

Worker 10: actual time=73.070..6615.151 rows=177 loops=1

Buffers: shared hit=622393

Worker 11: actual time=25.978..5977.110 rows=182 loops=1

Buffers: shared hit=552329

Worker 12: actual time=22.975..5366.569 rows=150 loops=1

Buffers: shared hit=480447

Worker 13: actual time=76.756..6940.743 rows=201 loops=1

Buffers: shared hit=655799

Worker 14: actual time=54.590..5362.862 rows=161 loops=1

Buffers: shared hit=482488

Worker 15: actual time=106.099..5454.446 rows=153 loops=1

Buffers: shared hit=494638

Worker 16: actual time=53.649..6048.233 rows=165 loops=1

Buffers: shared hit=553771

Worker 17: actual time=23.089..5810.984 rows=160 loops=1

Buffers: shared hit=532711

Worker 18: actual time=55.039..5981.338 rows=165 loops=1

Buffers: shared hit=542380

Worker 19: actual time=24.163..6187.498 rows=182 loops=1

Buffers: shared hit=571046

Worker 20: actual time=23.965..6119.395 rows=194 loops=1

Buffers: shared hit=566214

Worker 21: actual time=106.038..6238.629 rows=187 loops=1

Buffers: shared hit=582724

Worker 22: actual time=54.568..6488.311 rows=183 loops=1

Buffers: shared hit=613989

Worker 23: actual time=24.021..5368.295 rows=152 loops=1

Buffers: shared hit=488385

Worker 24: actual time=53.327..5658.396 rows=178 loops=1

Buffers: shared hit=515591

Worker 25: actual time=23.201..5358.615 rows=142 loops=1

Buffers: shared hit=483975

Worker 26: actual time=109.940..5560.662 rows=163 loops=1

Buffers: shared hit=505844

 Planning time: 0.081 ms

 Execution time: 7637.509 ms

(69 rows)

Time: 7638.100 ms

postgres=# select tag from t_tags where get_bit(userids,10000)=1;

tag

--------

Cancel request sent

ERROR:canceling statement due to user request

Time: 17.521 ms

profile，get_bit的瓶頸在memcpy，這個可以通過優化PG核心改進。

3647.00 34.2% memcpy/lib64/libc-2.12.so

分區

大APPID，按USER號段切分

APPID+號段分區

如果一個APPID 1萬個TAG，1億用戶，只佔用120GB。

通常是出現傾斜時才需要重散發。

PostgreSQL使用postgres_fdw可以原生支援資料分區。

參考我之前寫的文件

《PostgreSQL9.6 單元化,sharding (based on postgres_fdw) - 核心層支援前傳》

《PostgreSQL9.6 sharding + 單元化 (based on postgres_fdw) 最佳實踐 - 通用水準分庫場景設計與實踐》

同一個APPID的一批TAG必須在一個節點

擴充閱讀

機器學習服務，生成TAG，本文不涉及。

參考 MADlib、R ，群集PostgreSQL都可以非常好的支援。

Count

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

基於阿裡雲 RDS PostgreSQL打造即時用戶畫像推薦系統

幕後

商務場景模擬

一、寬表方案剖析

二、陣列方案剖析

三、位元位方案1剖析

四、位元位方案2剖析

五、位元位方案2實施細節

阿裡雲RDSPostgreSQL 新增的bit動作函數

表結構設計

測試TAG群組查詢效能

進階功能

測試新增資料的效能

測試資料的合併效能

更新字典封裝為函數如下

資料合併封裝為函數如下

查詢單個用戶有哪些TAG

分區

擴充閱讀

聯繫我們

熱門文章

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

基於阿裡雲 RDS PostgreSQL打造即時用戶畫像推薦系統

幕後

商務場景模擬

一、寬表方案剖析

二、陣列方案剖析

三、位元位方案1剖析

四、位元位方案2剖析

五、位元位方案2實施細節

阿裡雲RDSPostgreSQL 新增的bit動作函數

表結構設計

測試TAG群組查詢效能

進階功能

測試新增資料的效能

測試資料的合併效能

更新字典封裝為函數如下

資料合併封裝為函數如下

查詢單個用戶有哪些TAG

分區

擴充閱讀

聯繫我們

熱門文章

推薦話題

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support