I recently received a testing requirement that called for online analytical processing (OLAP) functions. I am recording a summary here in the hope that it helps others facing similar problems.
1. The test environment is DB2. ETL (extract, transform, load) is used to load data from the source systems into the target data warehouse.
2. The requirement, roughly: verify that orders containing the specified product codes are loaded from the data source into the target database.
3. The target data comes from 7 different application databases, each of which stores the orders for its own group of product codes. Orders for 50 of those product codes need to be loaded into the target warehouse.
After analyzing the requirement, we arrived at the following test approach:
1. Query sample order data in the source database: from all orders containing the 50 product codes, take 2 rows per product code for verification.
2. With regular SQL, we can group the orders by product code and use the MAX function to get the most recently created order, but that yields only one row per product. What if we need 2 rows, or 10? That is hard to express in plain SQL. OLAP functions let us meet this goal simply and efficiently.
SELECT * FROM
(SELECT
    DISTINCT RTRIM(A.record_id), RTRIM(A.po_id), RTRIM(A.ant_id), B.cat_id, B.extract_dt,
    ROW_NUMBER() OVER(PARTITION BY B.cat_id ORDER BY B.extract_dt DESC) RN
FROM
    -- Retrieve POs for each cat_id for the last year from BDW
    (SELECT record_id, po_id, cat_id, extract_dt, ant_id FROM Teame.Po_item
     WHERE cat_id IN ('4q6','4w8','S86','S89','QU39','u4q0','UQ41','UQ43','U89','W24','YQ44','QY45','QY50','y5q1','E0w4',
        'W72','8q3','0W3','Q75',' the','P74',' the','P76','77E','P78','E03','E05','E06','E07','ED8',
        'WW9','E37','WW0','DD3','DS3','E65','7S4',' $','CA1','0QS4','W31',' -','9a4','Y95','QY96')
     -- AND DATE(EXTRACT_DT) >= DATE(CURRENT_DATE - 365 DAYS) AND DATE(EXTRACT_DT) <= DATE(CURRENT_DATE)
    ) B,
    ip.com C,
    Teame.Po_ia Poia,
    Teame.PO A
    LEFT OUTER JOIN Teame.P_g_m D
    ON
        D.record_id = A.record_id AND
        D.prchorg_id = A.prchorg_id AND
        D.prchgrp_id = A.prchgrp_id AND
        D.prchmem_uniq_id = A.prchmem_uniq_id
WHERE
    A.record_id = B.record_id AND
    A.po_id = B.po_id AND
    A.ant_id = B.ant_id AND
    A.record_id = Poia.record_id AND
    A.po_id = Poia.po_id AND
    B.cat_id = C.corpcommcode AND
    (Comgroup IN ('J','D')
     OR Poia.Ledgacct_minor_num IN ('123','422','1',' +','324','123','442','123','FDF','FD'))
    AND A.record_id > ' '
    AND DATE(A.EXTRACT_DT) >= DATE(CURRENT_DATE - 365 DAYS)
    AND DATE(A.EXTRACT_DT) <= DATE(CURRENT_DATE)
) RN
WHERE RN = 1
WITH UR;
The part to focus on is this expression: ROW_NUMBER() OVER(PARTITION BY b.cat_id ORDER BY b.extract_dt DESC) RN
ROW_NUMBER() numbers the rows of the query result set.
OVER defines the window (the set of rows) that the function operates on; the function is evaluated only over the rows that OVER defines.
PARTITION BY divides the result set into groups, similar to GROUP BY, except that the rows are kept rather than collapsed into one row per group.
ORDER BY sorts the rows within each partition by the given column.
Finally, filtering on RN picks rows from each group after partitioning: WHERE RN = 1, as in the query above, keeps only the most recent row per group, while WHERE RN <= 2 returns the first two rows of each group.
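The difference between the GROUP BY/MAX approach and the ROW_NUMBER() approach can be sketched as follows. This is only an illustration under stated assumptions: it runs against SQLite (through Python's sqlite3 module, SQLite 3.25 or later) as a stand-in for DB2, and the po_item table, its three columns, and the sample rows are hypothetical, not the real schema. The window-function syntax itself is standard SQL.

```python
import sqlite3

# Hypothetical sample rows: (po_id, cat_id, extract_dt). Not real data.
SAMPLE_ROWS = [
    ("PO1", "4q6", "2023-01-10"),
    ("PO2", "4q6", "2023-03-05"),
    ("PO3", "4q6", "2023-02-20"),
    ("PO4", "S86", "2023-01-15"),
    ("PO5", "S86", "2023-04-01"),
]


def _load(rows):
    """Create an in-memory po_item table (illustrative schema) and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE po_item (po_id TEXT, cat_id TEXT, extract_dt TEXT)")
    conn.executemany("INSERT INTO po_item VALUES (?, ?, ?)", rows)
    return conn


def latest_per_category(rows):
    """Plain GROUP BY + MAX: exactly one value per cat_id."""
    conn = _load(rows)
    try:
        return conn.execute(
            "SELECT cat_id, MAX(extract_dt) FROM po_item "
            "GROUP BY cat_id ORDER BY cat_id"
        ).fetchall()
    finally:
        conn.close()


def top_n_per_category(rows, n):
    """ROW_NUMBER() per cat_id, newest first; keep the first n rows of each group."""
    conn = _load(rows)
    try:
        return conn.execute(
            """
            SELECT po_id, cat_id, extract_dt, rn
            FROM (
                SELECT po_id, cat_id, extract_dt,
                       ROW_NUMBER() OVER (PARTITION BY cat_id
                                          ORDER BY extract_dt DESC) AS rn
                FROM po_item
            ) AS numbered
            WHERE rn <= ?
            ORDER BY cat_id, rn
            """,
            (n,),
        ).fetchall()
    finally:
        conn.close()


print(latest_per_category(SAMPLE_ROWS))
print(top_n_per_category(SAMPLE_ROWS, 2))
```

Changing the second argument of top_n_per_category is all it takes to get 1, 2, or 10 rows per product code, which is the flexibility the MAX approach lacks.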
Besides ROW_NUMBER, several other common functions are worth knowing:
RANK() ranks with gaps: if two rows tie for second place, the next row is ranked fourth (ranking is done within each partition).
DENSE_RANK() ranks without gaps: if two rows tie for second place, the next row is still ranked third.
In contrast, ROW_NUMBER() assigns consecutive numbers within each partition, with no duplicates, so every row gets a unique number within its group.
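To see the three numbering functions side by side, here is a small sketch with a tie in the data. As an assumption for illustration, it uses SQLite (via Python's sqlite3, version 3.25 or later) rather than DB2, and the po_item table with a qty column is hypothetical.

```python
import sqlite3

# Hypothetical rows: (po_id, cat_id, qty). Two rows tie on qty = 20.
TIED_ROWS = [
    ("PO1", "4q6", 30),
    ("PO2", "4q6", 20),
    ("PO3", "4q6", 20),
    ("PO4", "4q6", 10),
]


def compare_ranking_functions(rows):
    """Return (qty, row_number, rank, dense_rank) for each row, highest qty first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE po_item (po_id TEXT, cat_id TEXT, qty INTEGER)")
        conn.executemany("INSERT INTO po_item VALUES (?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT qty,
                   ROW_NUMBER() OVER (PARTITION BY cat_id ORDER BY qty DESC) AS row_num,
                   RANK()       OVER (PARTITION BY cat_id ORDER BY qty DESC) AS rnk,
                   DENSE_RANK() OVER (PARTITION BY cat_id ORDER BY qty DESC) AS dense_rnk
            FROM po_item
            ORDER BY cat_id, row_num
            """
        ).fetchall()
    finally:
        conn.close()


for row in compare_ranking_functions(TIED_ROWS):
    print(row)
# (30, 1, 1, 1)
# (20, 2, 2, 2)
# (20, 3, 2, 2)
# (10, 4, 4, 3)
```

The tied rows share rank 2 under both RANK() and DENSE_RANK(); the row after them is ranked 4 by RANK() and 3 by DENSE_RANK(), while ROW_NUMBER() simply counts 1 to 4. Which of the two tied rows receives row number 2 is not defined unless the ORDER BY includes a tiebreaker column.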
FIRST_VALUE returns the first value in the window defined by OVER, following the window's ORDER BY (this is the minimum only when the window is ordered ascending by that same expression).
LAST_VALUE returns the last value in the window defined by OVER (the maximum only under the same ascending ordering). It is worth noting that both functions accept a parameter, 'IGNORE NULLS' or 'RESPECT NULLS', that controls whether null values are skipped or considered.
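A minimal sketch of FIRST_VALUE and LAST_VALUE follows, again assuming SQLite (via Python's sqlite3, version 3.25 or later) as a stand-in for DB2 and a hypothetical po_item table. One detail to be aware of: when a window has an ORDER BY, the default frame in standard SQL ends at the current row, so LAST_VALUE needs an explicit frame that spans the whole partition to return the partition's last value. The null-handling parameter is not shown because SQLite does not offer it.

```python
import sqlite3

# Hypothetical sample rows: (po_id, cat_id, extract_dt). Not real data.
SAMPLE_ROWS = [
    ("PO1", "4q6", "2023-01-10"),
    ("PO2", "4q6", "2023-03-05"),
    ("PO3", "4q6", "2023-02-20"),
    ("PO4", "S86", "2023-01-15"),
    ("PO5", "S86", "2023-04-01"),
]


def first_and_last_po(rows):
    """For each row, return (po_id, oldest po_id, newest po_id) within its cat_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE po_item (po_id TEXT, cat_id TEXT, extract_dt TEXT)")
        conn.executemany("INSERT INTO po_item VALUES (?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT po_id,
                   FIRST_VALUE(po_id) OVER (
                       PARTITION BY cat_id ORDER BY extract_dt
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                   ) AS first_po,
                   LAST_VALUE(po_id) OVER (
                       PARTITION BY cat_id ORDER BY extract_dt
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                   ) AS last_po
            FROM po_item
            ORDER BY cat_id, extract_dt
            """
        ).fetchall()
    finally:
        conn.close()


for row in first_and_last_po(SAMPLE_ROWS):
    print(row)
```

Every row of category 4q6 reports PO1 as the first and PO2 as the last purchase order by extract date; every row of S86 reports PO4 and PO5.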
OLAP functions include:
ROW_NUMBER
RANK
DENSE_RANK
FIRST_VALUE
LAST_VALUE
LAG
LEAD
COUNT
MIN
MAX
AVG
SUM
On the online analytical processing (OLAP) functions of DB2