Sometimes this problem occurs. We need to query a table and sort it by business. For example, I need the following results:
Region date fee product no. User No.
290 201202 258 1 s1
290 201202 200 1 s5
290 201202 100 1
290 201202 90 2 s7
290 201202 88 2 s9
290 201202 10 2 s12.
The leader asked me to generate a report and we need to see the top three customers who charge for each business. In this case, using rank () over (partition) is a good choice.
My test table is like the table in the example above, but the data is a little more. Here is an example:
We can see that each of my projects has five records. I only take the first three records, and the SQL statement is as follows:
SELECT A.AREA_ID, A.ACCT_MONTH, A.FEE, A.ITEM_ID, A.USER_ID FROM (SELECT T.AREA_ID, T.ACCT_MONTH, T.FEE, T.ITEM_ID, T.USER_ID, RANK() OVER(PARTITION BY T.ITEM_ID ORDER BY T.FEE DESC) RK FROM TEST T) A WHERE RK < 4;
The execution result of this statement is the above situation.
If you don't need to continue, check the execution plan of the statement.
First, write my table creation statement:
CREATE TABLE TEST( area_id NUMBER, acct_month NUMBER, fee NUMBER, item_id NUMBER)PARTITION BY LIST(area_id)( PARTITION part_290 VALUES('290'), PARTITION part_910 VALUES('910'), PARTITION part_911 VALUES('911'), partition part_912 values('912'), partition part_913 values('913'), partition part_914 values('914'), partition part_915 values('915'), partition part_916 values('916'), partition part_917 values('917'), partition part_919 values('919'), partition part_default values(default))
I partition by region. In fact, I can partition by time. Explain the plan:
We can see that there are only 10 records in the table, but the COST is as high as 4. We have to say that this method will greatly reduce the query efficiency. However, rank () over is a good tool for business purposes.
Add two pieces of data to test whether the partition table can improve the efficiency without indexing. Two pieces of data are added. The area_id is 911, and the district code of Baoji City, Shaanxi Province is 0911. The statement is also slightly changed:
SELECT A.AREA_ID, A.ACCT_MONTH, A.FEE, A.ITEM_ID, A.USER_ID FROM (SELECT T.AREA_ID, T.ACCT_MONTH, T.FEE, T.ITEM_ID, T.USER_ID, RANK() OVER(PARTITION BY T.ITEM_ID ORDER BY T.FEE DESC) RK FROM TEST T WHERE t.area_id = 290) A WHERE RK < 4;
Explain the plan:
It is found that although there are 12 rows of data, but because I only query Xi'an (290), there are still 10 rows in rows, and there are no newly added Baoji 2 rows. It can be seen that it is necessary to create partitions based on certain conditions in the face of massive data storage. However, it may be a little difficult to create partitions by time, because the time is constantly advancing. after how many years, the partitions you have created will no longer be available, so you need to add new partitions, this is also a very annoying thing.
By wingsless