Use of DB2 OLAP Functions

Source: Internet
Author: User

--- Start

DB2 online analysis and processing can be described very well. This function is especially suitable for various statistical queries. These queries are difficult to implement with common SQL statements, or are not implemented at all. First, let's start with a simple example to unveil its mystery step by step. Please refer to the following SQL:

  1. Select
  2. Row_number () over (order by salary) as number,
  3. Name as name,
  4. Dept as department,
  5. Salary as salary
  6. From
  7. (
  8. -- Name department salary
  9. Values
  10. ('Zhang san', 'marketing Department ', 4000 ),
  11. ('Zhao hong', 'Technology Department ', 2000 ),
  12. ('Li si', 'marketing Department ', 5000 ),
  13. ('Li Bai ', 'Technology Department', 5000 ),
  14. ('Wang 5', 'marketing Department ', null ),
  15. ('Wang Lan', 'Technology Department ', 4000)
  16. ) As employ (name, DEPT, salary );
  17. The query result is as follows:
  18. Serial number name department salary
  19. 1 Zhao Hong Technology Department 2000
  20. 2. Zhang San marketing department 4000
  21. 3 Wang Lan Technology Department 4000
  22. 4 Li Si marketing department 5000
  23. 5 Li Bai Technical Department 5000
  24. 6. Wang Wu Marketing Department (null)

 

Have you seen row_number () over () above? Many people do not understand this. How can we write two functions like this? Some people even doubt whether the preceding SQL statement can be executed. In fact, row_number is a function. Its function can be identified by its name, that is, the number of the query result set. However, over is not a function, but an expression. It defines a scope (or result set). The function above over takes effect only for the result set defined by over. Why? It doesn't matter. We will introduce it in detail later.

 

From the preceding SQL, we can see that the typical DB2 online analysis processing format includes two parts:Function SectionAndOver Expression. What functions can be used in the function section? As follows:

  1. Row_number
  2. Rank
  3. Dense_rank
  4. First_value
  5. Last_value
  6. Lag
  7. Lead
  8. Count
  9. Min
  10. Max
  11. AVG
  12. Sum

 

The functions of the above functions will be introduced later. You can guess the functions of the functions based on the function name.

 

Suppose I want to query the average salary of Department employees and the average salary of all employees without changing the query results of the preceding statement. What should I do? It is difficult to query using common SQL statements, but using OLAP functions is very simple, as shown in the following SQL:

  1. Select
  2. Row_number () over () as serial number,
  3. Row_number () over (partition by dept order by salary) as department number,
  4. Name as name,
  5. Dept as department,
  6. Salary as salary,
  7. AVG (salary) over (partition by dept) as average salary,
  8. AVG (salary) over () as average full salary
  9. From
  10. (
  11. -- Name department salary
  12. Values
  13. ('Zhang san', 'marketing Department ', 4000 ),
  14. ('Zhao hong', 'Technology Department ', 2000 ),
  15. ('Li si', 'marketing Department ', 5000 ),
  16. ('Li Bai ', 'Technology Department', 5000 ),
  17. ('Wang 5', 'marketing Department ', null ),
  18. ('Wang Lan', 'Technology Department ', 4000)
  19. ) As employ (name, DEPT, salary );
  20. The query result is as follows:
  21. Department No. Department No. Name Department Wage Department average salary of all employees
  22. 1 Zhang San marketing department 4000 4500 4000
  23. 2 2. Li Si marketing department 5000 4500 4000
  24. 3 3 Wang Wu Marketing Department (null) 4500 4000
  25. 4 1 Zhao Hong Technology Department 2000 3666 4000
  26. 5 2. Wang Lan Technology Department 4000 3666 4000
  27. 3 Li Bai Technology Department 5000 3666 4000

 

Note the difference between the sequence number and the Department sequence number. When querying the Department sequence number, we add two more clauses in the over expression, which arePartitionAndOrder. What are their functions? Before introducing their functions, let's review the role of over. Do you remember?

 

Over is an expression that defines a scope (or result set). The function above over only applies to the result set defined by over.

 

OrderYou should be familiar with the function of sorting result sets.PartitionThe role is actually very simple, andGroupIs used to group result sets.

 

So far, you should have some understanding and experience on the routines of OLAP functions. Let's take a look at the SQL result set above and find that Wang Wu's salary is null. When we sort by salary, null is placed at the end. What should we do if we want to put null in the front? UseNulls firstKeyword. The default value isNulls last,See the following SQL:

  1. Select
  2. Row_number () over (order by salary DESC nulls first) as RN,
  3. Rank () over (order by salary DESC nulls first) as rk,
  4. Dense_rank () over (order by salary DESC nulls first) as d_rk,
  5. Name as name,
  6. Dept as department,
  7. Salary as salary
  8. From
  9. (
  10. -- Name department salary
  11. Values
  12. ('Zhang san', 'marketing Department ', 4000 ),
  13. ('Zhao hong', 'Technology Department ', 2000 ),
  14. ('Li si', 'marketing Department ', 5000 ),
  15. ('Li Bai ', 'Technology Department', 5000 ),
  16. ('Wang 5', 'marketing Department ', null ),
  17. ('Wang Lan', 'Technology Department ', 4000)
  18. ) As employ (name, DEPT, salary );
  19. The query result is as follows:
  20. Rn rk d_rk name department salary
  21. 1 1 1 Wang Wu Marketing Department (null)
  22. 2 2 2 Li Si marketing department 5000
  23. 3 2 Li Bai Technology Department 5000
  24. 4 4 3 Zhang San marketing department 4000
  25. 5 4 3 Wang Lan Technology Department 4000
  26. 6 6 4 Zhao Hong Technology Department 2000

 

Pay attention to the difference between row_number and rank. rank indicates rank, which means that Li Si and Li Bai both pay 5000 yuan. They are tied for the second place. Zhang Sanhe and Wang Lan both have a salary of 4000. How does rank function rank fourth, while dense_rank is the third? This is the difference between the two functions. Because there are two second names, the rank function does not have a third name by default.

 

Now there is another new question. Suppose you want to check the salary of each employee and the average salary of every employee whose salary is smaller than the average salary of all of his employees, what should you do? Why? Did not understand the problem? Please refer to the following SQL:

  1. Select
  2. Name as name,
  3. Salary as salary,
  4. Sum (salary) over (order by salary nulls first rows between unbounded preceding and current row) as less than my total salary,
  5. Sum (salary) over (order by salary nulls first rows between current row and unbounded following) as is greater than my total salary,
  6. Sum (salary) over (order by salary nulls first rows between unbounded preceding and unbounded following) as total salary 1,
  7. Sum (salary) over () as total salary 2
  8. From
  9. (
  10. -- Name department salary
  11. Values
  12. ('Zhang san', 'marketing Department ', 4000 ),
  13. ('Zhao hong', 'Technology Department ', 2000 ),
  14. ('Li si', 'marketing Department ', 5000 ),
  15. ('Li Bai ', 'Technology Department', 5000 ),
  16. ('Wang 5', 'marketing Department ', null ),
  17. ('Wang Lan', 'Technology Department ', 4000)
  18. ) As employ (name, DEPT, salary );
  19. The query result is as follows:
  20. Name: the salary is smaller than my total salary. It is greater than my total salary. total salary. 1 total salary. 2
  21. King Five (null) 20000 20000 20000
  22. Zhao Hong 2000 2000 20000 20000 20000
  23. Zhang San 4000 6000 18000 20000 20000
  24. Wang Lan 4000 10000 14000 20000 20000
  25. Li Si 5000 15000 10000 20000 20000
  26. Li Bai 5000 20000 5000 20000 20000

 

The over section in the preceding SQL contains a rows clause. Let's take a look at the structure of the rows clause:

  1. Rows between <upper limit condition> and <lower limit condition>
  2. The "upper limit condition" can be the following keywords:
  3. Unbounded preceding
  4. <Number> preceding
  5. Current row
  6. The "deprecation condition" can be the following keywords:
  7. Current row
  8. <Number> following
  9. Unbounded following

 

Note that the above keywords are relative to the current row,Unbounded precedingIndicates all rows before the current row, that is, there is no upper limit;<Number>PrecedingIndicates starting from the current row to<Number>For example, number = 2 indicates the first two rows of the current row;Current rowIndicates the current row. As for the other two keywords, I think, you should know it as well. If you still do not understand, analyze the preceding SQL query results carefully.

 

An over expression can also have a clause, that isRange, Its usage andRowsIt is very similar, or identical, and has a similar effect, but it is a little different, as shown below:

 

Range<Upper limit condition>And<Minimum Condition>

 

The<Upper limit condition>,<Lower limit condition> andRowsThe following SQL statements demonstrate the differences between them:

  1. Select
  2. Name as name,
  3. Dept as department,
  4. Salary as salary,
  5. First_value (salary, 'ignore nulls') over (partition by dept) as department minimum wage,
  6. Last_value (salary, 'respect nulls') over (partition by dept) as department highest salary,
  7. Sum (salary) over (order by salary rows between 1 preceding and 1 following) as rows,
  8. Sum (salary) over (order by salary range between 500 preceding and 500 following) as range
  9. From
  10. (
  11. -- Name department salary
  12. Values
  13. ('Zhang san', 'marketing Department ', 2000 ),
  14. ('Zhao hong', 'Technology Department ', 2400 ),
  15. ('Li si', 'marketing Department ', 3000 ),
  16. ('Li Bai ', 'Technology Department', 3200 ),
  17. ('Wang 5', 'marketing Department ', 4000 ),
  18. ('Wang Lan', 'Technology Department ', 5000)
  19. ) As employ (name, DEPT, salary );
  20. The query result is as follows:
  21. Name Department Wage Department Minimum Wage Department maximum wage rows range
  22. Zhang San marketing department 2000 2000 4000 4400 4400
  23. Zhao Hong Technology Department 2400 2400 5000 7400 4400
  24. Li Si marketing department 3000 2000 4000 8600 6200
  25. Li Bai Technology Department 3200 2400 5000 10200 6200
  26. Wang Wu marketing department 4000 2000 4000 12200 4000
  27. Wang Lan Technology Department 5000 2400 5000 9000 5000

 

The preceding SQLRangeThe clause defines a wage range. The upper limit is the current row's salary-500, and the lower limit is the current row's salary + 500. For example, if Li Si's salary is 3000, the upper limit is 3000-500 = 2500, and the lower limit is 3000 + 500 = 3500, then who's salary is in the range? Only Li Si and Li Bai, so the value of the range column is 3000 (Li Si) + 3200 (Li Bai) = 6200. The above is the difference between rows and range.

 

The preceding SQL statement is also used.First_value and last_valueThe functions of these two functions are also very simple. They are used to calculate the minimum and maximum values of over defined sets. It is worth noting that these two functions have a parameter,'Ignore nulls' or 'Respect nulls'They are used to ignore null values and consider null values, just like their names.

 

We have not introduced two other functions,Lag and lead,These two functions are very powerful. See the following SQL:

  1. Select
  2. Name as name,
  3. Salary as salary,
  4. Lag (salary, 0) over (order by salary) as lag0,
  5. Lag (salary) over (order by salary) as lag1,
  6. Lag (salary, 2) over (order by salary) as lag2,
  7. Lag (salary, 3, 0, 'ignore nulls') over (order by salary) as lag3,
  8. Lag (salary, 4,-1, 'respect nulls') over (order by salary) as lag4,
  9. Lead (salary) over (order by salary) as lead
  10. From
  11. (
  12. -- Name department salary
  13. Values
  14. ('Zhang san', 'marketing Department ', 2000 ),
  15. ('Zhao hong', 'Technology Department ', 2400 ),
  16. ('Li si', 'marketing Department ', 3000 ),
  17. ('Li Bai ', 'Technology Department', 3200 ),
  18. ('Wang 5', 'marketing Department ', 4000 ),
  19. ('Wang Lan', 'Technology Department ', 5000)
  20. ) As employ (name, DEPT, salary );
  21. The query result is as follows:
  22. Name salary lag0 lag1 lag2 lag3 lag4 lead
  23. Zhang San 2000 2000 (null) 0-1 2400
  24. Zhao Hong 2400 2400 2000 (null) 0-1 3000
  25. Li Si 3000 3000 2400 2000 0-1 3200
  26. Li Bai 3200 3200 3000 2400-1 2000
  27. Wang v 4000 4000 3200 3000 2400 2000 5000
  28. Wang Lan 5000 5000 4000 3200 3000 2400 (null)

 

Let's take a look at the Declaration of the lag and lead functions as follows:

 

Lag (expression or field, offset, default value, ignore nulls or respect nulls)

 

Lag Is the downward offset, and lead is the offset. Let's take a look at the query results of the preceding SQL statement.

 

So far, all the knowledge about DB2 OLAP functions has been introduced to you. Next we will review the components of DB2 online analysis and processing, as shown below:

 

Function over (partition by clause order by clause rows or range clause)

 

It takes some time and practice to be familiar with this knowledge. Once you have mastered this knowledge, you will have a peerless martial art that can be used to understand DB2.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.