--- Start
DB2 online analysis and processing can be described very well. This function is especially suitable for various statistical queries. These queries are difficult to implement with common SQL statements, or are not implemented at all. First, let's start with a simple example to unveil its mystery step by step. Please refer to the following SQL:
- Select
- Row_number () over (order by salary) as number,
- Name as name,
- Dept as department,
- Salary as salary
- From
- (
- -- Name department salary
- Values
- ('Zhang san', 'marketing Department ', 4000 ),
- ('Zhao hong', 'Technology Department ', 2000 ),
- ('Li si', 'marketing Department ', 5000 ),
- ('Li Bai ', 'Technology Department', 5000 ),
- ('Wang 5', 'marketing Department ', null ),
- ('Wang Lan', 'Technology Department ', 4000)
- ) As employ (name, DEPT, salary );
- The query result is as follows:
- Serial number name department salary
- 1 Zhao Hong Technology Department 2000
- 2. Zhang San marketing department 4000
- 3 Wang Lan Technology Department 4000
- 4 Li Si marketing department 5000
- 5 Li Bai Technical Department 5000
- 6. Wang Wu Marketing Department (null)
Have you seen row_number () over () above? Many people do not understand this. How can we write two functions like this? Some people even doubt whether the preceding SQL statement can be executed. In fact, row_number is a function. Its function can be identified by its name, that is, the number of the query result set. However, over is not a function, but an expression. It defines a scope (or result set). The function above over takes effect only for the result set defined by over. Why? It doesn't matter. We will introduce it in detail later.
From the preceding SQL, we can see that the typical DB2 online analysis processing format includes two parts:Function SectionAndOver Expression. What functions can be used in the function section? As follows:
- Row_number
- Rank
- Dense_rank
- First_value
- Last_value
- Lag
- Lead
- Count
- Min
- Max
- AVG
- Sum
The functions of the above functions will be introduced later. You can guess the functions of the functions based on the function name.
Suppose I want to query the average salary of Department employees and the average salary of all employees without changing the query results of the preceding statement. What should I do? It is difficult to query using common SQL statements, but using OLAP functions is very simple, as shown in the following SQL:
- Select
- Row_number () over () as serial number,
- Row_number () over (partition by dept order by salary) as department number,
- Name as name,
- Dept as department,
- Salary as salary,
- AVG (salary) over (partition by dept) as average salary,
- AVG (salary) over () as average full salary
- From
- (
- -- Name department salary
- Values
- ('Zhang san', 'marketing Department ', 4000 ),
- ('Zhao hong', 'Technology Department ', 2000 ),
- ('Li si', 'marketing Department ', 5000 ),
- ('Li Bai ', 'Technology Department', 5000 ),
- ('Wang 5', 'marketing Department ', null ),
- ('Wang Lan', 'Technology Department ', 4000)
- ) As employ (name, DEPT, salary );
- The query result is as follows:
- Department No. Department No. Name Department Wage Department average salary of all employees
- 1 Zhang San marketing department 4000 4500 4000
- 2 2. Li Si marketing department 5000 4500 4000
- 3 3 Wang Wu Marketing Department (null) 4500 4000
- 4 1 Zhao Hong Technology Department 2000 3666 4000
- 5 2. Wang Lan Technology Department 4000 3666 4000
- 3 Li Bai Technology Department 5000 3666 4000
Note the difference between the sequence number and the Department sequence number. When querying the Department sequence number, we add two more clauses in the over expression, which arePartitionAndOrder. What are their functions? Before introducing their functions, let's review the role of over. Do you remember?
Over is an expression that defines a scope (or result set). The function above over only applies to the result set defined by over.
OrderYou should be familiar with the function of sorting result sets.PartitionThe role is actually very simple, andGroupIs used to group result sets.
So far, you should have some understanding and experience on the routines of OLAP functions. Let's take a look at the SQL result set above and find that Wang Wu's salary is null. When we sort by salary, null is placed at the end. What should we do if we want to put null in the front? UseNulls firstKeyword. The default value isNulls last,See the following SQL:
- Select
- Row_number () over (order by salary DESC nulls first) as RN,
- Rank () over (order by salary DESC nulls first) as rk,
- Dense_rank () over (order by salary DESC nulls first) as d_rk,
- Name as name,
- Dept as department,
- Salary as salary
- From
- (
- -- Name department salary
- Values
- ('Zhang san', 'marketing Department ', 4000 ),
- ('Zhao hong', 'Technology Department ', 2000 ),
- ('Li si', 'marketing Department ', 5000 ),
- ('Li Bai ', 'Technology Department', 5000 ),
- ('Wang 5', 'marketing Department ', null ),
- ('Wang Lan', 'Technology Department ', 4000)
- ) As employ (name, DEPT, salary );
- The query result is as follows:
- Rn rk d_rk name department salary
- 1 1 1 Wang Wu Marketing Department (null)
- 2 2 2 Li Si marketing department 5000
- 3 2 Li Bai Technology Department 5000
- 4 4 3 Zhang San marketing department 4000
- 5 4 3 Wang Lan Technology Department 4000
- 6 6 4 Zhao Hong Technology Department 2000
Pay attention to the difference between row_number and rank. rank indicates rank, which means that Li Si and Li Bai both pay 5000 yuan. They are tied for the second place. Zhang Sanhe and Wang Lan both have a salary of 4000. How does rank function rank fourth, while dense_rank is the third? This is the difference between the two functions. Because there are two second names, the rank function does not have a third name by default.
Now there is another new question. Suppose you want to check the salary of each employee and the average salary of every employee whose salary is smaller than the average salary of all of his employees, what should you do? Why? Did not understand the problem? Please refer to the following SQL:
- Select
- Name as name,
- Salary as salary,
- Sum (salary) over (order by salary nulls first rows between unbounded preceding and current row) as less than my total salary,
- Sum (salary) over (order by salary nulls first rows between current row and unbounded following) as is greater than my total salary,
- Sum (salary) over (order by salary nulls first rows between unbounded preceding and unbounded following) as total salary 1,
- Sum (salary) over () as total salary 2
- From
- (
- -- Name department salary
- Values
- ('Zhang san', 'marketing Department ', 4000 ),
- ('Zhao hong', 'Technology Department ', 2000 ),
- ('Li si', 'marketing Department ', 5000 ),
- ('Li Bai ', 'Technology Department', 5000 ),
- ('Wang 5', 'marketing Department ', null ),
- ('Wang Lan', 'Technology Department ', 4000)
- ) As employ (name, DEPT, salary );
- The query result is as follows:
- Name: the salary is smaller than my total salary. It is greater than my total salary. total salary. 1 total salary. 2
- King Five (null) 20000 20000 20000
- Zhao Hong 2000 2000 20000 20000 20000
- Zhang San 4000 6000 18000 20000 20000
- Wang Lan 4000 10000 14000 20000 20000
- Li Si 5000 15000 10000 20000 20000
- Li Bai 5000 20000 5000 20000 20000
The over section in the preceding SQL contains a rows clause. Let's take a look at the structure of the rows clause:
- Rows between <upper limit condition> and <lower limit condition>
- The "upper limit condition" can be the following keywords:
- Unbounded preceding
- <Number> preceding
- Current row
- The "deprecation condition" can be the following keywords:
- Current row
- <Number> following
- Unbounded following
Note that the above keywords are relative to the current row,Unbounded precedingIndicates all rows before the current row, that is, there is no upper limit;<Number>PrecedingIndicates starting from the current row to<Number>For example, number = 2 indicates the first two rows of the current row;Current rowIndicates the current row. As for the other two keywords, I think, you should know it as well. If you still do not understand, analyze the preceding SQL query results carefully.
An over expression can also have a clause, that isRange, Its usage andRowsIt is very similar, or identical, and has a similar effect, but it is a little different, as shown below:
Range<Upper limit condition>And<Minimum Condition>
The<Upper limit condition>,<Lower limit condition> andRowsThe following SQL statements demonstrate the differences between them:
- Select
- Name as name,
- Dept as department,
- Salary as salary,
- First_value (salary, 'ignore nulls') over (partition by dept) as department minimum wage,
- Last_value (salary, 'respect nulls') over (partition by dept) as department highest salary,
- Sum (salary) over (order by salary rows between 1 preceding and 1 following) as rows,
- Sum (salary) over (order by salary range between 500 preceding and 500 following) as range
- From
- (
- -- Name department salary
- Values
- ('Zhang san', 'marketing Department ', 2000 ),
- ('Zhao hong', 'Technology Department ', 2400 ),
- ('Li si', 'marketing Department ', 3000 ),
- ('Li Bai ', 'Technology Department', 3200 ),
- ('Wang 5', 'marketing Department ', 4000 ),
- ('Wang Lan', 'Technology Department ', 5000)
- ) As employ (name, DEPT, salary );
- The query result is as follows:
- Name Department Wage Department Minimum Wage Department maximum wage rows range
- Zhang San marketing department 2000 2000 4000 4400 4400
- Zhao Hong Technology Department 2400 2400 5000 7400 4400
- Li Si marketing department 3000 2000 4000 8600 6200
- Li Bai Technology Department 3200 2400 5000 10200 6200
- Wang Wu marketing department 4000 2000 4000 12200 4000
- Wang Lan Technology Department 5000 2400 5000 9000 5000
The preceding SQLRangeThe clause defines a wage range. The upper limit is the current row's salary-500, and the lower limit is the current row's salary + 500. For example, if Li Si's salary is 3000, the upper limit is 3000-500 = 2500, and the lower limit is 3000 + 500 = 3500, then who's salary is in the range? Only Li Si and Li Bai, so the value of the range column is 3000 (Li Si) + 3200 (Li Bai) = 6200. The above is the difference between rows and range.
The preceding SQL statement is also used.First_value and last_valueThe functions of these two functions are also very simple. They are used to calculate the minimum and maximum values of over defined sets. It is worth noting that these two functions have a parameter,'Ignore nulls' or 'Respect nulls'They are used to ignore null values and consider null values, just like their names.
We have not introduced two other functions,Lag and lead,These two functions are very powerful. See the following SQL:
- Select
- Name as name,
- Salary as salary,
- Lag (salary, 0) over (order by salary) as lag0,
- Lag (salary) over (order by salary) as lag1,
- Lag (salary, 2) over (order by salary) as lag2,
- Lag (salary, 3, 0, 'ignore nulls') over (order by salary) as lag3,
- Lag (salary, 4,-1, 'respect nulls') over (order by salary) as lag4,
- Lead (salary) over (order by salary) as lead
- From
- (
- -- Name department salary
- Values
- ('Zhang san', 'marketing Department ', 2000 ),
- ('Zhao hong', 'Technology Department ', 2400 ),
- ('Li si', 'marketing Department ', 3000 ),
- ('Li Bai ', 'Technology Department', 3200 ),
- ('Wang 5', 'marketing Department ', 4000 ),
- ('Wang Lan', 'Technology Department ', 5000)
- ) As employ (name, DEPT, salary );
- The query result is as follows:
- Name salary lag0 lag1 lag2 lag3 lag4 lead
- Zhang San 2000 2000 (null) 0-1 2400
- Zhao Hong 2400 2400 2000 (null) 0-1 3000
- Li Si 3000 3000 2400 2000 0-1 3200
- Li Bai 3200 3200 3000 2400-1 2000
- Wang v 4000 4000 3200 3000 2400 2000 5000
- Wang Lan 5000 5000 4000 3200 3000 2400 (null)
Let's take a look at the Declaration of the lag and lead functions as follows:
Lag (expression or field, offset, default value, ignore nulls or respect nulls)
Lag Is the downward offset, and lead is the offset. Let's take a look at the query results of the preceding SQL statement.
So far, all the knowledge about DB2 OLAP functions has been introduced to you. Next we will review the components of DB2 online analysis and processing, as shown below:
Function over (partition by clause order by clause rows or range clause)
It takes some time and practice to be familiar with this knowledge. Once you have mastered this knowledge, you will have a peerless martial art that can be used to understand DB2.