Use of DB2 OLAP Functions

Last Update:2018-12-05 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

--- Start

DB2 online analysis and processing can be described very well. This function is especially suitable for various statistical queries. These queries are difficult to implement with common SQL statements, or are not implemented at all. First, let's start with a simple example to unveil its mystery step by step. Please refer to the following SQL:

Select
Row_number () over (order by salary) as number,
Name as name,
Dept as department,
Salary as salary
From
(
-- Name department salary
Values
('Zhang san', 'marketing Department ', 4000 ),
('Zhao hong', 'Technology Department ', 2000 ),
('Li si', 'marketing Department ', 5000 ),
('Li Bai ', 'Technology Department', 5000 ),
('Wang 5', 'marketing Department ', null ),
('Wang Lan', 'Technology Department ', 4000)
) As employ (name, DEPT, salary );
The query result is as follows:
Serial number name department salary
1 Zhao Hong Technology Department 2000
2. Zhang San marketing department 4000
3 Wang Lan Technology Department 4000
4 Li Si marketing department 5000
5 Li Bai Technical Department 5000
6. Wang Wu Marketing Department (null)

Have you seen row_number () over () above? Many people do not understand this. How can we write two functions like this? Some people even doubt whether the preceding SQL statement can be executed. In fact, row_number is a function. Its function can be identified by its name, that is, the number of the query result set. However, over is not a function, but an expression. It defines a scope (or result set). The function above over takes effect only for the result set defined by over. Why? It doesn't matter. We will introduce it in detail later.

From the preceding SQL, we can see that the typical DB2 online analysis processing format includes two parts:Function SectionAndOver Expression. What functions can be used in the function section? As follows:

Row_number
Rank
Dense_rank
First_value
Last_value
Lag
Lead
Count
Min
Max
AVG
Sum

The functions of the above functions will be introduced later. You can guess the functions of the functions based on the function name.

Suppose I want to query the average salary of Department employees and the average salary of all employees without changing the query results of the preceding statement. What should I do? It is difficult to query using common SQL statements, but using OLAP functions is very simple, as shown in the following SQL:

Select
Row_number () over () as serial number,
Row_number () over (partition by dept order by salary) as department number,
Name as name,
Dept as department,
Salary as salary,
AVG (salary) over (partition by dept) as average salary,
AVG (salary) over () as average full salary
From
(
-- Name department salary
Values
('Zhang san', 'marketing Department ', 4000 ),
('Zhao hong', 'Technology Department ', 2000 ),
('Li si', 'marketing Department ', 5000 ),
('Li Bai ', 'Technology Department', 5000 ),
('Wang 5', 'marketing Department ', null ),
('Wang Lan', 'Technology Department ', 4000)
) As employ (name, DEPT, salary );
The query result is as follows:
Department No. Department No. Name Department Wage Department average salary of all employees
1 Zhang San marketing department 4000 4500 4000
2 2. Li Si marketing department 5000 4500 4000
3 3 Wang Wu Marketing Department (null) 4500 4000
4 1 Zhao Hong Technology Department 2000 3666 4000
5 2. Wang Lan Technology Department 4000 3666 4000
3 Li Bai Technology Department 5000 3666 4000

Note the difference between the sequence number and the Department sequence number. When querying the Department sequence number, we add two more clauses in the over expression, which arePartitionAndOrder. What are their functions? Before introducing their functions, let's review the role of over. Do you remember?

Over is an expression that defines a scope (or result set). The function above over only applies to the result set defined by over.

OrderYou should be familiar with the function of sorting result sets.PartitionThe role is actually very simple, andGroupIs used to group result sets.

So far, you should have some understanding and experience on the routines of OLAP functions. Let's take a look at the SQL result set above and find that Wang Wu's salary is null. When we sort by salary, null is placed at the end. What should we do if we want to put null in the front? UseNulls firstKeyword. The default value isNulls last,See the following SQL:

Select
Row_number () over (order by salary DESC nulls first) as RN,
Rank () over (order by salary DESC nulls first) as rk,
Dense_rank () over (order by salary DESC nulls first) as d_rk,
Name as name,
Dept as department,
Salary as salary
From
(
-- Name department salary
Values
('Zhang san', 'marketing Department ', 4000 ),
('Zhao hong', 'Technology Department ', 2000 ),
('Li si', 'marketing Department ', 5000 ),
('Li Bai ', 'Technology Department', 5000 ),
('Wang 5', 'marketing Department ', null ),
('Wang Lan', 'Technology Department ', 4000)
) As employ (name, DEPT, salary );
The query result is as follows:
Rn rk d_rk name department salary
1 1 1 Wang Wu Marketing Department (null)
2 2 2 Li Si marketing department 5000
3 2 Li Bai Technology Department 5000
4 4 3 Zhang San marketing department 4000
5 4 3 Wang Lan Technology Department 4000
6 6 4 Zhao Hong Technology Department 2000

Pay attention to the difference between row_number and rank. rank indicates rank, which means that Li Si and Li Bai both pay 5000 yuan. They are tied for the second place. Zhang Sanhe and Wang Lan both have a salary of 4000. How does rank function rank fourth, while dense_rank is the third? This is the difference between the two functions. Because there are two second names, the rank function does not have a third name by default.

Now there is another new question. Suppose you want to check the salary of each employee and the average salary of every employee whose salary is smaller than the average salary of all of his employees, what should you do? Why? Did not understand the problem? Please refer to the following SQL:

Select
Name as name,
Salary as salary,
Sum (salary) over (order by salary nulls first rows between unbounded preceding and current row) as less than my total salary,
Sum (salary) over (order by salary nulls first rows between current row and unbounded following) as is greater than my total salary,
Sum (salary) over (order by salary nulls first rows between unbounded preceding and unbounded following) as total salary 1,
Sum (salary) over () as total salary 2
From
(
-- Name department salary
Values
('Zhang san', 'marketing Department ', 4000 ),
('Zhao hong', 'Technology Department ', 2000 ),
('Li si', 'marketing Department ', 5000 ),
('Li Bai ', 'Technology Department', 5000 ),
('Wang 5', 'marketing Department ', null ),
('Wang Lan', 'Technology Department ', 4000)
) As employ (name, DEPT, salary );
The query result is as follows:
Name: the salary is smaller than my total salary. It is greater than my total salary. total salary. 1 total salary. 2
King Five (null) 20000 20000 20000
Zhao Hong 2000 2000 20000 20000 20000
Zhang San 4000 6000 18000 20000 20000
Wang Lan 4000 10000 14000 20000 20000
Li Si 5000 15000 10000 20000 20000
Li Bai 5000 20000 5000 20000 20000

The over section in the preceding SQL contains a rows clause. Let's take a look at the structure of the rows clause:

Rows between <upper limit condition> and <lower limit condition>
The "upper limit condition" can be the following keywords:
Unbounded preceding
<Number> preceding
Current row
The "deprecation condition" can be the following keywords:
Current row
<Number> following
Unbounded following

Note that the above keywords are relative to the current row,Unbounded precedingIndicates all rows before the current row, that is, there is no upper limit;<Number>PrecedingIndicates starting from the current row to<Number>For example, number = 2 indicates the first two rows of the current row;Current rowIndicates the current row. As for the other two keywords, I think, you should know it as well. If you still do not understand, analyze the preceding SQL query results carefully.

An over expression can also have a clause, that isRange, Its usage andRowsIt is very similar, or identical, and has a similar effect, but it is a little different, as shown below:

Range<Upper limit condition>And<Minimum Condition>

The<Upper limit condition>,<Lower limit condition> andRowsThe following SQL statements demonstrate the differences between them:

Select
Name as name,
Dept as department,
Salary as salary,
First_value (salary, 'ignore nulls') over (partition by dept) as department minimum wage,
Last_value (salary, 'respect nulls') over (partition by dept) as department highest salary,
Sum (salary) over (order by salary rows between 1 preceding and 1 following) as rows,
Sum (salary) over (order by salary range between 500 preceding and 500 following) as range
From
(
-- Name department salary
Values
('Zhang san', 'marketing Department ', 2000 ),
('Zhao hong', 'Technology Department ', 2400 ),
('Li si', 'marketing Department ', 3000 ),
('Li Bai ', 'Technology Department', 3200 ),
('Wang 5', 'marketing Department ', 4000 ),
('Wang Lan', 'Technology Department ', 5000)
) As employ (name, DEPT, salary );
The query result is as follows:
Name Department Wage Department Minimum Wage Department maximum wage rows range
Zhang San marketing department 2000 2000 4000 4400 4400
Zhao Hong Technology Department 2400 2400 5000 7400 4400
Li Si marketing department 3000 2000 4000 8600 6200
Li Bai Technology Department 3200 2400 5000 10200 6200
Wang Wu marketing department 4000 2000 4000 12200 4000
Wang Lan Technology Department 5000 2400 5000 9000 5000

The preceding SQLRangeThe clause defines a wage range. The upper limit is the current row's salary-500, and the lower limit is the current row's salary + 500. For example, if Li Si's salary is 3000, the upper limit is 3000-500 = 2500, and the lower limit is 3000 + 500 = 3500, then who's salary is in the range? Only Li Si and Li Bai, so the value of the range column is 3000 (Li Si) + 3200 (Li Bai) = 6200. The above is the difference between rows and range.

The preceding SQL statement is also used.First_value and last_valueThe functions of these two functions are also very simple. They are used to calculate the minimum and maximum values of over defined sets. It is worth noting that these two functions have a parameter,'Ignore nulls' or 'Respect nulls'They are used to ignore null values and consider null values, just like their names.

We have not introduced two other functions,Lag and lead,These two functions are very powerful. See the following SQL:

Select
Name as name,
Salary as salary,
Lag (salary, 0) over (order by salary) as lag0,
Lag (salary) over (order by salary) as lag1,
Lag (salary, 2) over (order by salary) as lag2,
Lag (salary, 3, 0, 'ignore nulls') over (order by salary) as lag3,
Lag (salary, 4,-1, 'respect nulls') over (order by salary) as lag4,
Lead (salary) over (order by salary) as lead
From
(
-- Name department salary
Values
('Zhang san', 'marketing Department ', 2000 ),
('Zhao hong', 'Technology Department ', 2400 ),
('Li si', 'marketing Department ', 3000 ),
('Li Bai ', 'Technology Department', 3200 ),
('Wang 5', 'marketing Department ', 4000 ),
('Wang Lan', 'Technology Department ', 5000)
) As employ (name, DEPT, salary );
The query result is as follows:
Name salary lag0 lag1 lag2 lag3 lag4 lead
Zhang San 2000 2000 (null) 0-1 2400
Zhao Hong 2400 2400 2000 (null) 0-1 3000
Li Si 3000 3000 2400 2000 0-1 3200
Li Bai 3200 3200 3000 2400-1 2000
Wang v 4000 4000 3200 3000 2400 2000 5000
Wang Lan 5000 5000 4000 3200 3000 2400 (null)

Let's take a look at the Declaration of the lag and lead functions as follows:

Lag (expression or field, offset, default value, ignore nulls or respect nulls)

Lag Is the downward offset, and lead is the offset. Let's take a look at the query results of the preceding SQL statement.

So far, all the knowledge about DB2 OLAP functions has been introduced to you. Next we will review the components of DB2 online analysis and processing, as shown below:

Function over (partition by clause order by clause rows or range clause)

It takes some time and practice to be familiar with this knowledge. Once you have mastered this knowledge, you will have a peerless martial art that can be used to understand DB2.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Use of DB2 OLAP Functions

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Use of DB2 OLAP Functions

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support