Sort out the window functions of PostgreSQL

Source: Internet
Author: User

PostgreSQL has added some Window Function functions in Versions later than 8.4. The following is a brief introduction.

A window function performs a calculation setting ss a set of table rows that are somehow related to the current row. this is comparable to the type of calculation that can be done with an aggregate function. but unlike regular aggregate fungate, use of a window function does not cause rows to become grouped into a single output row-the rows retain their separate identities. behind the scenes, the window function is able to access more than just the current row of the query result.
Window Functions in SQL is an OLAP functionality that provides ranking, cumulative computation, and partitioning aggregation. many commercial RDMBS such like Oracle, ms SQL Server and DB2 have implemented part of this specification, while open source RDMBS including PostgreSQL, MySQL and Firebird doesn't yet. to implement this functionality on PostgreSQL not only helps extends users move from those RDBMS to PostgreSQL but encourages OLAP applications such as BI (Business Inteligence) to analyze large data set. this specification is defined first in SQL: 2003, and improved in SQL: 2008
In short, the aggregate function returns the results of each group. The window function returns the results for each row, as shown in the following example:
1. Create a sample table and initialize the data.

DROP TABLE IF EXISTS empsalary;CREATE TABLE empsalary(  depname varchar,  empno bigint,  salary int,  enroll_date date);INSERT INTO empsalary VALUES('develop',10, 5200, '2007/08/01');INSERT INTO empsalary VALUES('sales', 1, 5000, '2006/10/01');INSERT INTO empsalary VALUES('personnel', 5, 3500, '2007/12/10');INSERT INTO empsalary VALUES('sales', 4, 4800, '2007/08/08');INSERT INTO empsalary VALUES('sales', 6, 5500, '2007/01/02');INSERT INTO empsalary VALUES('personnel', 2, 3900, '2006/12/23');INSERT INTO empsalary VALUES('develop', 7, 4200, '2008/01/01');INSERT INTO empsalary VALUES('develop', 9, 4500, '2008/01/01');INSERT INTO empsalary VALUES('sales', 3, 4800, '2007/08/01');INSERT INTO empsalary VALUES('develop', 8, 6000, '2006/10/01');INSERT INTO empsalary VALUES('develop', 11, 5200, '2007/08/15');postgres=# select * from empsalary ;  depname  | empno | salary | enroll_date -----------+-------+--------+------------- develop   |    10 |   5200 | 2007-08-01 sales     |     1 |   5000 | 2006-10-01 personnel |     5 |   3500 | 2007-12-10 sales     |     4 |   4800 | 2007-08-08 sales     |     6 |   5500 | 2007-01-02 personnel |     2 |   3900 | 2006-12-23 develop   |     7 |   4200 | 2008-01-01 develop   |     9 |   4500 | 2008-01-01 sales     |     3 |   4800 | 2007-08-01 develop   |     8 |   6000 | 2006-10-01 develop   |    11 |   5200 | 2007-08-15(11 rows)
2. Statistical Example
A. collect statistics on the total salary, average salary, and department details of each department
postgres=# select sum(salary) OVER (PARTITION BY depname),avg(salary) OVER (PARTITION BY depname),* from empsalary;  sum  |          avg          |  depname  | empno | salary | enroll_date -------+-----------------------+-----------+-------+--------+------------- 25100 | 5020.0000000000000000 | develop   |    10 |   5200 | 2007-08-01 25100 | 5020.0000000000000000 | develop   |     7 |   4200 | 2008-01-01 25100 | 5020.0000000000000000 | develop   |     9 |   4500 | 2008-01-01 25100 | 5020.0000000000000000 | develop   |     8 |   6000 | 2006-10-01 25100 | 5020.0000000000000000 | develop   |    11 |   5200 | 2007-08-15  7400 | 3700.0000000000000000 | personnel |     2 |   3900 | 2006-12-23  7400 | 3700.0000000000000000 | personnel |     5 |   3500 | 2007-12-10 20100 | 5025.0000000000000000 | sales     |     3 |   4800 | 2007-08-01 20100 | 5025.0000000000000000 | sales     |     1 |   5000 | 2006-10-01 20100 | 5025.0000000000000000 | sales     |     4 |   4800 | 2007-08-08 20100 | 5025.0000000000000000 | sales     |     6 |   5500 | 2007-01-02(11 rows)
B. Salaries of statisticians in their respective departments
postgres=# select rank() OVER (PARTITION BY depname ORDER BY salary),* from empsalary; rank |  depname  | empno | salary | enroll_date ------+-----------+-------+--------+-------------    1 | develop   |     7 |   4200 | 2008-01-01    2 | develop   |     9 |   4500 | 2008-01-01    3 | develop   |    10 |   5200 | 2007-08-01    3 | develop   |    11 |   5200 | 2007-08-15    5 | develop   |     8 |   6000 | 2006-10-01    1 | personnel |     5 |   3500 | 2007-12-10    2 | personnel |     2 |   3900 | 2006-12-23    1 | sales     |     4 |   4800 | 2007-08-08    1 | sales     |     3 |   4800 | 2007-08-01    3 | sales     |     1 |   5000 | 2006-10-01    4 | sales     |     6 |   5500 | 2007-01-02(11 rows)
3. For an interesting example, pay attention to order by, and the result will be two.
 create table foo(a int,b int) ;insert into foo values (1,1);insert into foo values (1,1);insert into foo values (2,1);insert into foo values (4,1);insert into foo values (2,1);insert into foo values (4,1);insert into foo values (5,1);insert into foo values (11,3);insert into foo values (12,3);insert into foo values (22,3);insert into foo values (16,3);insert into foo values (16,3);insert into foo values (16,3);postgres=# select sum(a) over (partition by b), a, b from foo; sum | a  | b -----+----+---  19 |  1 | 1  19 |  1 | 1  19 |  2 | 1  19 |  4 | 1  19 |  2 | 1  19 |  4 | 1  19 |  5 | 1  93 | 11 | 3  93 | 12 | 3  93 | 22 | 3  93 | 16 | 3  93 | 16 | 3  93 | 16 | 3(13 rows)postgres=# select sum(a) over (partition by b order by a), a, b from foo; sum | a  | b -----+----+---   2 |  1 | 1   2 |  1 | 1   6 |  2 | 1   6 |  2 | 1  14 |  4 | 1  14 |  4 | 1  19 |  5 | 1  11 | 11 | 3  23 | 12 | 3  71 | 16 | 3  71 | 16 | 3  71 | 16 | 3  93 | 22 | 3(13 rows)postgres=# select a, b, sum(a) over (partition by b order by a ROWS postgres(# BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from foo; a  | b | sum ----+---+-----  1 | 1 |  19  1 | 1 |  19  2 | 1 |  19  2 | 1 |  19  4 | 1 |  19  4 | 1 |  19  5 | 1 |  19 11 | 3 |  93 12 | 3 |  93 16 | 3 |  93 16 | 3 |  93 16 | 3 |  93 22 | 3 |  93(13 rows)
The explanation on the official website is: By default, if order by is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the order by clause. when order by is omitted the default frame consists of all rows in the partition.
By default, the order by parameter is superimposed from the start value of the Group until the current value is reached. If the order by parameter is ignored, the sum of all values in the group is calculated.

4. other window functions
Row_number (): Starting from the current, uninterrupted, such as 1, 2, 3, 4, 5, 6
Rank (): It is interrupted from the current start, for example, 1, 2, 4, 5, 6.
Dense_rank (): It is not interrupted from the current start, but will be repeated, such
Percent_rank (): calculates the percentage of data in a group from the current time, for example, 0, 0.25, 0.25, 0.75, and 1. Continuously loops from 0 to 1.
Cume_dist (): sorts the current row by the number of groups. If there are four rows in the group, the value is 0.25, 0.5, 0.75, 1
Ntile (num_buckets integer): from 1 to the current value, divided by the number of groups, try to make the distribution average
Lag (value any [, offset integer [, default any]): return the offset function. Take the lag value. For example, lag (column_name, 2, 0) indicates that the field offset is 2, if no value exists, use the default value instead. Here, the value is 0. If no value is specified, the default value is null.
Lead (value any [, offset integer [, default any]): offset function, which obtains the advance value. first_value (value any) in the class: returns the first value in the window framework.
Last_value (value any): returns the last value in the window framework.
Nth_value (value any, nth integer): return the specified value in the window framework, for example, nth_value (salary, 2), which indicates that the second window function value of the salary field is returned.

5. other window function examples
postgres=# select row_number() over (partition by depname order by salary desc),* from empsalary; row_number |  depname  | empno | salary | enroll_date ------------+-----------+-------+--------+-------------          1 | develop   |     8 |   6000 | 2006-10-01          2 | develop   |    10 |   5200 | 2007-08-01          3 | develop   |    11 |   5200 | 2007-08-15          4 | develop   |     9 |   4500 | 2008-01-01          5 | develop   |     7 |   4200 | 2008-01-01          1 | personnel |     2 |   3900 | 2006-12-23          2 | personnel |     5 |   3500 | 2007-12-10          1 | sales     |     6 |   5500 | 2007-01-02          2 | sales     |     1 |   5000 | 2006-10-01          3 | sales     |     3 |   4800 | 2007-08-01          4 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select rank() over(partition by depname order by salary desc),* from empsalary; rank |  depname  | empno | salary | enroll_date ------+-----------+-------+--------+-------------    1 | develop   |     8 |   6000 | 2006-10-01    2 | develop   |    10 |   5200 | 2007-08-01    2 | develop   |    11 |   5200 | 2007-08-15    4 | develop   |     9 |   4500 | 2008-01-01    5 | develop   |     7 |   4200 | 2008-01-01    1 | personnel |     2 |   3900 | 2006-12-23    2 | personnel |     5 |   3500 | 2007-12-10    1 | sales     |     6 |   5500 | 2007-01-02    2 | sales     |     1 |   5000 | 2006-10-01    3 | sales     |     3 |   4800 | 2007-08-01    3 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select dense_rank() over(partition by depname order by salary desc),* from empsalary; dense_rank |  depname  | empno | salary | enroll_date ------------+-----------+-------+--------+-------------          1 | develop   |     8 |   6000 | 2006-10-01          2 | develop   |    10 |   5200 | 2007-08-01          2 | develop   |    11 |   5200 | 2007-08-15          3 | develop   |     9 |   4500 | 2008-01-01          4 | develop   |     7 |   4200 | 2008-01-01          1 | personnel |     2 |   3900 | 2006-12-23          2 | personnel |     5 |   3500 | 2007-12-10          1 | sales     |     6 |   5500 | 2007-01-02          2 | sales     |     1 |   5000 | 2006-10-01          3 | sales     |     3 |   4800 | 2007-08-01          3 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select percent_rank() over(partition by depname order by salary desc),* from empsalary;   percent_rank    |  depname  | empno | salary | enroll_date -------------------+-----------+-------+--------+-------------                 0 | develop   |     8 |   6000 | 2006-10-01              0.25 | develop   |    10 |   5200 | 2007-08-01              0.25 | develop   |    11 |   5200 | 2007-08-15              0.75 | develop   |     9 |   4500 | 2008-01-01                 1 | develop   |     7 |   4200 | 2008-01-01                 0 | personnel |     2 |   3900 | 2006-12-23                 1 | personnel |     5 |   3500 | 2007-12-10                 0 | sales     |     6 |   5500 | 2007-01-02 0.333333333333333 | sales     |     1 |   5000 | 2006-10-01 0.666666666666667 | sales     |     3 |   4800 | 2007-08-01 0.666666666666667 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select cume_dist()over(partition by depname order by salary desc),* from empsalary; cume_dist |  depname  | empno | salary | enroll_date -----------+-----------+-------+--------+-------------       0.2 | develop   |     8 |   6000 | 2006-10-01       0.6 | develop   |    10 |   5200 | 2007-08-01       0.6 | develop   |    11 |   5200 | 2007-08-15       0.8 | develop   |     9 |   4500 | 2008-01-01         1 | develop   |     7 |   4200 | 2008-01-01       0.5 | personnel |     2 |   3900 | 2006-12-23         1 | personnel |     5 |   3500 | 2007-12-10      0.25 | sales     |     6 |   5500 | 2007-01-02       0.5 | sales     |     1 |   5000 | 2006-10-01         1 | sales     |     3 |   4800 | 2007-08-01         1 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select ntile(3)over(partition by depname order by salary desc),* from empsalary; ntile |  depname  | empno | salary | enroll_date -------+-----------+-------+--------+-------------     1 | develop   |     8 |   6000 | 2006-10-01     1 | develop   |    10 |   5200 | 2007-08-01     2 | develop   |    11 |   5200 | 2007-08-15     2 | develop   |     9 |   4500 | 2008-01-01     3 | develop   |     7 |   4200 | 2008-01-01     1 | personnel |     2 |   3900 | 2006-12-23     2 | personnel |     5 |   3500 | 2007-12-10     1 | sales     |     6 |   5500 | 2007-01-02     1 | sales     |     1 |   5000 | 2006-10-01     2 | sales     |     3 |   4800 | 2007-08-01     3 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select lag(salary,2,null)over(partition by depname order by salary desc),* from empsalary; lag  |  depname  | empno | salary | enroll_date ------+-----------+-------+--------+-------------      | develop   |     8 |   6000 | 2006-10-01      | develop   |    10 |   5200 | 2007-08-01 6000 | develop   |    11 |   5200 | 2007-08-15 5200 | develop   |     9 |   4500 | 2008-01-01 5200 | develop   |     7 |   4200 | 2008-01-01      | personnel |     2 |   3900 | 2006-12-23      | personnel |     5 |   3500 | 2007-12-10      | sales     |     6 |   5500 | 2007-01-02      | sales     |     1 |   5000 | 2006-10-01 5500 | sales     |     3 |   4800 | 2007-08-01 5000 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select first_value(salary)over(partition by depname order by salary desc),* from empsalary; first_value |  depname  | empno | salary | enroll_date -------------+-----------+-------+--------+-------------        6000 | develop   |     8 |   6000 | 2006-10-01        6000 | develop   |    10 |   5200 | 2007-08-01        6000 | develop   |    11 |   5200 | 2007-08-15        6000 | develop   |     9 |   4500 | 2008-01-01        6000 | develop   |     7 |   4200 | 2008-01-01        3900 | personnel |     2 |   3900 | 2006-12-23        3900 | personnel |     5 |   3500 | 2007-12-10        5500 | sales     |     6 |   5500 | 2007-01-02        5500 | sales     |     1 |   5000 | 2006-10-01        5500 | sales     |     3 |   4800 | 2007-08-01        5500 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select last_value(salary)over(partition by depname order by salary desc),* from empsalary; last_value |  depname  | empno | salary | enroll_date ------------+-----------+-------+--------+-------------       6000 | develop   |     8 |   6000 | 2006-10-01       5200 | develop   |    10 |   5200 | 2007-08-01       5200 | develop   |    11 |   5200 | 2007-08-15       4500 | develop   |     9 |   4500 | 2008-01-01       4200 | develop   |     7 |   4200 | 2008-01-01       3900 | personnel |     2 |   3900 | 2006-12-23       3500 | personnel |     5 |   3500 | 2007-12-10       5500 | sales     |     6 |   5500 | 2007-01-02       5000 | sales     |     1 |   5000 | 2006-10-01       4800 | sales     |     3 |   4800 | 2007-08-01       4800 | sales     |     4 |   4800 | 2007-08-08(11 rows)postgres=# select nth_value(salary,2)over(partition by depname order by salary desc),* from empsalary; nth_value |  depname  | empno | salary | enroll_date -----------+-----------+-------+--------+-------------           | develop   |     8 |   6000 | 2006-10-01      5200 | develop   |    10 |   5200 | 2007-08-01      5200 | develop   |    11 |   5200 | 2007-08-15      5200 | develop   |     9 |   4500 | 2008-01-01      5200 | develop   |     7 |   4200 | 2008-01-01           | personnel |     2 |   3900 | 2006-12-23      3500 | personnel |     5 |   3500 | 2007-12-10           | sales     |     6 |   5500 | 2007-01-02      5000 | sales     |     1 |   5000 | 2006-10-01      5000 | sales     |     3 |   4800 | 2007-08-01      5000 | sales     |     4 |   4800 | 2007-08-08(11 rows)
When a query involves multiple window functions, you can use the alias method, which is simpler:
postgres=# select sum(salary)over w,avg(salary) over w,* from empsalary window w as (partition by depname order by salary desc);  sum  |          avg          |  depname  | empno | salary | enroll_date -------+-----------------------+-----------+-------+--------+-------------  6000 | 6000.0000000000000000 | develop   |     8 |   6000 | 2006-10-01 16400 | 5466.6666666666666667 | develop   |    10 |   5200 | 2007-08-01 16400 | 5466.6666666666666667 | develop   |    11 |   5200 | 2007-08-15 20900 | 5225.0000000000000000 | develop   |     9 |   4500 | 2008-01-01 25100 | 5020.0000000000000000 | develop   |     7 |   4200 | 2008-01-01  3900 | 3900.0000000000000000 | personnel |     2 |   3900 | 2006-12-23  7400 | 3700.0000000000000000 | personnel |     5 |   3500 | 2007-12-10  5500 | 5500.0000000000000000 | sales     |     6 |   5500 | 2007-01-02 10500 | 5250.0000000000000000 | sales     |     1 |   5000 | 2006-10-01 20100 | 5025.0000000000000000 | sales     |     3 |   4800 | 2007-08-01 20100 | 5025.0000000000000000 | sales     |     4 |   4800 | 2007-08-08(11 rows)
This write method is the same as the following, but it is simpler.
SELECT sum (salary) OVER (partition by depname order by salary DESC), avg (salary) OVER (partition by depname order by salary DESC), * FROM empsalary;

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.