What is a window function?
Window functions and Aggregate functions aggregate defined row sets (groups), but return only one value unlike aggregation. Window functions can return multiple values for each group, the row group for clustering execution is a window.(This is called the 'window function'). A window function adds over () to the aggregate function. All Aggregate functions can be converted to window functions in this way.. The text above seems a little abstract. For example
Figure 1 Figure 2 Figure 3
Figure 1 shows the data of the original data table. SQL statement: Select * From citytable
Figure 2 shows the number of records in the original data table. SQL statement: Select count (*) as 'number of records 'from citytable
Figure 3 shows how to obtain the number of data table records based on the window function count (*) over (). SQL statement: Select provincename, cityname, count (*) over () as 'number of records 'from citytable.
According to the comparison above, the aggregate function count (*) can only get one row of record (2). This record contains the clustered result 6, and the window function count (*) is used (*) the records obtained by over () are not only one row (as shown in 3), but multiple rows, and each row contains clustering results. This is also the biggest difference between aggregate functions and window functions.
Of course, other clauses can be accepted in () after the over keyword to change the row range of the window function. If there is no clause in over () brackets, the default window function acts on the entire result set.
The following describesPartition clause
[Partition by] partition clause: You can define row partitions or groups based on the partition by clause to complete clustering. If empty parentheses are used, the entire result set is a partition, the window function will perform aggregation Calculation on it. We can regard partition by as a moving group, you can use partition by to calculate and aggregate the defined row group (reset when a new group is encountered) and return each value (members in each group ), instead of using a group to represent all instances with this value in the table.
For example, to partition the result set shown in Figure 3 by province, the SQL statement is as follows:
Select provincename, cityname,
Count (*) over (partition by provincename) as 'number of records'
From citytable
The above involves a function prtition by provincename, which means to partition the entire result set, divide different partitions according to different provinces, and display the number of rows in each partition in the result set, when a new partition is encountered, it is re-computed. As shown in figure 4 after execution
Figure 4
Window functions are executed at the end, before order by, after where and group.For example, the SQL statement of result set in Figure 4 is used, and a condition is added.
Where (provincename = 'fujian 'and cityname <> 'nanping') or (provincename = 'hunan 'and cityname <> 'changsha') order by 'number of records'
The final SQL statement is as follows:
Select provincename, cityname,
Count (*) over (partition by provincename) as 'number of records'
From citytable
Where (provincename = 'fujian 'and cityname <> 'nanping') or (provincename = 'hunan 'and cityname <> 'changsha ')
Order by 'number of records'
The preceding SQL statement first obtains the qualified records based on the where statement, and then executes the window function to count (the number of records corresponding to Hunan is 1, instead of 2 shown in Figure 4, the number of records corresponding to Fujian is 2, which is not 3 shown in Figure 4). Sort the records based on the order by statement. The result 5 is displayed.
Figure 5
The difference between count (*) and count (column)
Count (*) calculates the number of rows, while count (column) calculates the number of records with non-null column values, that is, count (column) ignores NULL values, when the null value is also a valid value, we can only use count (*) to calculate the number of rows for aggregation.