Introduction
Following the previous two articles, this article mainly introduces the use of aggregate functions and data grouping.
Overview
Simply put, an aggregate function summarizes multiple rows of data into a single row according to certain rules. Before the data is summarized, it can be grouped by specified columns, and after grouping, the groups can be filtered again based on given conditions.
The summarizing process of aggregate functions can be illustrated as follows:
Simple Aggregate functions
Simple aggregate functions compute one row from multiple rows according to an intuitive rule, and what each of them does can usually be guessed from its name. Simple aggregate functions include AVG, SUM, MAX, and MIN.
The arguments of AVG and SUM must be of a numeric type (MAX and MIN also accept character and date/time values). In SQL Server, the numeric types include tinyint, smallint, int, bigint, decimal/numeric, money, smallmoney, float, and real.
Before introducing the simple aggregate functions, we first introduce the aggregate function COUNT.
COUNT()
The COUNT function returns the number of rows that satisfy a given condition. Take the simplest case as an example:
Using the table above, to find out how many employees the company has, simply use:
SELECT COUNT(*) AS EmployeeNumber FROM HumanResources.Employee
The result is as follows:
When COUNT() is applied to a specific column, the difference is that COUNT(column name) does not count rows in which that column is NULL, for example:
To find the number of employees who have a superior:
SELECT COUNT(ManagerID) AS EmployeeWithSuperior FROM HumanResources.Employee
We can see that every employee is counted except those without a superior (whose ManagerID is NULL).
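For readers without the AdventureWorks sample database at hand, the difference between COUNT(*) and COUNT(column) can be reproduced with a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The Employee table and its rows are hypothetical stand-ins for HumanResources.Employee (SQLite has no HumanResources schema); the NULL-skipping behaviour of COUNT shown here is standard SQL, not something specific to SQL Server.

```python
import sqlite3

# Hypothetical miniature version of HumanResources.Employee:
# only the columns needed here, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, ManagerID INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, None),  # top-level employee: no superior, so ManagerID is NULL
     (2, 1),
     (3, 1),
     (4, 2)],
)

# COUNT(*) counts every row; COUNT(ManagerID) skips rows where ManagerID is NULL.
employee_number = conn.execute(
    "SELECT COUNT(*) FROM Employee").fetchone()[0]
employee_with_superior = conn.execute(
    "SELECT COUNT(ManagerID) FROM Employee").fetchone()[0]

print(employee_number, employee_with_superior)  # 4 3
```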
The DISTINCT keyword can also be used inside COUNT() so that identical values in the column are counted only once, for example:
To count the number of managers in the company:
SELECT COUNT(DISTINCT ManagerID) AS NumberOfManager FROM HumanResources.Employee
The result is as follows:
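The effect of DISTINCT inside COUNT can also be traced on a toy table. This is a minimal sketch, assuming a hypothetical six-row Employee table in an in-memory SQLite database (accessed through Python's sqlite3 module) in place of HumanResources.Employee.

```python
import sqlite3

# Hypothetical Employee table: ManagerID is NULL for the top-level employee.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, ManagerID INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2)],
)

# COUNT(ManagerID) counts every non-NULL value, duplicates included;
# COUNT(DISTINCT ManagerID) counts each distinct non-NULL value once.
with_superior, number_of_manager = conn.execute(
    "SELECT COUNT(ManagerID), COUNT(DISTINCT ManagerID) FROM Employee"
).fetchone()

print(with_superior, number_of_manager)  # 5 2
```

Five employees have a superior, but only two distinct employees (1 and 2) act as managers, and the NULL is not counted as a manager.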
AVG(), SUM(), MAX(), and MIN()
Apart from what they compute, these aggregate functions take almost the same arguments and are used in the same way, so only AVG is explained here:
AVG() calculates the average of the values within the selected range. NULL values are ignored in this calculation, for example:
To get the average vacation hours of the employees:
SELECT AVG(VacationHours) 'Average vacation hours' FROM HumanResources.Employee
The result is as follows:
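To make "NULL values are ignored" concrete, here is a minimal sketch on a hypothetical four-row table in an in-memory SQLite database (via Python's sqlite3 module). It returns the average together with the sum and the two counts, so you can see that the NULL row is left out of both the numerator and the denominator.

```python
import sqlite3

# Hypothetical Employee table; one employee has no VacationHours recorded (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, VacationHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 40), (2, None), (3, 20), (4, 60)],
)

# AVG divides the sum of the non-NULL values by the number of non-NULL values,
# so the NULL row affects neither the numerator nor the denominator.
avg_hours, total_hours, counted_rows, all_rows = conn.execute(
    "SELECT AVG(VacationHours), SUM(VacationHours), "
    "COUNT(VacationHours), COUNT(*) FROM Employee"
).fetchone()

print(avg_hours, total_hours, counted_rows, all_rows)  # 40.0 120 3 4
```

The average is 120 / 3 = 40, not 120 / 4 = 30.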
Because aggregate functions ignore NULL by default, if you want NULL values to take part in the current query, use ISNULL(column, value) to replace them with a value.
For example:
To get the average vacation hours of the employees, counting an employee whose vacation hours are NULL as 10 hours:
SELECT AVG(ISNULL(VacationHours, 10)) 'Average vacation hours' FROM HumanResources.Employee
The result is as follows:
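The same replacement idea can be sketched in SQLite through Python's sqlite3 module, using the hypothetical four-row table from the previous sketch. SQLite names its two-argument equivalent IFNULL rather than ISNULL, so the sketch substitutes COALESCE, which is standard SQL and is also available in SQL Server; with one column and one replacement value it gives the same result here. One difference to keep in mind: SQLite's AVG returns a floating-point value, while SQL Server's AVG over an integer column returns an integer, so the same query there would truncate the result.

```python
import sqlite3

# Hypothetical Employee table; employee 2 has VacationHours = NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, VacationHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 40), (2, None), (3, 20), (4, 60)],
)

# COALESCE(VacationHours, 10) turns the NULL into 10 before AVG sees it,
# so all four rows now take part: (40 + 10 + 20 + 60) / 4.
avg_with_default = conn.execute(
    "SELECT AVG(COALESCE(VacationHours, 10)) FROM Employee"
).fetchone()[0]

print(avg_with_default)  # 32.5
```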
The DISTINCT keyword can also be used in a simple aggregate function so that each distinct value takes part in the aggregation only once. This was explained for COUNT above and is not repeated here.
The usage of the remaining simple aggregate functions, SUM(), MAX(), and MIN(), is basically the same and is not repeated here.
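As a quick check that DISTINCT and NULL handling carry over to the other simple aggregate functions, here is a minimal sketch on hypothetical data (a duplicated value and a NULL) in an in-memory SQLite database, accessed through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical data with a duplicate value (10) and a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, VacationHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 60), (5, None)],
)

row = conn.execute(
    "SELECT AVG(VacationHours), AVG(DISTINCT VacationHours), "
    "SUM(VacationHours), SUM(DISTINCT VacationHours), "
    "MAX(VacationHours), MIN(VacationHours) FROM Employee"
).fetchone()

# AVG: (10 + 10 + 20 + 60) / 4; AVG(DISTINCT): (10 + 20 + 60) / 3.
# SUM, MAX and MIN skip the NULL in the same way AVG does.
print(row)  # (25.0, 30.0, 100, 90, 60, 10)
```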
Grouping aggregate results by column
If aggregate results could not be grouped by specific values, aggregate functions would be far less useful. In SQL, the GROUP BY clause is used to group the rows that aggregate functions operate on. The concept of grouping can be illustrated with the following simple example:
For example:
To get the total sales of the salespeople in each territory:
SELECT TerritoryID, SUM(SalesLastYear) AS TotalSales FROM Sales.SalesPerson GROUP BY TerritoryID
The concept is illustrated below:
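The grouping can also be traced on a toy table. This is a minimal sketch, assuming a hypothetical SalesPerson table with made-up integer sales figures in an in-memory SQLite database (via Python's sqlite3 module) instead of Sales.SalesPerson.

```python
import sqlite3

# Hypothetical stand-in for Sales.SalesPerson: (TerritoryID, SalesLastYear).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesPerson (TerritoryID INTEGER, SalesLastYear INTEGER)")
conn.executemany(
    "INSERT INTO SalesPerson VALUES (?, ?)",
    [(1, 100), (1, 50), (2, 200), (3, 30), (3, 70)],
)

# Rows sharing a TerritoryID form one group; SUM collapses each group to one row.
# ORDER BY only fixes the output order so the result is predictable.
total_sales = conn.execute(
    "SELECT TerritoryID, SUM(SalesLastYear) AS TotalSales "
    "FROM SalesPerson GROUP BY TerritoryID ORDER BY TerritoryID"
).fetchall()

print(total_sales)  # [(1, 150), (2, 200), (3, 100)]
```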
The columns listed after GROUP BY are the basis for grouping. In some cases, grouping is based on multiple columns. The following example has some practical significance:
To get the total sick leave hours of the employees under each manager, broken down by gender:
SELECT ManagerID, Gender, SUM(SickLeaveHours) AS SickLeaveHours, COUNT(*) AS EmployeeNumber FROM HumanResources.Employee GROUP BY Gender, ManagerID
The result is as follows:
When GROUP BY is followed by multiple columns, we can think of it logically as a cross join between the distinct ManagerID values and the distinct Gender values (if you are not familiar with cross joins, please refer to my previous article). Each resulting combination is a unique key that identifies one group, and combinations that match no rows are dropped, so no empty groups are returned. In this way, the two columns together logically act as a single grouping column.
We can see that the rows are grouped by manager ID, the employees under each manager are then grouped again by gender, and finally the total sick leave hours are computed for each group.
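Grouping by two columns can be sketched the same way. The rows below are hypothetical (ManagerID, Gender, SickLeaveHours) values in an in-memory SQLite table accessed through Python's sqlite3 module; they are chosen so that one manager/gender combination has no employees, which shows that such a combination produces no group at all.

```python
import sqlite3

# Hypothetical Employee rows: (ManagerID, Gender, SickLeaveHours).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ManagerID INTEGER, Gender TEXT, SickLeaveHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "M", 20), (1, "M", 30), (1, "F", 25),
     (2, "F", 10), (2, "F", 40), (2, "F", 22)],
)

# Each (Gender, ManagerID) pair that actually occurs becomes one group.
# Manager 2 has no male employees, so no (2, 'M') row is returned.
groups = conn.execute(
    "SELECT ManagerID, Gender, SUM(SickLeaveHours) AS SickLeaveHours, "
    "COUNT(*) AS EmployeeNumber FROM Employee "
    "GROUP BY Gender, ManagerID ORDER BY ManagerID, Gender"
).fetchall()

print(groups)  # [(1, 'F', 25, 1), (1, 'M', 50, 2), (2, 'F', 72, 3)]
```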
Note that when grouping by multiple columns with GROUP BY, pay attention to the order in which the columns appear after GROUP BY.
In the preceding example, Gender comes first: all possible values of Gender are traversed, the matching ManagerID values are found for each Gender value, and the aggregate is then computed for each ManagerID. If the column order after GROUP BY is changed to ManagerID first and then Gender, all possible values of ManagerID are traversed first and then matched against Gender. The result is as follows:
When Gender moves on to M (male), ManagerID is traversed a second time for matching:
As we can see, although a different column order after GROUP BY returns the rows in a different order, the resulting dataset is exactly the same. Therefore, an ORDER BY clause can be added so that GROUP BY queries with different column orders return identical results; this is not demonstrated further here. (Without ORDER BY, SQL Server does not guarantee the order of the returned rows at all.)
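The claim that column order after GROUP BY changes only the row order, not the dataset, can be checked with a minimal sketch. It reuses the hypothetical (ManagerID, Gender, SickLeaveHours) rows in an in-memory SQLite database via Python's sqlite3 module, compares the two unordered results as sets, and then adds the same ORDER BY to both queries.

```python
import sqlite3

# Hypothetical Employee rows: (ManagerID, Gender, SickLeaveHours).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ManagerID INTEGER, Gender TEXT, SickLeaveHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "M", 20), (1, "M", 30), (1, "F", 25),
     (2, "F", 10), (2, "F", 40), (2, "F", 22)],
)

base = "SELECT ManagerID, Gender, SUM(SickLeaveHours) FROM Employee GROUP BY "

# Same grouping columns, different order after GROUP BY.
rows_a = conn.execute(base + "Gender, ManagerID").fetchall()
rows_b = conn.execute(base + "ManagerID, Gender").fetchall()

# Without ORDER BY the row order is unspecified, so compare as sets...
same_dataset = set(rows_a) == set(rows_b)

# ...or add the same ORDER BY to both to get identical ordered lists.
order = " ORDER BY ManagerID, Gender"
ordered_a = conn.execute(base + "Gender, ManagerID" + order).fetchall()
ordered_b = conn.execute(base + "ManagerID, Gender" + order).fetchall()

print(same_dataset, ordered_a == ordered_b)  # True True
```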
Filtering the grouped dataset again (HAVING)
After grouping with aggregate functions, the HAVING clause that follows GROUP BY can filter the grouped data again. The HAVING clause is similar to WHERE; for the kinds of expressions it accepts, see the discussion of WHERE in my previous article. HAVING can be understood as a secondary filter applied after grouping.
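The "before grouping" versus "after grouping" distinction between WHERE and HAVING can be seen side by side in a minimal sketch. It uses the same hypothetical rows in an in-memory SQLite database via Python's sqlite3 module; the thresholds (20 and 75 hours) are arbitrary values picked for illustration.

```python
import sqlite3

# Hypothetical Employee rows: (ManagerID, Gender, SickLeaveHours).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ManagerID INTEGER, Gender TEXT, SickLeaveHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "M", 20), (1, "M", 30), (1, "F", 25),
     (2, "F", 10), (2, "F", 40), (2, "F", 22)],
)

# WHERE filters individual rows *before* grouping:
# the rows with 20 and 10 hours are dropped, then the rest is summed per manager.
where_result = conn.execute(
    "SELECT ManagerID, SUM(SickLeaveHours) FROM Employee "
    "WHERE SickLeaveHours > 20 GROUP BY ManagerID ORDER BY ManagerID"
).fetchall()

# HAVING filters whole groups *after* grouping, using the group's aggregate:
# manager 1 totals 75 hours, manager 2 totals 72 hours.
having_result = conn.execute(
    "SELECT ManagerID, SUM(SickLeaveHours) FROM Employee "
    "GROUP BY ManagerID HAVING SUM(SickLeaveHours) >= 75 ORDER BY ManagerID"
).fetchall()

print(where_result)   # [(1, 55), (2, 62)]
print(having_result)  # [(1, 75)]
```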
The HAVING clause is simple, but note that it cannot reference aliases defined in the SELECT list; the expression from the SELECT list must be written out again instead. For example, with the table above:
To get the total sick leave hours of the employees under each manager, broken down by gender, keeping only the groups with more than two employees:
SELECT ManagerID, Gender, SUM(SickLeaveHours) AS SickLeaveHours, COUNT(*) AS EmployeeNumber FROM HumanResources.Employee GROUP BY ManagerID, Gender HAVING (EmployeeNumber > 2)
Note: the preceding statement is incorrect, because column aliases defined in the SELECT list cannot be referenced in the HAVING clause. To achieve the intended effect, the COUNT(*) expression must be written again in the HAVING clause. The correct syntax is as follows:
SELECT ManagerID, Gender, SUM(SickLeaveHours) AS SickLeaveHours, COUNT(*) AS EmployeeNumber FROM HumanResources.Employee GROUP BY ManagerID, Gender HAVING (COUNT(*) > 2)
The result is as follows:
We can see that only the groups with more than two employees are returned.
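The corrected query can be run against the same hypothetical rows in a minimal SQLite sketch (via Python's sqlite3 module). The sketch repeats COUNT(*) in HAVING rather than referring to the EmployeeNumber alias, which is the form SQL Server requires and also works in SQLite.

```python
import sqlite3

# Hypothetical Employee rows: (ManagerID, Gender, SickLeaveHours).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ManagerID INTEGER, Gender TEXT, SickLeaveHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "M", 20), (1, "M", 30), (1, "F", 25),
     (2, "F", 10), (2, "F", 40), (2, "F", 22)],
)

# The aggregate is written again in HAVING instead of using the alias.
# Groups: (1,'F') has 1 employee, (1,'M') has 2, (2,'F') has 3.
big_groups = conn.execute(
    "SELECT ManagerID, Gender, SUM(SickLeaveHours) AS SickLeaveHours, "
    "COUNT(*) AS EmployeeNumber FROM Employee "
    "GROUP BY ManagerID, Gender HAVING COUNT(*) > 2"
).fetchall()

print(big_groups)  # [(2, 'F', 72, 3)]
```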
Of course, the most powerful feature of the HAVING clause is that it can use aggregate functions in its expressions, which is not allowed in the WHERE clause. The following example demonstrates the power of the HAVING clause:
Using the same data as in the example above:
To get the total sick leave hours of the employees under each manager, keeping only the managers for whom the employee with the most sick leave hours has more than twice the hours of the employee with the fewest:
SELECT ManagerID, SUM(SickLeaveHours) AS Hour, COUNT(*) AS EmployeeNumber FROM HumanResources.Employee GROUP BY ManagerID HAVING (MAX(SickLeaveHours) > 2 * MIN(SickLeaveHours))
The result is as follows:
As you can see, the HAVING clause achieves this in a very simple way, whereas doing the same with a WHERE clause would be very cumbersome. The result above also shows well the scope over which the aggregate functions in the HAVING clause operate:
As shown in the preceding figure, in each returned group the largest number of sick leave hours is more than twice the smallest.
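The per-group scope of MAX and MIN inside HAVING can be followed step by step in a minimal sketch. It uses the same hypothetical rows in an in-memory SQLite database via Python's sqlite3 module, with the comments working through the comparison for each manager.

```python
import sqlite3

# Hypothetical Employee rows: (ManagerID, Gender, SickLeaveHours).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ManagerID INTEGER, Gender TEXT, SickLeaveHours INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "M", 20), (1, "M", 30), (1, "F", 25),
     (2, "F", 10), (2, "F", 40), (2, "F", 22)],
)

# MAX and MIN are evaluated separately within each manager's group:
#   manager 1: MAX 30, MIN 20 -> 30 > 40 is false, group dropped
#   manager 2: MAX 40, MIN 10 -> 40 > 20 is true, group kept
result = conn.execute(
    "SELECT ManagerID, SUM(SickLeaveHours) AS Hour, COUNT(*) AS EmployeeNumber "
    "FROM Employee GROUP BY ManagerID "
    "HAVING MAX(SickLeaveHours) > 2 * MIN(SickLeaveHours)"
).fetchall()

print(result)  # [(2, 72, 3)]
```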
Summary
Starting from the concept of aggregate functions, this article described the commonly used querying, grouping, and filtering techniques that involve them. Used well, aggregate functions let you move many tasks from the business layer of an application into the database, which helps greatly with both maintenance and performance.
PS: This concludes the introductory part on SQL queries. Although the content of this article is quite simple, many vague concepts became clearer to me while writing it, so writing it was also a way of re-learning. I have tried to explain every concept in a simple way, which I think is how technical articles should be :-) Now on to preparing the next articles...