SQL entry-level data query tutorial

Source: Internet
Author: User

In the first two articles, the concept of a single table query and a multiple table query is described in detail, and the use of aggregate functions and the grouping of data are introduced in this article.


Simply put, an aggregate function is a function that aggregates multirow (row) data into one line according to certain rules. After you summarize the data, you can group (group by) the other columns that are summarized according to a specific column (column), and you can filter (having) the conditions that you have given again.

The concept of aggregating multiple rows of data by aggregate functions can be simply explained in the following illustration:




Simple aggregate functions are those that have the intuitive function of summarizing multirow (row) to a row (row) calculation rule. These functions tend to guess the function of functions from the function name itself, and the parameters of these functions are numeric types. Simple aggregate functions include: Avg,sum,max,min.
The parameters of a simple aggregate function can only be numeric types, in SQL, the numeric type specifically includes: Tinyint,smallint,int,bigint,decimal,money,smallmoney,float,real.

Before introducing a simple aggregate function, let's introduce the aggregate function of count ().


The Count function is used to calculate the number of rows (row) contained under a given condition. For example, the simplest:

In the table above, I want to know the number of employees in the company, can be used simply:

SELECT COUNT (*) as EmployeeNumber
From HumanResources.Employee


The results are as follows

When count () is applied to a particular column (columns), and when "*" is used as an argument, the difference is that when count (column name) encounters a "Null" value, it is not counted, for example:
I would like to know the number of employees in the company with superior:

SELECT COUNT (ManagerID) as Employeewithsuperior
From HumanResources.Employee




As you can see, all other employees have been counted except for the CEO who has no superiors.

You can also use the DISTINCT keyword within count () to allow each of the same values for each column (column) to be counted, for example:
I want to count the number of managers in the company:

SELECT COUNT (DISTINCT managerid) as Numberofmanager
From HumanResources.Employee


The result is as above.


These aggregate functions have almost the same parameters and usages in addition to different functions. So here's an explanation only of the AVG () aggregate function:
AVG () represents the average of the summarized data that is calculated in the selection. The "Null" value in this procedure is not counted, for example:
I want to get the average duration for every employee on vacation:

SELECT AVG (vacationhours) as ' Average vacation hours '
From HumanResources.Employee

The results are as follows:

Because the default is to summarize data with aggregate functions, it does not contain NULL, but if I want to include null values and replace the null values with other values in the current query and participate in the rollup operations, use ISNULL (Column,value)
For example:
I want to get the average length of vacation per employee, and if the employee does not have a vacation, 10 hours on leave

SELECT AVG (ISNULL (vacationhours)) as ' Average vacation hours '
From HumanResources.Employee

The results are as follows:

You can also use the DISTINCT keyword to make each value unique in a simple aggregate function to participate in aggregation rollup operations. In the Count function above, there is no repetition here.
The simple aggregate functions of sum (), Max (), Min () and so on are basically the same, and there is no repetition here.



If the results of an aggregate function cannot be grouped by a specific value, then the function of the aggregate function is not so strong. In SQL, use Group by to group values that are summarized by aggregate functions. The concept of grouping can be shown in the following simple example:
For example:
I want to get the total sales of salespeople according to different provinces:

SELECT TerritoryID, SUM (saleslastyear) as TotalSales
From Sales.SalesPerson
GROUP by TerritoryID

The concept is shown in the following illustration:


The column names that follow the group by are the basis for grouping. In some cases, of course, there are cases in which groups are grouped according to multiple columns (column). The following example has a practical meaning:
I want to get the total sick time of a different manager's staff according to the gender:


SELECT ManagerID, Gender, SUM (sickleavehours) as Sickleavehours, COUNT (*) as EmployeeNumber
From HumanResources.Employee
GROUP by Gender, ManagerID


The results are as follows:

Group by back multiple columns, we can logically think so, based on each column only ManagerID and unique gender for cross Join (if you do not know what cross join, please see my previous article) to get the only key to determine the other key (key) , and finally filters out rows (that is, null) in the aggregate function that cannot return a value. Again based on this actually two columns, but logically is a column of values as the group basis.
As you can see in the picture above, we first group by the manager ID, then divide the total again according to the gender of the employee under the different manager, and finally get the total sick time according to the group condition.
Note that when you group by using GROUP by according to Dole (Column), be sure to pay attention to the order that appears after group by
First appear gender is first to traverse the gender of all possible values, and then according to each gender possible value to compute the matching managerid, and finally based on the ManagerID to do the aggregate function operations, if the above group by the column after (column) The order is changed to first ManagerID, then gender, which means traversing all possible values of managerid and then matching gender, the result is as follows:

From gender (gender) into m (male), the second traversal ManagerID to match:


As we can see from the above, although the order in which columns appear after group by is different, the resulting dataset is exactly the same, so you can use the ORDER BY clause to group in different columns The same result is obtained by the query statement by. This is no longer a screenshot.


Once you have grouped using aggregate functions, you can use the group The HAVING clause after the BY clause filters the grouped data again. The HAVING clause is much like the WHERE clause in some respects, and the use of the specific having expression can be seen in the previous article where I explained it. The HAVING clause can be understood as a statement that is filtered two times after grouping.
Using the HAVING clause is very simple, but it is important to note that the HAVING clause cannot be followed by an alias that appears in the SELECT statement, but the expression within the SELECT statement must be written again, for example, or against the table above:
I want to sum up the sick time of the employees of different managers under different genders, and the employees of these managers need more than 2 people:

SELECT ManagerID, Gender, SUM (sickleavehours) as Sickleavehours, COUNT (*) as EmployeeNumber
From HumanResources.Employee
GROUP by ManagerID, Gender
Having (EmployeeNumber > 2)

Note that the above sentence is wrong, after the HAVING clause can not refer to the alias or variable name, if you need to implement the above effect, you need to add the count (*) of this expression in the HAVING clause again, correctly written as follows:

SELECT ManagerID, Gender, SUM (sickleavehours) as Sickleavehours, COUNT (*) as EmployeeNumber
From HumanResources.Employee
GROUP by ManagerID, Gender
Having (COUNT (*) > 2)

The results are as follows:

We see that only the number of employees is more than 2 people are selected.

Of course, the most powerful part of having a clause is that it can use an aggregate function as an expression, which is not allowed in the WHERE clause. The following example demonstrates the strength of the HAVING clause:
Or the data from the above example:
I want to get the total sick time of the employees under the different managers, and the employee with the most sick leave hours is twice times more than the minimum sick leave employee:

SELECT ManagerID, SUM (sickleavehours) as Totalsickleavehours, COUNT (*) as EmployeeNumber
From HumanResources.Employee
GROUP by ManagerID
Having (MAX (sickleavehours) > 2 * MIN (sickleavehours))


The results are as follows:


It can be seen here that the HAVING clause implements the powerful functionality that is so simple to implement, and it would be very troublesome to use where. In the above result, the scope of the HAVING statement aggregate function can be illustrated in the following illustration:


It can be seen from the above that the filtered data satisfies the maximum number of hours that the person who has been asked to leave is significantly greater than twice times the minimum number of hours for the worker.


This paper begins with the concept of aggregation function, and describes the concepts and usage of query, grouping, filtering, which are often used in the use of aggregate functions. Using aggregate functions, you can move a lot of the tasks that you put into the application's business layer into a database tutorial. This can be a great help for maintenance and performance improvements.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.