Usage of the groupby statement in mysql

Source: Internet
Author: User
This article describes in detail the usage of the groupby statement in mysql. For more information, see explain.

This article describes in detail the usage of the group by statement in mysql. For more information, see explain.

1. Introduction to Group By statements:
The Group By statement is interpreted literally as "grouping by certain rules )". It is used to divide a dataset into several small areas through certain rules, and then process data for several small areas.
P.S. Here I really realized the power of a good name. It is quite understandable that Group By can be understood literally. Well, you must work hard in the naming process later :). The topic is far away.
2. Use of Group:
The above section provides an understanding of the Group By statement. Based on this understanding and the help of SQL Server 2000, The following describes the typical use of Group By statements in sequence.
2.1 Group By [Expressions]:
This is probably the most common use of Group By statements. Group By + [Group fields] (multiple fields may exist ). After performing this operation, the dataset is divided into different groups based on the value of the grouping field. For example, there is a dataset with the fruit name (FruitName) and ProductPlace as the joint primary key:
FruitNameProductPlacePrice
AppleChina $1.1
AppleJapan $2.1
AppleUSA $2.5
OrangeChina $0.8
BananaChina $3.1
PeachUSA $3.0

If we want to know how many fruits each country has, we can use the following SQL statement:
Select count (*) AS fruit type, ProductPlace AS country of origin
FROM T_TEST_FRUITINFO
Group by ProductPlace
This SQL statement uses the Group By + grouping field method, so this SQL statement can be interpreted as "I Group data sets By ProductPlace, then, the number of records of each group is counted." That's easy to understand, right. It is worth noting that there are two return fields in the result set: ProductPlace and fruit type. If we do not use Count (*) as the fruit type here, we can write it like the following:
SELECT FruitName, ProductPlace FROM T_TEST_FRUITINFO group by ProductPlace
When executing this statement, SQL reports the following similar errors:
The 'T _ test_fruitinfo.fruitname' column in the selection list is invalid because it is not included in the aggregate function or group by clause.
This is what we need to pay attention to. If these fields in the returned set are included after the Group By statement, they are used as the basis for grouping; or it will be included in the aggregate function. We can think of the Group By operation as a process as follows. First, the system obtains a result set based on the SELECT statement, such as a detailed table of the first fruit, country of production, and unit price. Then, records with the same group fields are merged into one record based on the group field. At this time, the remaining fields that do not exist behind the Group By statement as the basis for grouping may have multiple values, but currently, only one record exists in a Group, a Data lattice cannot be placed into multiple values. Therefore, we need to convert these multi-value columns into single values through some processing, and then put them in the corresponding data lattice, the aggregate function is used to complete this step. That's why these functions are called aggregate functions.
2.2 Group By All [expressions]:
The Group By All + Group field, which has an additional keyword "ALL" in the form of Group By [Expressions] mentioned above. This keyword is effective only when the where statement is used and the where condition filters out some groups. In the SQL Server 2000 online help, Group By All is described as follows:
If the ALL keyword is used, the query results include ALL groups generated BY the group by clause, even if some groups do not have rows that match the search criteria. Without the ALL keyword, SELECT statements containing the group by clause do not display groups without matching rows.
There is a saying: "If the ALL keyword is used, the query result will contain ALL the groups produced By the Group By clause... If the keyword "ALL" is not displayed, the row group that does not meet the condition is not displayed ." This sentence sounds familiar, RIGHT, and left join and right join. In fact, this is an analogy of left join. It is based on the following Dataset:
FruitNameProductPlacePrice
AppleChina $1.1
AppleJapan $2.1
AppleUSA $2.5
OrangeChina $0.8
BananaChina $3.1
PeachUSA $3.0

First, we do not use the Group By statement with the ALL Keyword:
Select count (*) AS fruit type, ProductPlace AS country of origin
FROM T_TEST_FRUITINFO
WHERE (ProductPlace <> 'Japan ')
Group by ProductPlace
In the final result, because the where an clause does not conform to the where clause, the Japan clause will not appear in the grouping result.
Now we add the ALL Keyword:
Select count (*) AS fruit type, ProductPlace AS country of origin
FROM T_TEST_FRUITINFO
WHERE (ProductPlace <> 'Japan ')
Group by all ProductPlace
After re-running, we can see the group of Japan, but the corresponding "fruit type" does not carry out real statistics, the aggregate function uses the default value 0 or NULL to replace the return value of the aggregate function based on the type of the returned value.
2.3 group by [Expressions] with cube | ROLLUP:
It must be noted that the Group By All statement cannot be used together with the CUBE and ROLLUP keywords.
First, let's talk about the CUBE keyword, which is described in SQL Server 2000 online help:
Specify that the result set contains not only normal rows provided by group by, but also summary rows. Return the group by summary rows of each possible GROUP and sub-GROUP combination in the result set. The group by summary row is displayed as NULL in the result, but can be used to indicate all values. Use the GROUPING function to determine whether the null value in the result set is a group by summary value.
The total number of rows in the result set depends on the number of columns contained in the group by clause. Each operand (column) in the group by clause is bound to NULL in the GROUP, and the GROUP applies to all other operands (columns ). Because CUBE returns a combination of each possible group and sub-group, the number of rows is the same no matter what sequence is used when the group column is specified.
The Group By statement is usually used to Group all the fields following it. If the CUBE keyword is added, the system groups all fields, the grouping conditions formed by all possible combinations of these grouping fields are also calculated by grouping. Because the above example is too simple, it is suitable here. Now our dataset will be used in another scenario. A table contains the basic information of the person: department ID of the employee (C_EMPLINFO_DEPTID), employee gender (C_EMPLINFO_SEX), employee name (C_EMPLINFO_NAME), etc. Now I want to know the number of people in each department of each gender. We can use the following statement:
SELECT C_EMPLINFO_DEPTID, C_EMPLINFO_SEX, COUNT (*) AS C_EMPLINFO_TOTALSTAFFNUM
FROM T_PERSONNEL_EMPLINFO
Group by C_EMPLINFO_DEPTID, C_EMPLINFO_SEX
However, if I want to know:
1. number of employees in all departments (this is equivalent to not grouping, because there is no restriction on the department and gender of employees, but this is indeed a combination of grouping conditions );
2. There are multiple people in each gender (in fact, the group is based only on the Gender (C_EMPLINFO_SEX );
3. How many people are there in each department (Here we only group by department (C_EMPLINFO_DEPTID); then we can use the ROLLUP statement.
SELECT C_EMPLINFO_DEPTID, C_EMPLINFO_SEX, COUNT (*) AS C_EMPLINFO_TOTALSTAFFNUM
FROM T_PERSONNEL_EMPLINFO
Group by C_EMPLINFO_DEPTID, C_EMPLINFO_SEX WITH CUBE
Here you can see that there are many more rows in the result set, and one or more fields in the result set, or even all the fields are NULL, please take a closer look and you will find that these records actually show all the statistics listed above. Those who have used SQL Server 2005 or RDLC must have an impression on the subtotal and grouping functions of the matrix. Can they all get the answer through this. I think the calculation of grouping and subtotal in RDLC is achieved through the CUBE and ROLLUP Keywords of Group. (Personal opinion, unconfirmed)
The CUBE keyword also has a very similar brother ROLLUP. Similarly, starting with this English word, roll up means "roll up". If the combination of cubes is absolutely free, therefore, the combination of ROLLUP requires some constraints. Let's take a look at the definition of the ROLLUP keyword in SQL Server 2000 online:
Specify that the result set contains not only normal rows provided by group by, but also summary rows. In hierarchical order, from the lowest level to the highest level of the group. The hierarchy of a group depends on the sequence used when the group column is specified. Changing the order of grouping columns affects the number of rows generated in the result set.
So what is the order? By the way, the order of the fields after Group By is higher than that of the Group field near Group By, and then decrease sequentially. For example, Group By Column1, Column2, and Column3. The order of grouping levels from high to low is: Column1> Column2> Column3. Let's look at the previous example. In SQL statements, we only replace the CUBE keyword with the ROLLUP keyword, for example:
SELECT C_EMPLINFO_DEPTID, C_EMPLINFO_SEX, COUNT (*) AS C_EMPLINFO_TOTALSTAFFNUM
FROM T_PERSONNEL_EMPLINFO
Group by C_EMPLINFO_DEPTID, C_EMPLINFO_SEX WITH ROLLUP
Compared with CUBE, the number of returned data rows is reduced a lot. :), Take a closer look. In addition to the normal Group By statement, the data also contains:
1. Number of employees in the Department. (this time, the group limit of the employee gender is removed first)
2. The number of employees in all departments. (the number of employees is listed in turn. This time, the group limit of the department where the employees are located is removed ).
In practical applications, some statistical functions of reports are very helpful.
Here is another question that needs to be explained. If we use the ROLLUP or CUBE keyword, some subtotal rows will be generated, fields excluded from grouping factors in these rows will be set to NULL, so there is still a situation, such as the existence of empty rows in the list based on grouping, NULL is also expressed as a group, so here we can not simply judge whether it is a subtotal record by NULL. The following example shows the result. As we mentioned above, we now add a "Discount column" (Discount) next to each product to show the Discount for the corresponding product. This value is blank, that is, NULL can be used to indicate that there is no corresponding discount information. The dataset is as follows:
FruitNameProductPlacePriceDiscount
AppleChina $1.10.8
AppleJapan $2.10.9
AppleUSA $2.51.0
OrangeChina $0.8 NULL
BananaChina $3.1 NULL
PeachUSA $3.0 NULL

Now, we need to count "how many kinds of products should there be for each discount, and the total number of products in total .", We can use the following SQL statement to complete the process:
Select count (*) AS ProductCount, Discount
FROM T_TEST_FRUITINFO
Group by Discount WITH ROLLUP
Okay. Run the command and you will find that the data is normal. According to the preceding dataset, the result is as follows:
ProductCountDiscount
3 NULL
10.8
10.9
11.0
6 NULL

Well, the quantity of products with various discounts has all come out, but there is a problem in the judgment when "no discount products" and "commodity subtotal" are displayed, because there are two records whose Discount is Null. Which one is it? Through the analysis data, we know that the first data (3, Null) should correspond to the number of products without discounts, and (6, Null) should correspond to the number of all goods. To judge the two Null values with different meanings, we need to introduce an aggregate function Grouping. Now let's modify the statement and add a column of return values using the Grouping function in the return value. The SQL statement is as follows:
Select count (*) AS ProductCount, Discount, GROUPING (Discount) AS Expr1
FROM T_TEST_FRUITINFO
Group by Discount WITH ROLLUP
At this time, let's look at the running results:
ProductCountDiscountExpr1
3NULL0
10.80
10.90
11.00
6NULL1

For a record that subtotal is performed based on the fields contained in the specified field Grouping, it is marked as 1, we can use this flag value to identify the rows produced by the ROLLUP or CUBE keyword from the subtotal record. Grouping (column_name) can include a parameter. Grouping will determine whether the NULL value of the corresponding field is a special NULL value generated by ROLLUP or CUBE, if yes, set the value to 1 in the new column generated by the Grouping aggregate function. Note that Grouping only checks the NULL value corresponding to Column_name to determine whether to set the value to 1, rather than whether the column is automatically added by ROLLUP or CUBE keywords.
2.2 execution sequence of Group By, Having, Where, Order by statements:
The execution sequence of the Group By, Having, Where, and Order by statements must be described. An SQL statement usually produces multiple temporary views, so the execution sequence of these keywords is very important, because you must understand whether this keyword is used to operate the fields before the corresponding view is formed or the temporary view is formed. This issue is especially important when the view with an alias is used. The keywords listed above are executed in the following Order: Where, Group By, Having, Order. First, the where clause deletes records that do not meet the conditions in the most original records (so we should try to filter out records that do not meet the conditions in the where statement to reduce the number of groups ), then, the filtered view is grouped By the grouping conditions specified after the Group By keyword, and the system then selects the filtering conditions specified after the Having keyword, filter out records that do not meet the conditions in the group view, and sort the view according to the Order By statement, so that the final result is generated. Among the four keywords, the column name of the final view can be used only in the Order By statement, for example:
SELECT FruitName, ProductPlace, Price, id as ide, Discount
FROM T_TEST_FRUITINFO
WHERE (ProductPlace = n'china ')
ORDER BY IDE
IDE is only available in the order by statement. In other condition statements, if you want to reference a column name, you can only use ID instead of IDE.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.