Analysis of MySQL GROUP BY: Group Aggregation and Subdivided Aggregation
Original article. For more information, see http://www.cnblogs.com/weix-l/p/75220.8.html
If you find any errors, please point them out in the comments. Thank you!
1. Aggregate Functions
The official MySQL 5.7 documentation lists the aggregate functions as follows (image):
For details, see https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html.
Unless otherwise specified, aggregate functions ignore NULL values.
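The NULL-handling rule can be checked with a small experiment. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table t and its three rows are hypothetical. COUNT(*) counts rows, while COUNT(fee), SUM(fee) and AVG(fee) skip the NULL value.

```python
import sqlite3

# Hypothetical data; SQLite is used here only as a convenient stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (fee INTEGER)")
conn.executemany("INSERT INTO t (fee) VALUES (?)", [(100,), (None,), (300,)])

# COUNT(*) counts every row; the other aggregates ignore the NULL fee.
count_all, count_fee, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(fee), SUM(fee), AVG(fee) FROM t"
).fetchone()
print(count_all, count_fee, total, average)
```

With these three rows, the average is taken over the two non-NULL values rather than over all three rows.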
2. Use of Aggregate Functions
Aggregate functions usually operate on each group produced by a GROUP BY clause. If a query does not use GROUP BY to group the result set, the aggregate function operates on all rows of the result set as a single group. To illustrate the use of aggregate functions, a test table named member was created. Its contents are as follows (obtained with the query SELECT * FROM member):
1) Using an aggregate function directly on the result set
For example, you can use the aggregate function SUM() to calculate the total membership dues (fee) of all members:
SELECT SUM(fee) AS total_fee FROM member # calculate the total membership dues
The query result is:
The SUM function adds up the fee values of all rows. You can compute the average, the maximum, and so on in the same way.
In addition, you can use a WHERE clause to aggregate only the rows that meet a condition. For example, to query the sum of the membership dues of members whose country is China:
SELECT SUM(fee) AS China_total_fee, country FROM member WHERE country = 'China'
The result is as follows:
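The two queries in 1) can be reproduced end to end with the sketch below. It is an illustration under assumptions: Python's sqlite3 module stands in for MySQL, the column layout (id, name, sex, country, fee) is inferred from the queries in this article, and the seven rows are hypothetical rather than the author's data.

```python
import sqlite3

# Hypothetical rows: (id, name, sex, country, fee). Not the article's real data.
MEMBERS = [
    (1, "Zhang", "man", "China", 6000),
    (2, "Li", "woman", "China", 5000),
    (3, "Wang", "man", "China", 3000),
    (4, "Smith", "man", "US", 4000),
    (5, "Jones", "woman", "US", 3500),
    (6, "Brown", "man", "UK", 2000),
    (7, "Taylor", "woman", "UK", 1500),
]

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, "
    "sex TEXT, country TEXT, fee INTEGER)"
)
conn.executemany("INSERT INTO member VALUES (?, ?, ?, ?, ?)", MEMBERS)

# No GROUP BY: the aggregate runs over every row of the result set.
total_fee = conn.execute("SELECT SUM(fee) FROM member").fetchone()[0]

# WHERE filters the rows first, then the aggregate runs over what remains.
china_fee = conn.execute(
    "SELECT SUM(fee) FROM member WHERE country = ?", ("China",)
).fetchone()[0]
print(total_fee, china_fee)
```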
2) Grouping the result set with GROUP BY and then applying an aggregate function (group aggregation)
-- GROUP BY groups records that have the same value in the grouping field. COUNT(*) can be used to count the number of members in each group;
- What does "group aggregation" mean?
-- After grouping, the aggregate function computes its statistic over the members of each group. In other words, an aggregate function is a function of a group's members.
Suppose you want to query the total membership dues of each country from the test table. "Of each country" means that the result set is grouped by the country field: rows with the same country value fall into one group, and the sum is then computed per group. The number of records in the result therefore equals the number of distinct values in the grouping field. The members come from three countries (China, US, UK), so the grouped aggregate query returns 3 records:
SELECT SUM(fee) AS country_group_total_fee, country FROM member GROUP BY country # query the sum of membership dues of each country
The query statement calculates the sum of the membership dues of each country and then displays the query results grouped by each country:
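A minimal sketch of this grouped query, again assuming Python's sqlite3 module as a stand-in for MySQL and a handful of hypothetical member rows, shows one result row per distinct country value:

```python
import sqlite3

# Hypothetical rows: (country, fee). Only the columns this query needs.
ROWS = [
    ("China", 6000), ("China", 5000), ("China", 3000),
    ("US", 4000), ("US", 3500),
    ("UK", 2000), ("UK", 1500),
]

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
conn.execute("CREATE TABLE member (country TEXT, fee INTEGER)")
conn.executemany("INSERT INTO member VALUES (?, ?)", ROWS)

# One output row per distinct country; SUM is computed inside each group.
per_country = conn.execute(
    "SELECT country, SUM(fee) AS country_group_total_fee "
    "FROM member GROUP BY country ORDER BY country"
).fetchall()
for country, total in per_country:
    print(country, total)
```

The explicit ORDER BY is there only to make the output order deterministic in this sketch.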
Both standard SQL and MySQL provide the HAVING clause to filter the results of GROUP BY and produce a new result set. For example, to query the total membership dues of China as in 1), you can use HAVING:
SELECT SUM(fee) AS country_group_total_fee, country FROM member GROUP BY country HAVING country = 'China' # query the total membership dues of China using HAVING
The result is the same as the above:
Compared with the WHERE clause used in 1) above, this approach is somewhat redundant. Why? Because country is the grouping column here, restricting it again with HAVING is pointless: you can simply filter with WHERE and then sum directly. Can a non-aggregate column that is not a grouping column be used as a HAVING condition? It is not only pointless, it is not allowed. Such columns are ordinary fields of the queried table rather than aggregate function results. The aggregated result of a GROUP BY query does not contain them at all, so the field cannot be found when the SQL statement is parsed. For example, to obtain the membership dues of male members in each country, you might try to restrict sex with HAVING in the preceding statement, as shown below:
SELECT SUM(fee) AS country_group_total_fee FROM member m GROUP BY country HAVING m.sex = 'man' # error: HAVING tries to restrict a non-aggregate field
This fails with error 1054:
[Err] 1054 - Unknown column 'm.sex' in 'having clause'
The message reports that column m.sex is unknown, even though a table alias is used here. So how can we query the sum of membership dues of male members in each country? The answer is to filter with WHERE before grouping:
SELECT SUM(fee) AS country_group_total_fee, country FROM member WHERE sex = 'man' GROUP BY country # filter with WHERE before grouping
The following results are generated:
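The filter-then-group order can be sketched as follows. This is an illustration only: Python's sqlite3 module stands in for MySQL and the rows are hypothetical. The failing HAVING m.sex query is deliberately not reproduced here, because SQLite does not reject such queries the way MySQL does.

```python
import sqlite3

# Hypothetical rows: (sex, country, fee).
ROWS = [
    ("man", "China", 6000), ("woman", "China", 5000), ("man", "China", 3000),
    ("man", "US", 4000), ("woman", "US", 3500),
    ("man", "UK", 2000), ("woman", "UK", 1500),
]

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
conn.execute("CREATE TABLE member (sex TEXT, country TEXT, fee INTEGER)")
conn.executemany("INSERT INTO member VALUES (?, ?, ?)", ROWS)

# WHERE removes the non-matching rows first; GROUP BY then groups what is left.
men_per_country = conn.execute(
    "SELECT country, SUM(fee) AS country_group_total_fee "
    "FROM member WHERE sex = 'man' GROUP BY country ORDER BY country"
).fetchall()
print(men_per_country)
```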
Therefore, HAVING is not meant for restricting the grouping itself. It restricts the results of the grouped query, and its condition should be an aggregate column. An aggregate column is a column produced by an aggregate function in the SELECT list; for example, SUM(fee) is an aggregate column. Restricting an aggregate column in HAVING returns only the groups whose aggregate value meets the condition. For example, to compute the sum of membership dues of each country and keep only the countries whose total exceeds 10000, you can use the following SQL statement:
SELECT SUM(fee) AS country_group_total_fee, country FROM member GROUP BY country HAVING SUM(fee) > 10000 # query countries whose total membership dues exceed 10000
Only China appears in the result:
This is the standard SQL syntax. MySQL extends HAVING so that it also accepts the alias of an aggregate column as a condition. The query above, rewritten to use the alias, is:
SELECT SUM(fee) AS country_group_total_fee, country FROM member GROUP BY country HAVING country_group_total_fee > 10000 # use an alias in HAVING
The result is still:
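The effect of HAVING on an aggregate column can be sketched with the standard form of the query. The sketch assumes Python's sqlite3 module as a stand-in for MySQL and hypothetical rows chosen so that only one country exceeds 10000; the alias form is left out because alias support in HAVING is a vendor extension.

```python
import sqlite3

# Hypothetical rows: (country, fee).
ROWS = [
    ("China", 6000), ("China", 5000), ("China", 3000),
    ("US", 4000), ("US", 3500),
    ("UK", 2000), ("UK", 1500),
]

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
conn.execute("CREATE TABLE member (country TEXT, fee INTEGER)")
conn.executemany("INSERT INTO member VALUES (?, ?)", ROWS)

# Groups are formed first; HAVING then discards groups whose sum is too small.
big_countries = conn.execute(
    "SELECT country, SUM(fee) FROM member "
    "GROUP BY country HAVING SUM(fee) > 10000"
).fetchall()
print(big_countries)
```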
3) Grouping by multiple fields with GROUP BY and then applying an aggregate function (subdivided aggregation)
With a single grouping field, the number of aggregated result records equals the number of distinct values of that field. What happens with multiple grouping fields? For example, to extend the preceding query and obtain the total membership dues of men and women in each country, you can use the following statement:
SELECT SUM(fee) AS sex_and_country_group_total_fee, country, sex FROM member GROUP BY country, sex # query the total membership dues of male and female members in each country
The result is as follows:
From the above results, we can see that China's total membership dues are the highest, while UK's are the lowest. There are three countries in total, so grouping by country alone yields only three records. Adding sex as a second grouping field subdivides each country group (that is, each row of the previous result) by gender. Because sex has only two values, each country group splits into two, giving six groups across the three countries (6 = 3 × 2) and therefore six records, as shown above.
To see how many rows each subdivided group contains, append the aggregate function COUNT(*) to the end of the SELECT list:
SELECT SUM(fee) AS sex_and_country_group_total_fee, country, sex, COUNT(*) AS row_num FROM member GROUP BY country, sex # group by both fields and count the rows in each group
The result is as follows:
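The subdivision into 3 × 2 groups can be sketched as below, assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical rows in which every country has members of both sexes:

```python
import sqlite3

# Hypothetical rows: (sex, country, fee).
ROWS = [
    ("man", "China", 6000), ("woman", "China", 5000), ("man", "China", 3000),
    ("man", "US", 4000), ("woman", "US", 3500),
    ("man", "UK", 2000), ("woman", "UK", 1500),
]

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
conn.execute("CREATE TABLE member (sex TEXT, country TEXT, fee INTEGER)")
conn.executemany("INSERT INTO member VALUES (?, ?, ?)", ROWS)

# Each distinct (country, sex) pair forms its own group.
groups = conn.execute(
    "SELECT country, sex, SUM(fee), COUNT(*) AS row_num "
    "FROM member GROUP BY country, sex ORDER BY country, sex"
).fetchall()
for row in groups:
    print(row)
```

In this sample the row_num values add up to the number of rows in the table, since every row belongs to exactly one group.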
By default, the results above are sorted by the GROUP BY columns. To specify a different order, add an ORDER BY clause. For example, to sort the results above by sex:
SELECT SUM(fee) AS sex_and_country_group_total_fee, country, sex, COUNT(*) AS row_num FROM member GROUP BY country, sex ORDER BY sex # sort the grouped results by sex
The result is as follows:
What if further fields are used to subdivide the results? The principle is the same as grouping by the two fields above, except that the finer the grouping, the more rows the result contains. You will also find that every group in the result is distinct. This is characteristic of grouped results: GROUP BY itself aggregates, collapsing each group into a single record that carries the aggregate column values for that group.
So what happens if you group by the table's primary key or by a non-null unique field? In this test table, id is the primary key and name is a non-null field with a unique constraint. The following MySQL statements and results group by id and by name respectively:
SELECT SUM(fee) AS sex_and_country_group_total_fee, id, COUNT(*) AS row_num FROM member GROUP BY id # group by the primary key id
The result is as follows:
SELECT SUM(fee) AS sex_and_country_group_total_fee, name, COUNT(*) AS row_num FROM member GROUP BY name # group by a non-null field with a unique constraint
The result is as follows:
Clearly, the aggregate columns in these two results are the same, and the row count of every group is 1. This shows that grouping by the primary key and grouping by a non-null unique field give the same result: one group per row, so the result is simply every record in the table. This may not be useful in practice, but it helps in understanding how GROUP BY works.
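The one-row-per-group behaviour can be checked with a short sketch, assuming Python's sqlite3 module as a stand-in for MySQL and a hypothetical four-row member table in which id is the primary key and name is NOT NULL UNIQUE:

```python
import sqlite3

# Hypothetical rows: (id, name, fee).
ROWS = [(1, "Zhang", 6000), (2, "Li", 5000), (3, "Smith", 4000), (4, "Brown", 2000)]

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, fee INTEGER)"
)
conn.executemany("INSERT INTO member VALUES (?, ?, ?)", ROWS)

# Every id and every name is unique, so each group holds exactly one row.
by_id = conn.execute(
    "SELECT SUM(fee), COUNT(*) FROM member GROUP BY id ORDER BY id"
).fetchall()
by_name = conn.execute(
    "SELECT SUM(fee), COUNT(*) FROM member GROUP BY name ORDER BY id"
).fetchall()
print(by_id)
print(by_name)
```

Because each group has a single member, SUM(fee) in every group is just that row's own fee.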
3. Summary
1) You can apply an aggregate function to a field directly, or apply it after filtering the rows with a WHERE clause;
2) Aggregate functions are usually used after GROUP BY, to compute a statistic over the members of each group;
3) HAVING is intended for filtering grouped results; for an aggregate query without GROUP BY, filter the rows with WHERE instead;
4) You can use HAVING to restrict the results of a GROUP BY query. The condition is preferably an aggregate function column (the selected one or another aggregate function);
5) You can use WHERE to filter rows before GROUP BY, and use HAVING to restrict the aggregate function columns after grouping;
6) ORDER BY can sort the grouped results by any column, including the grouping columns and the aggregate function columns;
7) When grouping by the primary key or by a non-null unique field, the result contains every record of the table.
4. References
[1]. MySQL official documentation URL: https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html