Detailed description of GROUPBY and COUNT usage

Source: Internet
Author: User
Before introducing the GROUPBY and HAVING clauses, Aggregate functions must first talk about a special function in SQL: Aggregate functions, such as SUM, COUNT, MAX, and AVG. The fundamental difference between these functions and other functions is that they generally work on multiple records. SELECTSUM (population) FROMbbc SUM serves all returned records

Before introducing the group by and HAVING clauses, Aggregate functions must first talk about a special function in SQL: Aggregate functions, such as SUM, COUNT, MAX, and AVG. The fundamental difference between these functions and other functions is that they generally work on multiple records. Select sum (population) FROM bbc the SUM here applies to all returned records

Aggregate functions

Before introducing the group by and HAVING clauses, we must first talk about a special function in SQL: aggregate function, such as SUM, COUNT, MAX, and AVG. The fundamental difference between these functions and other functions is that they generally work on multiple records.

Select sum (population) FROM bbc

SUM is used in the population field of all returned records. The result is that only one result is returned for the query, that is, the total population of the country.

Group by usage

The Group By statement is interpreted literally as "grouping by certain rules )". It is used to divide a dataset into several small areas through certain rules, and then process data for several small areas.
Note: group by is sorted first and then grouped;
For example, if you want to use group by, you generally use "each field". For example, there is a table like this: the number of people in each department needs to use grouping technology.

Select distinct mentid as 'department name', COUNT (*) as 'Count' from BasicDepartment group by distinct mentid

The group by + field is used for grouping. We can understand that we have grouped the dataset according to the department name ID; then, the statistics of each group are calculated;
To put it simply, the group by field 1 and Field 2... (More than these two fields in the table) indicates that field 1 in the dataset is equal, and field 2 is also equal. Only one piece of data is displayed. Then you can calculate Field 3 (sum, average, etc)

Note:
Select distinct mentid, DepartmentName from BasicDepartment group by distinct mentid

-An error will occur.

The 'destentname' column in the selection list is invalid because it is not included in the aggregate function or group by clause.This is what we need to pay attention to. If these fields in the returned set are included after the Group By statement, they will be used as the basis for grouping, or they will be included in aggregate functions.. Why? According to the preceding description, if only one record is displayed for a group of data equal to the specified mentid, if there are three data items in the dataset.

Inclumentid DepartmentName
Dept001 Technology Department
Dept001 General Department
Dept001 Ministry of Human Resources
So I can only display one record. Which one do I display? No way to judge. Three options are available here:

  1. Add DepartmentName to the condition of the GROUP (group by partition mentid, DepartmentName). Then these three records are three groups.
  2. The DepartmentName field is not displayed.
  3. Aggregate these three records into one count (DepartmentName) record)
WHERE and HAVING

HAVING clause allows us to filter the data of each group after grouping. HAVING clauses can use aggregate Functions
The WHERE clause filters records before aggregation. That is to say, the aggregate function cannot be used in the. WHERE clause before the group by clause and HAVING clause.
Example:
1. shows the total population and total area of each region.

SELECT region, SUM(population), SUM(area)FROM bbcGROUP BY region

First, return records are divided into multiple groups BY region, which is the literal meaning of group. After grouping, Aggregate functions are used to calculate different fields (one or more records) in each group.

2. The total population and total area of each region are displayed. Only those regions with an area exceeding 1000000 square meters are displayed.

SELECT region, SUM(population), SUM(area)FROM bbc8 F4 w2 v( P- fGROUP BY regionHAVING SUM(area)>1000000

Here, we cannot use where to filter more than 1000000 of the regions, because such a record does not exist in the table. On the contrary, the HAVING clause allows us to filter the data of each group after grouping.

Note:: When the where clause, group by clause, having clause, and aggregate function are included at the same time, the execution sequence is as follows:
Execute the where clause to search for qualified data;
Use the group by clause to group data. Run the clustering function on the group formed by the group by clause to calculate the values of each group. Finally, use the having clause to remove groups that do not meet the conditions.
Both the having clause and the where clause can be used to set restrictions so that the query results meet certain conditions.
The having Clause limits groups rather than rows. Clustering functions cannot be used in the where clause, but can be used in the having clause.

Complex group by and COUNT usage

Let's explain it using examples.
Existing table: Residential Area table: DW_DM_RE_RC. Some fields are as follows:

select AREA_ID,AREA_NAME,CITY_ID,CITY_NAME,RC_ID,RC_NAME,RC_TYPE_ID,RC_TYPE_NAME,RC_ADDRESS,FLOOR_CNT,BUILDING_CNT from DW_DM_RE_RC


Data is mainly concentrated in Suqian and Wuxi.
Now you need to group by AREA_ID and CITY_NAME, and display the number of data for the same AREA_ID. (AREA_ID and AREA_NAME are associated. CITY_ID and CITY_NAME are associated)
Step 1:
Sql1:

select COUNT(*) as COUNT,AREA_ID,AREA_NAME,CITY_ID,CITY_NAME from DW_DM_RE_RCgroup by AREA_ID,AREA_NAME,CITY_ID,CITY_NAME


Here, COUNT displays the conditions of AREA_ID and CITY_NAME for grouping,
It indicates AREA_ID = 510, CITY_NAME = 'binhu '(binhu district, Wuxi City) has 131 data records, indicating AREA_ID = 510, CITY_NAME = 'yixing' (Yixing district, Wuxi City) there are 131 pieces of data, indicating AREA_ID = 527, CITY_NAME = 'sihong '(Suqian Sihong district) there are 101 pieces of data, but what I need is AREA_ID = 510 (Wuxi City, regardless of the region. Inspired by this, you can use the query result of sql1 as the result set and perform another query on it.
Sql2:

SELECT AREA_ID,AREA_NAME,SUM(COUNT),CITY_ID,CITY_NAME  FROM (select COUNT(*) as COUNT,AREA_ID,AREA_NAME,CITY_ID,CITY_NAME from DW_DM_RE_RCgroup by AREA_ID,AREA_NAME,CITY_ID,CITY_NAME)TEST GROUP BY AREA_ID,AREA_NAME,CITY_ID,CITY_NAME

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.