SQL Server detailed explanation and usage of GROUP by

Source: Internet
Author: User

1. Introduction to the Group by statement:

The GROUP BY statement is understood in the literal sense of English as "grouping (group) according to certain rules". Its function is to divide a data set into several small areas by certain rules and then data processing for several small regions.

P.s. It's really a good name for the power of the Group by literally being directly understood is very well understood. Well, in the future in the name of the link must add strength:). The topic is far away.

2. Use of Group by:

The understanding of the group by statement is given above. Based on this understanding and online Help for SQL Server 2000, the following is a list of typical uses of the group by statement.

2.1 Group by [Expressions]:

This is probably the most common use of the group BY statement, group by + [Group Field] (can have multiple). After performing this operation, the dataset divides a dataset into different groups based on the values of the grouped fields. For example, there is the following data set, where the fruit name (Fruitname) and the producing country (Productplace) are the Federated primary key:

Fruitname

Productplace

Price

Apple

China

$1.1

Apple

Japan

$2.1

Apple

USA

$2.5

Orange

China

$0.8

Banana

China

$3.1

Peach

USA

$3.0

If we want to know how many kinds of fruit each country has, then we can do it by using the following SQL statement:

SelectCount (*) As fruit species, productplaceas producing country
Fromt_test_fruitinfo
Groupbyproductplace

This SQL statement uses the group by + grouping field, then this SQL statement can be interpreted as "I group datasets according to the country of production (Productplace), then count the respective records according to each group." "Well understood, right. It is noteworthy here that there are two return fields in the result set, one is Productplace (the producer country) and the other is the fruit type. If our fruit species here is not count (*), but is similar to the following wording:

Selectfruitname,productplacefromt_test_fruitinfogroupbyproductplace

Then SQL will report a similar error when executing this statement:

Select the column ' T_test_fruitinfo in the list. Fruitname ' is not valid because the column is not contained in an aggregate function or GROUPBY clause.

This is one of the things we need to be aware of if, in the return set field, these fields are either to be included behind the group by statement, or to be included in the aggregate function. We can think of the group by operation as one of the following, first the system obtains a result set based on the SELECT statement, such as a detailed table of the first fruit, the country of production, and the unit price. The records with the same grouping fields are then merged into one record, based on the grouping fields. At this time, the remainder of the field that does not exist in the group by statement is likely to have multiple values, but there is only one record for a grouping, and a data grid cannot fit into multiple values, so there is a need to convert these multivalued columns into single values through some processing. Then put it in the corresponding data grid, then the completion of this step is the aggregation function. This is why these functions are called aggregate functions (aggregate functions).

2.2 Group by all [expressions]:

Group BY all + Group field, this and the previously mentioned group by [Expressions] form a keyword all. This keyword is only visible if the where statement is used, and if the Where condition filters out some groups. In the online Help for SQL Server 2000, this is described for group by all:

If you use the ALL keyword, the query results will include all groups produced by the GROUPBY clause, even if some groups do not have rows that match the search criteria. Without the all keyword, a SELECT statement that contains a GROUPBY clause will not display a group that does not have a qualifying row.

One of the words "if you use the ALL keyword, the query results will contain all the groups produced by the GROUP BY clause ... Without the all keyword, the row groups that do not meet the criteria are not displayed. "It sounds familiar, right, as if it's a bit like a left join and a join. This is actually an analogy to the left join to understand. Or based on a data set such as the following:

Fruitname

Productplace

Price

Apple

China

$1.1

Apple

Japan

$2.1

Apple

USA

$2.5

Orange

China

$0.8

Banana

China

$3.1

Peach

USA

$3.0

First, we do not use the GROUP BY statement with the ALL keyword:

SelectCount (*) As fruit species, productplaceas producing country
Fromt_test_fruitinfo
WHERE (productplace<> ' Japan ')
Groupbyproductplace

In the final result, Japan will not appear in the grouping result because Japan does not conform to the where statement.

Now we add the ALL keyword:

SelectCount (*) As fruit species, productplaceas producing country
Fromt_test_fruitinfo
WHERE (productplace<> ' Japan ')
Groupbyallproductplace

After re-running, we can see the group of Japan, but the corresponding "fruit species" does not really count, the aggregate function will be based on the return value of the type with the default value of 0 or null instead of the return value of the aggregate function.

2.3 GROUP by [Expressions] with CUBE | ROLLUP:

The first thing to note is that the GROUP by all statement cannot be used with the cube and ROLLUP keywords.

First say the cube keyword, and the following is a description of the SQL Server 2000 online Help:

Specifies that the result set contains not only normal rows provided by GroupBy, but also summary rows. Returns the GroupBy totals row for each possible group and subgroup combination within the result set. GroupBy summary rows appear as NULL in the results, but can be used to represent all values. Use the grouping function to determine whether null values in the result set are groupby.

The number of total rows in the result set depends on the number of columns contained within the GROUPBY clause. Each operand (column) in the GROUPBY clause is bound under a grouping of NULL, and the grouping applies to all other operands (columns). Because cube returns each possible group and subgroup combination, the number of rows is the same regardless of the order in which the grouped columns are specified.

Our usual GROUP by statements are grouped according to all the fields followed, and if the cube keyword is added, the system will be grouped based on all the fields and grouped by all possible combinations of all those grouped fields. Since the example above is too simple, this is the case, and now our dataset will be in a different scenario, with a table containing basic information about the person: the department number (C_emplinfo_deptid) of the employee, the employee's gender (c_emplinfo_sex), the employee's name (c_ Emplinfo_name) and so on. So now that I want to know the number of genders in each department, we can get the following statement:

Selectc_emplinfo_deptid,c_emplinfo_sex,count (*) asc_emplinfo_totalstaffnum
Fromt_personnel_emplinfo
Groupbyc_emplinfo_deptid,c_emplinfo_sex

But if I want to know now:

1. How many people are in all sectors (this is the equivalent of not grouping, as there are no restrictions on the department and Gender of the staff, but this is indeed a combination of grouping conditions);

2. The number of persons per gender (which is in fact only grouped by gender (c_emplinfo_sex));

3. How many people are in each department (this is only grouped by department (C_emplinfo_deptid)); Then we can use the rollup statement.

Selectc_emplinfo_deptid,c_emplinfo_sex,count (*) asc_emplinfo_totalstaffnum
Fromt_personnel_emplinfo
Groupbyc_emplinfo_deptid,c_emplinfo_sexwithcube

So here you can see a lot more rows in the result set, and one or more fields or even all of the fields in the result set are null, so take a closer look and you'll see that these are actually all the statistics that I've enumerated above. Friends who have used SQL Server 2005 or RDLC must be impressed with the subtotal and grouping functions of matrices, is it possible to get an answer through this. I think the calculations for grouping and subtotals in RDLC are implemented by the Group by cube and the Rollup keyword. (Personal opinion, not confirmed)

Cube keyword also has a very similar sibling rollup, also we start from this English, roll up is the meaning of "curl up", if the combination of cube is absolutely free, then the combination of rollup need a little restraint. Let's take a look at the definition of the ROLLUP keyword in the online SQL Server 2000:

Specifies that the result set contains not only normal rows provided by GroupBy, but also summary rows. Summary groups from the lowest level within the group to the highest level, in hierarchical order. The hierarchical structure of a group depends on the order in which the grouped columns are specified. Changing the order of grouped columns affects the number of rows generated within the result set.

So what's the order? Yes, the order of the fields after group by, the level of the Group field near group by is high, and then it is decremented in turn. such as: Group by Column1, Column2, Column3. Then the order of grouping levels from high to low is: Column1 > Column2 > Column3. In our previous example, we just replaced the cube keyword with the rollup keyword, as in the SQL statement:

Selectc_emplinfo_deptid,c_emplinfo_sex,count (*) asc_emplinfo_totalstaffnum
Fromt_personnel_emplinfo
Groupbyc_emplinfo_deptid,c_emplinfo_sexwithrollup

The number of data rows returned is reduced by a lot compared to cube. :), take a closer look, in addition to the normal group by statement, the data contains:

1. Number of departmental staff; (roll up once, this time first remove the group limit of the employee's gender)

2. The number of employees in all departments, (up and down, in turn, this time removed the group restrictions of the employee's department).

In real-world applications, some statistical functions of the report are helpful.

Here's one more question to add, if we use the rollup or cube keyword, then there will be a small number of rows where the fields that are excluded from the grouping factor will be set to NULL, and then there is a case, such as a nullable row in the list that is grouped by. Then NULL will also be represented as a grouping, so here we can not just pass NULL to determine whether it is a subtotal record. The following example shows what is being said here. Or the fruit example we mentioned earlier, we now add a "discount column" (Discount) to each product to display a discount for the corresponding item, which is nullable, which means that no corresponding discount information can be represented by NULL. The data set looks like this:

Fruitname

Productplace

Price

Discount

Apple

China

$1.1

0.8

Apple

Japan

$2.1

0.9

Apple

USA

$2.5

1.0

Orange

China

$0.8

Null

Banana

China

$3.1

Null

Peach

USA

$3.0

Null

Now we want to count "how many kinds of goods the various discounts correspond to and the total number of items." ", then we can do this with the following SQL statement:

SelectCount (*) Asproductcount,discount
Fromt_test_fruitinfo
Groupbydiscountwithrollup

Well, run it, and you'll find that the data is all right, as shown in the data set above, and the results are as follows:

ProductCount

Discount

3

Null

1

0.8

1

0.9

1

1.0

6

Null

Well, the number of discount items comes out, but there is a problem with the "No Discount item" and "Subtotal", because there are two records with discount null. Which one is it? By analyzing the data we know that the first piece of data (3, Null) should correspond to the number of items that are not discounted, and (6,null) should correspond to the quantity of all goods. It is necessary to judge the two different meanings of NULL to introduce an aggregate function grouping. Now let's modify the statement by using the grouping function in the return value to add a column return value, the SQL statement is as follows:

SelectCount (*) asproductcount,discount,grouping (Discount) ASExpr1
Fromt_test_fruitinfo
Groupbydiscountwithrollup

This time, let's look at the results of the run:

productcount

P align= "Center" >discount

expr1

3

td>

null

0

1

0.8

0

1

0.9

0

1

1.0

0

6

null

1

For a record of subtotals based on the fields contained in the specified field grouping, this is marked as 1, and we can use this tag value to count the subtotal records from the rows that were generated as a result of the rollup or cube keyword. Grouping (column_name) can take a parameter, Grouping will determine whether the corresponding field value NULL is a special null value produced by rollup or cube. If so, set the value to 1 in the new column produced by the grouping aggregate function. Note that grouping only checks for column_name for NULL to determine whether the value is set to 1, rather than whether this column is automatically added by the rollup or cube keyword.

2.2Group by and having, where, the order by statements are executed in sequence:

Finally, a description of the group by, have, Where, order by several statements of execution order. An SQL statement tends to produce multiple temporary views, so the order in which these keywords are executed is important because you have to know whether the keyword is to manipulate the field before the view is formed or to manipulate the resulting temporary view, which is especially important in a view that uses aliases. The keywords listed above are executed in the following order: Where, Group by, have, order by. First, where to delete records in the most original record that do not meet the criteria (so you should filter out the non-qualifying records as much as possible in the where statement, which reduces the number of groupings), and then group the filtered views by the grouping criteria specified after the GROUP by keyword. The system then filters out records that do not meet the criteria after grouping views according to the filter criteria specified after the HAVING keyword, and then sorts the view by the order BY statement so that the final result is generated. In these four keywords, the column name of the final view can be used only in the order by statement, such as:

Selectfruitname,productplace,price,idaside,discount
Fromt_test_fruitinfo
WHERE (productplace=n ' China ')
Orderbyide

You can use the IDE only in an order BY statement, and you cannot use the IDE if you need to reference column names in other conditional statements.

SQL Server detailed explanation and usage of GROUP by

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.