Five tips to easily collect SQL Server Data

Source: Internet
Author: User

Five tips to easily collect SQL Server Data

Sometimes you want SQL Server to return an aggregation result set, rather than a detailed result set. The GROUPBY clause of SQL Server provides a way to aggregate SQL Server data. The group by clause allows you to group data in one or more columns or even expressions. In this article, I will discuss how to use the group by clause to summarize data.

Use separate column grouping

The group by clause sets grouping conditions to summarize data. In the first example, I GROUP data on a column in the table Sales. SalesOrderDetail. In the database AdventureWork2012. This example and other examples use the database AdventureWorks2012. If you want to use it to run my code, you can click to download it.

The following is the source code of the first example. Use the group by clause on the CarrierTrackingNumber column to group data.

 
 
  1. USE AdventureWorks2012; 
  2.  
  3. GO 
  4.  
  5. SELECT CarrierTrackingNumber 
  6.  
  7. ,SUM(LineTotal) AS SummarizedLineTotal 
  8.  
  9. FROM AdventureWorks2012.Sales.SalesOrderDetail 
  10.  
  11. GROUP BY CarrierTrackingNumber; 

After running this code, I will get 3807 records. Below are the first five values in this huge result set:

 
 
  1. CarrierTrackingNumber LineTotal 
  2.  
  3. -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
  4.  
  5. 6E46-440A-B5 2133.170700 
  6.  
  7. B501-448E-96 4085.409800 
  8.  
  9. 8551-4CDF-A1 72616.524200 
  10.  
  11. B65C-4867-86 714.704300 
  12.  
  13. 99CE-4ADA-B1 16185.429200 

In the above example, I use the group by clause to select which columns will be used as the clustering dventureWorks2012.Sales. salesOrderDetai: Conditions for data in a data table. In this example, I use CarrierTrackingNumber to summarize data. When you group data, only the columns that appear in the group By clause are valid in the selection list. In my example, I used the aggregate function SUM to calculate LineTotal. For convenience, I set the alias SummarizedLineTotal for it.

If I want to obtain a clustering set whose CarrierTrackingNumber meets certain conditions, I can restrict the query in the Where clause, as I do below:

 
 
  1. USE AdventureWorks2012; 
  2.  
  3. GO 
  4.  
  5. SELECT CarrierTrackingNumber 
  6.  
  7. ,SUM(LineTotal) AS SummarizedLineTotal 
  8.  
  9. FROM AdventureWorks2012.Sales.SalesOrderDetail 
  10.  
  11. WHERE CarrierTrackingNumber = '48F2-4141-9A' 
  12.  
  13. GROUP BY CarrierTrackingNumber; 

Here, I added a limit in the where clause based on the original query. I set my query to only return results with CarrierTrackingNumber equal to a specific value. After running this code, I will get the number of rows in the record where CarrierTrackingNumber is 48F2-4141-9A. The filter action of the Where clause takes effect before data is aggregated.

Grouping by multiple columns

Sometimes you may need to use multiple columns for Data grouping. below is the sample code for grouping multiple columns.

 
 
  1. SELECT D.ProductID 
  2.  
  3. , H.OrderDate 
  4.  
  5. , SUM(LineTotal) AS SummarizedLineTotal 
  6.  
  7. FROM AdventureWorks2012.Sales.SalesOrderDetail D 
  8.  
  9. JOIN AdventureWorks2012.Sales.SalesOrderHeader H 
  10.  
  11. ON D.SalesOrderId = H.SalesOrderID 
  12.  
  13. GROUP BY ProductID, OrderDate; 

The query returns 26878 rows of data. This is part of the result returned by the preceding query:

 
 
  1. ProductID OrderDate LineTotal 
  2.  
  3. -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
  4.  
  5. 714 2008-05-21 00:00:00.000 99.980000 
  6.  
  7. 859 2007-11-03 00:00:00.000 48.980000 
  8.  
  9. 923 2007-11-23 00:00:00.000 14.970000 
  10.  
  11. 712 2007-12-22 00:00:00.000 62.930000 
  12.  
  13. 795 2007-10-14 00:00:00.000 2443.350000 
  14.  
  15. 950 2007-07-01 00:00:00.000 2462.304000 
  16.  
  17. 795 2007-11-06 00:00:00.000 2443.350000 
  18.  
  19. 877 2007-11-19 00:00:00.000 15.900000 
  20.  
  21. 713 2007-10-01 00:00:00.000 99.980000 
  22.  
  23. 860 2008-05-31 00:00:00.000 48.980000 
  24.  
  25. 961 2008-05-01 00:00:00.000 36242.120880 

In the preceding example, the ProductID and OrderDate columns are used in the Group by clause. SQL Server returns the LineTotal value based on the uniqueness of the combination of ProductID and OrderDate, and sets its alias SummarizedLineTotal. If you view the program output, you will find that after SQL Server groups the data, the returned results do not have a specific order. If you need to sort the returned results in a certain order, you need to use the order by clause, as shown in the following code.

 
 
  1. SELECT D.ProductID 
  2.  
  3. , H.OrderDate 
  4.  
  5. , SUM(LineTotal) AS SummarizedLineTotal 
  6.  
  7. FROM AdventureWorks2012.Sales.SalesOrderDetail D 
  8.  
  9. JOIN AdventureWorks2012.Sales.SalesOrderHeader H 
  10.  
  11. ON D.SalesOrderId = H.SalesOrderID 
  12.  
  13. GROUP BY ProductID, OrderDate 
  14.  
  15. ORDER BY SummarizedLineTotal DESC; 

In the above code, I sorted the result set in descending order of SummorizedLineTotal. The values of this column are grouped BY the Group BY clause and obtained BY using the aggregate function Sum for LineTotal. I sorted the results in descending order according to the values of SummorizedLineTotal. If you run this program, you can obtain the highest number of LineTotal ProductID and OrderDate.

Group data without any value

Sometimes you need to group data that contains null values in some records. When you perform this operation on SQL Server, it automatically assumes that all NULL values are equal. Let me look at the example program in detail.

 
 
  1. CREATE TABLE NullGroupBy (OrderDate date, Amount Int); 
  2.  
  3. INSERT INTO NullGroupBy values (NULL,100), 
  4.  
  5. ('10-30-2014',100), 
  6.  
  7. ('10-31-2014',100), 
  8.  
  9. (NULL,100); 
  10.  
  11. SELECT OrderDate, SUM(Amount) as TotalAmount 
  12.  
  13. FROM NullGroupBy 
  14.  
  15. GROUP BY OrderDate; 
  16.  
  17. DROP TABLE NullGroupBy; 
  18.  
  19. When I run this code I get the following output: 
  20.  
  21. OrderDate TotalAmount 
  22.  
  23. -- -- -- -- -- -- -- -- -- -- - 
  24.  
  25. NULL 200 
  26.  
  27. 2014-10-30 100 
  28.  
  29. 2014-10-31 100 

In the above program, I first created and filled a NullGroupBy table. in this table, I placed four different rows. The value of orderDate in the first and last rows is NULL, and the value of orderDate in the other two columns is different. As shown in the preceding output, SQL Server aggregates the two rows whose OrderDate is NULL into one row during grouping.

Use expression in Group BY clause

Sometimes you need to use an expression in the Group by clause, instead of a specific column. SQL Server allows you to specify an expression in the Group By clause, as shown in the following code:

 
 
  1. SELECT CONVERT(CHAR(7),H.OrderDate,120) AS [YYYY-MM] 
  2.  
  3. , SUM(LineTotal) AS SummarizedLineTotal 
  4.  
  5. FROM AdventureWorks2012.Sales.SalesOrderDetail D 
  6.  
  7. JOIN AdventureWorks2012.Sales.SalesOrderHeader H 
  8.  
  9. ON D.SalesOrderId = H.SalesOrderID 
  10.  
  11. GROUP BY CONVERT(CHAR(7),H.OrderDate,120) 
  12.  
  13. ORDER BY SummarizedLineTotal DESC; 

The above Code uses the year and month data in the OrderDate column for grouping, and uses the expression CONVERT (CHAR (7), H. orderDate, 120), I told the SQL Server to intercept the first seven characters in the standard Date Format of OrderDate ODBC, that is, the OrderDate yyyy-mm section. Based on this expression, I can find the total SummarizeLineTotal value for a specific year, use the expression in the Group By clause, and sort the LineTotal value, I can find out which year or month the SummarizeLineTotal is the largest or the smallest.

Use HAVING clause to filter data

Having is another important clause that can be used with the Group BY clause. Using Having clause, you can filter out data rows that do not conform to the expressions of Having clause, when I use the where clause on it, the filtering behavior will be generated before clustering. The Having clause allows you to filter aggregate rows based on certain criteria. For more information about the Having clause, see the following code:

 
 
  1. SELECT D.ProductID 
  2.  
  3. , H.OrderDate 
  4.  
  5. , SUM(LineTotal) AS SummarizedLineTotal 
  6.  
  7. FROM AdventureWorks2012.Sales.SalesOrderDetail D 
  8.  
  9. JOIN AdventureWorks2012.Sales.SalesOrderHeader H 
  10.  
  11. ON D.SalesOrderId = H.SalesOrderID 
  12.  
  13. GROUP BY ProductID, OrderDate 
  14.  
  15. HAVING SUM(LineTotal) > 200000 
  16.  
  17. ORDER BY SummarizedLineTotal DESC; 

In the preceding code, the Having clause has the following constraints: SUM (LineTotal)> 200000. This having clause ensures that the aggregate value of LineTotal in the final result is SummarizedLineTotal) greater than 200000. by using the having clause, My query returns only one row of data, and its SummarizedLineTotal is greater than 200000. the having clause allows SQL Server to return only data rows whose aggregate results meet the Having clause restrictions.

Summary

Many applications require that data be summarized before being displayed. The group by clause is a mechanism provided by SQL Server. The group by clause allows you to use the HAVING clause to filter the summarized data. We hope that you can achieve the goal more easily next time you need to summarize some data.

Related Article

E-Commerce Solutions

Leverage the same tools powering the Alibaba Ecosystem

Learn more >

Apsara Conference 2019

The Rise of Data Intelligence, September 25th - 27th, Hangzhou, China

Learn more >

Alibaba Cloud Free Trial

Learn and experience the power of Alibaba Cloud with a free trial worth $300-1200 USD

Learn more >

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.