An in-depth look at SQL Server aggregate function optimization techniques
SQL Server aggregate functions are widely used in day-to-day work to meet all kinds of requirements, so optimizing them is naturally a key concern: how well an aggregate query is optimized directly affects how long the statement takes to run. An aggregate function performs a calculation on a group of values and returns a single value. Except for COUNT(*), all aggregate functions ignore NULL values. Aggregate functions are often used with the GROUP BY clause of SELECT statements.
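The NULL-handling rule is easy to check with a throwaway table variable. This is a minimal sketch; the variable `@t` and its column `v` are illustrative names, not part of any sample database.

```sql
-- Hypothetical three-row sample: one of the values is NULL.
DECLARE @t TABLE (v int);
INSERT INTO @t (v) VALUES (1), (NULL), (3);

SELECT COUNT(*) AS AllRows,      -- counts every row, NULLs included
       COUNT(v) AS NonNullRows,  -- counts only rows where v is not NULL
       SUM(v)   AS Total,        -- the NULL is skipped
       AVG(v)   AS Average       -- averaged over the non-NULL rows only
FROM @t;
```

COUNT(*) reports 3 rows while COUNT(v) reports 2, and AVG(v) divides by 2 rather than 3.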
I. Preface
If you are not familiar with SQL Server aggregate functions, or have forgotten them, you can read my previous blog post.
All of the demos in this article use the official Microsoft sample database, Northwind. You can also download the sample database from the Internet.
II. SQL Server scalar aggregation
2.1. Concept: A scalar aggregate is an aggregate function (such as MIN(), MAX(), COUNT(), SUM(), or AVG()) specified in a SELECT statement whose column list contains only aggregate functions. When the column list contains only aggregate functions, the result set has exactly one row, which gives the aggregate value. The value is calculated from the source rows that match the WHERE clause predicate.
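As a minimal sketch of the definition, the query below has a column list made up only of aggregate functions, so it returns a single row. It assumes the Northwind Orders table with its Freight and ShipCountry columns; the filter value and the column aliases are illustrative.

```sql
-- Scalar aggregation: only aggregates in the column list, no GROUP BY,
-- so the result set is exactly one row.
SELECT COUNT(*)     AS OrderCount,
       MIN(Freight) AS MinFreight,
       MAX(Freight) AS MaxFreight,
       AVG(Freight) AS AvgFreight
FROM Orders
WHERE ShipCountry = 'Germany';  -- only rows matching the predicate are aggregated
```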
2.2. Exploration of scalar aggregation:
We first use SQL Server's "Include Actual Execution Plan" option to look at a simple stream aggregation: a COUNT(*) over all the rows in the table.
Then we use SET SHOWPLAN_ALL ON (the SET SHOWPLAN_ALL documentation describes the columns in its output) to view the details of how the statement would be executed and its estimated resource requirements.
With SET SHOWPLAN_ALL ON, we can see exactly what COUNT(*) does:
- Index Scan: reads the rows of the table through an index.
- Stream Aggregate: counts the rows.
- Compute Scalar: converts the result of the Stream Aggregate to the appropriate type. (The Stream Aggregate counts in bigint, because a table can hold more rows than an int can represent. COUNT() returns int, so the bigint result is converted to int before it is returned.)
- Summary: SET SHOWPLAN_ALL ON shows each step SQL Server takes to produce the aggregate function's final result.
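The inspection described above can be reproduced roughly as follows. This is a sketch that assumes the Northwind Orders table and a client such as SSMS or sqlcmd that understands the GO batch separator. While SHOWPLAN_ALL is on, statements are not executed; SQL Server returns the estimated plan rows instead.

```sql
-- SET SHOWPLAN_ALL must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_ALL ON;
GO

-- Not executed: returns one row per plan operator (StmtText, PhysicalOp,
-- EstimateRows, TotalSubtreeCost, and so on).
SELECT COUNT(*) FROM Orders;
GO

SET SHOWPLAN_ALL OFF;
GO
```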
2.3. Scalar aggregation optimization techniques:
Let's look at the difference between two simple SQL queries.
The code is as follows:
SELECT COUNT(DISTINCT ShipCity) FROM Orders;
SELECT COUNT(DISTINCT OrderID) FROM Orders;
The two statements look almost the same, but their cost is different. One counts cities and the other counts order numbers. DISTINCT is redundant for OrderID, because OrderID is the primary key and cannot contain duplicates, so no deduplication work is needed. ShipCity does contain duplicates, and SQL Server's deduplication here involves a sort, which consumes resources.
For tables with a large data volume, sorting the table or deduplicating columns with many duplicate values is not recommended. Here we can optimize the ShipCity query by creating a nonclustered index on ShipCity.
The code is as follows:
CREATE INDEX Index_ShipCity ON Orders (ShipCity DESC);
GO
After the index is added, the plan for the COUNT(DISTINCT ShipCity) query contains two Stream Aggregate operators and no Sort, which saves the sorting cost.
Conclusion: the examples above show the advantages and disadvantages of scalar aggregation (performed here by stream aggregation):
- Advantages of SQL Server scalar aggregation: the algorithm is simple and intuitive, and it suits aggregation over values without duplicates.
- Disadvantages of SQL Server scalar aggregation: performance is poor when a sort is required, so it does not suit aggregation over values with many duplicates.
- Optimization tips: avoid sorting as much as possible, and keep the GROUP BY columns covered by an index.
III. SQL Server hash aggregation
3.1. Concept:
A hash function takes an input of any length (also called the pre-image) and converts it into an output of fixed length. The output is the hash value. This conversion is a compressing mapping: the space of hash values is usually much smaller than the input space, and different inputs may hash to the same output, so the input cannot be uniquely determined from the hash value. Simply put, it is a function that compresses a message of any length into a fixed-length message digest.
The internal implementation of hash aggregation uses the same mechanism as hash joins. The hash function maps the grouping values to hash buckets, and the aggregate value for each group is built up as the data is scanned.
3.2. Background:
Hash aggregation exists to address the limitations of stream aggregation, which needs sorted input, and to handle operations on large data sets.
3.3. Analysis:
Let's take a look at two simple queries.
The grouping queries on ShipCountry and CustomerID look similar, so why are their execution plans different? ShipCountry contains a large number of duplicate values, while CustomerID has very few. SQL Server therefore chooses hash aggregation for ShipCountry and stream aggregation for CustomerID. In other words, SQL Server dynamically selects an appropriate aggregation method based on the query and the data. So when optimizing, we cannot look at the SQL statement alone; we also have to consider the actual data distribution.
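The two queries themselves are not shown above. A plausible reconstruction, assuming the Northwind Orders table and a COUNT(*) per group (the aggregate and the column aliases are assumptions), is sketched below. The last two statements use the documented HASH GROUP and ORDER GROUP query hints to force each strategy, so the plans and costs can be compared side by side. In practice the optimizer's choice also depends on the available indexes and statistics, not only on how many duplicates a column holds.

```sql
-- Assumed shape of the two grouping queries being compared.
SELECT ShipCountry, COUNT(*) AS OrderCount
FROM Orders
GROUP BY ShipCountry;

SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID;

-- Force hash aggregation for the grouping.
SELECT ShipCountry, COUNT(*) AS OrderCount
FROM Orders
GROUP BY ShipCountry
OPTION (HASH GROUP);

-- Force stream (ordered) aggregation; a Sort appears if no suitable index exists.
SELECT ShipCountry, COUNT(*) AS OrderCount
FROM Orders
GROUP BY ShipCountry
OPTION (ORDER GROUP);
```

Hints like these are best kept for experiments. In production queries it is usually safer to let the optimizer choose.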
IV. Runtime monitoring metrics
4.1. Monitoring elements:
- Viewing the running time visually
- T-SQL statement query time
- Memory usage
- T-SQL statement query IO
4.2. Viewing the running time visually:
4.3. T-SQL statement query time:
4.4. Memory usage:
4.5. T-SQL statement query IO:
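One common way to see the time a statement takes is SET STATISTICS TIME. This is a minimal sketch assuming the Northwind Orders table; the timings are reported as messages, not as a result set.

```sql
SET STATISTICS TIME ON;

-- Parse/compile time and execution time (CPU and elapsed, in milliseconds)
-- are reported in the Messages output for this statement.
SELECT COUNT(DISTINCT ShipCity) FROM Orders;

SET STATISTICS TIME OFF;
```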
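The text does not say which tool was used to inspect memory. One option, offered here as an assumption, is the sys.dm_exec_query_memory_grants dynamic management view. It lists the queries that currently hold or are waiting for a memory grant, which matters for the Sort and Hash Match operators discussed earlier. Querying it requires the VIEW SERVER STATE permission.

```sql
-- Run from a second session while the query of interest is executing.
SELECT session_id,
       requested_memory_kb,  -- memory the query asked for
       granted_memory_kb,    -- memory actually granted
       used_memory_kb,       -- memory in use right now
       max_used_memory_kb    -- peak memory used so far
FROM sys.dm_exec_query_memory_grants;
```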
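Query IO can be observed with SET STATISTICS IO. This is a minimal sketch assuming the Northwind Orders table; for each table touched, the Messages output reports the scan count, logical reads, physical reads, and read-ahead reads.

```sql
SET STATISTICS IO ON;

-- Compare the logical reads of this statement before and after
-- an index on ShipCity is created.
SELECT COUNT(DISTINCT ShipCity) FROM Orders;

SET STATISTICS IO OFF;
```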
There are many other monitoring elements; these are just a few examples.
That concludes these SQL Server aggregate function optimization tips. I hope they help you optimize your own aggregate queries.