SQL Server aggregate functions in the actual work to deal with a variety of requirements are still very extensive, for the optimization of aggregate functions has become a focus, a program optimization of the good or bad directly determines the program's declaration cycle. The SQL Server aggregate function performs a calculation on a set of values and returns a single value. Aggregate functions perform calculations on a set of values and return a single value. Aggregate functions ignore null values except for COUNT. Aggregate functions are often used with the GROUP by clause of a SELECT statement.
A. Written in the front
If there is a SQL Server aggregate function is not familiar or forgotten to see my previous blog.
All data demos In this article are in the Microsoft Official sample database: Northwind, as for Northwind, you can also download them online.
Two. SQL Server scalar aggregation
2.1. Concept: an aggregate function (such as MIN (), MAX (), COUNT (), SUM (), or AVG ()) that is specified in a SELECT statement column list that contains only aggregate functions. When the column list contains only aggregate functions, the result set has only one row for the aggregate value, which is computed from the source row that matches the WHERE clause predicate.
2.2. Explore Scalar Aggregation:
Let's take a look at a simple stream aggregation count () with SQL Server's "include Actual execution plan" to see all the rows in the table and table data.
You can then look at the details of the statement execution by using SET SHOWPLAN_ALL on (more information about the columns contained in the output is available in the link), and estimate the statement's requirements for the resource.
With SET SHOWPLAN_ALL on we look at count () to do those things specifically:
- Index scans: Scans the number of rows in the current table
- Flow calculation: Count the number of rows
- Compute scalar: Converts the result of a stream to the appropriate type. (Because the result of the index scan is based on the size of the data in the table, if there are many data in the table, count is a problem with the int type, so the default type (the general default type of the numeric value) is converted to the int type when it is finally returned. )
- Summary: On the SET SHOWPLAN_ALL on we can look at what the SQL Server aggregate function does for this effect when it gives us the final effect.
2.3. Scalar Aggregation Optimization Techniques:
Let's look at the difference between two simple SQL queries.
Copy Code code as follows:
SELECT COUNT (DISTINCT ShipCity) from Ordersselect COUNT (DISTINCT OrderID) from Orders
From the above figure can be seen, in fact, these two queries from the statement to say that there is not much difference, but why the cost will be different, one is query the city is a query order number. This is because in fact distinct for OrderID query, is not meaningful, because OrderID is the primary key, there will be no duplication. And ShipCity will be repeated, SQL Server to the heavy mechanism in the heavy, there will be a sort of process. This sort of comparison consumes resources.
For a table with a large amount of data, it is not recommended to sort large tables or to redo a field that has more repetitions than a large table. So we can optimize the shipcity here. You can create a nonclustered index on shipcity.
Copy Code code as follows:
CREATE INDEX index_shipcity on Orders (ShipCity desc) go
As you can see from the illustration above, the query that has been indexed count (DISTINCT ShipCity) becomes two stream aggregations without sorting and saves overhead.
Summary: For scalar aggregation from the above example, you can see that the advantages and disadvantages of scalar aggregation are obvious:
- SQL Server Scalar aggregation benefits: The algorithm is simpler and more intuitive and suitable for aggregation operations with distinct values. SQL Server scalar aggregation disadvantage: poor performance (requiring sorting), not suitable for aggregation operations for duplicate values.
- Optimization tip: Try to avoid sorting generation and lock group by segments within index coverage
Three. SQL Server Hash Aggregation
3.1. Concept:
Hash (hash, the general translation of "hash", there is a direct transliteration to "hash", is the arbitrary length of the input (also called the Pre-image), through the hashing algorithm, transformed into a fixed-length output, the output is a hash value. This conversion is a compression map, in which the space of the hash value is usually much smaller than the input space, and different inputs may be hashed out into the same output, so it is not possible to uniquely determine the input value from the hash value. Simply put, a function that compresses messages of any length into a message digest of a fixed length. )
The internal implementation of hash aggregation and the implementation of the hash join mechanism, the need for the internal operation of the hash function, the formation of different hash values, in turn, parallel scan data to form the aggregate value.
3.2. Background:
In order to solve the problem of flow aggregation, the operation of large data should be done, so hash aggregation is born.
3.3. Analysis:
Take a look at two simple queries.
The ShipCountry and CustomerID grouped queries look similar, but why are the execution plans different? This is because ShipCountry contains a large number of duplicate values, CustomerID Duplicate values are very small, so the SQL Server system pushes the hash to ShipCountry, and CustomerID pushes the stream aggregation. That is, the SQL Server system will dynamically select the appropriate aggregation method based on the query. So we do SQL optimization can not only be based on the SQL statement to optimize, but also the combination of specific data distribution environment.
Four. Operation Process Monitoring Index
4.1. Monitoring elements:
Visual View run time t-SQL statement query time occupy memory T-SQL statement query IO
4.2. Visual view of the running time:
4.3.t-sql Statement Query time:
4.4. Memory consumption:
4.5.t-sql statement Query IO:
There are a lot of monitoring elements, here are a few.
The
SQL Server aggregation function algorithm optimization techniques are introduced almost here, and I hope it helps to optimize the aggregation function algorithm.