SQL Server aggregate function in the actual work to meet the needs of a wide range of requirements, and for the optimization of the aggregation function has become a focus, a program optimization of the good or bad directly determines the declaration period of the program. The SQL Server aggregate function performs a calculation on a set of values and returns a single value. An aggregate function performs a calculation on a set of values and returns a single value. In addition to COUNT, the aggregate function ignores null values. Aggregate functions are often used with the GROUP by clause of a SELECT statement.
v1. Write in front
If there is a SQL Server aggregate function is not ripe or forgotten can look at my previous blog. Basic SQL Server Tutorial.
All data demos In this article are in the Microsoft Official sample database: Northwind, as for Northwind, you can also download it online. As for the download method MSDN already has the detailed explanation, here is not much to say.
v2.SQL Server Scalar Aggregation
2.1. Concept: An aggregate function (such as MIN (), MAX (), COUNT (), SUM (), or AVG ()) specified in the list of SELECT statements that contain only aggregate functions. When a column list contains only aggregate functions, the result set has only one row for the aggregate value, which is computed by the source row that matches the WHERE clause predicate.
2.2. Explore scalar aggregations:
Let's look at a simple flow aggregate count () with SQL Server's "include Actual execution plan" to see all the rows in the table data.
Then set SHOWPLAN_ALL on (more information about the columns contained in the output can be viewed in the link) to see more information about the execution of the statement and estimate the statement's requirements for resources.
By SET SHOWPLAN_ALL on we look at how count () specifically did those things:
- Index Scan: Scan the number of rows in the current table
- Flow calculation: Count the number of rows
- Compute scalar: Converts the result of the flow calculation into the appropriate type. (because the index scan results based on the size of the data in the table, if there is a lot of data in the table, count is the int type is problematic, so you need to return the default type (the value is usually the default type is big) to the int type. )
- Summary: With SET SHOWPLAN_ALL on we can see what the SQL Server aggregation function does for this effect when it gives us the final effect.
2.3. Scalar aggregation optimization techniques:
Let's take a look at the difference between two simple SQL queries
SELECT COUNT (DISTINCTfrom the OrdersSELECTCOUNT(DISTINCT from Orders
As can be seen in fact, these two queries from the statement said there is no big difference, but why the cost will be different, one is to query the city is a query order number. This is because in fact distinct for OrderID query, there is no meaning, because OrderID is the primary key, there is no duplication. And ShipCity will be duplicated, SQL Server's deduplication mechanism at the time of the heavy, there will be a sort of process. This sort is still more resource-intensive.
For a table with a large amount of data, it is not recommended to sort large tables or to redo a field that has a large number of repetitions of the larger table. So we can optimize the shipcity here. You can create a nonclustered index on shipcity.
As you can see, the query with count (DISTINCT ShipCity) after the index becomes two stream aggregations without sorting, saving overhead.
Summary: For scalar aggregation from the above example you can see that the advantages and disadvantages of scalar aggregation are obvious:
- SQL Server Scalar aggregation benefits: The algorithm is simple and intuitive, suitable for non-repeating value aggregation operations.
- SQL Server scalar aggregation disadvantages: poor performance (requires sorting), and aggregation operations that are not suitable for duplicate values.
Optimization tips:
- Try to avoid sorting to produce
- Lock a group by segment within the index coverage
v3.SQL Server Hash Aggregation
3.1. Concept:
Hash (hash, general translation do "hash", also have direct transliteration to "hash", is the arbitrary length of the input (also known as pre-mapping, pre-image), through the hash algorithm, transformed into a fixed-length output, the output is the hash value. This conversion is a compression mapping, that is, the space of the hash value is usually much smaller than the input space, different inputs may be hashed to the same output, so it is not possible to determine the input value from the hash value. Simply, a function that compresses messages of any length to a message digest of a fixed length. )
The internal implementation method of hash aggregation and the implementation mechanism of hash joins require the internal operation of the hash function to form different hash values and sequentially scan the data to form the aggregated values.
3.2. Background:
In order to solve the problem of stream aggregation, the operation of big data should be done, so the hash aggregation is born.
3.3. Analysis:
Take a look at the two simple queries.
ShipCountry and CustomerID Group queries look similar, but why are the execution plans different? This is because ShipCountry contains a large number of duplicate values, with very few CustomerID duplicates, so the SQL Server system gives ShipCountry the hash aggregation pushed, and CustomerID pushes the stream aggregation. This means that the SQL Server system dynamically chooses the appropriate aggregation method based on the query's situation. So when we do SQL optimization, we can't just optimize it based on SQL statements, we have to combine the environment of the specific data distribution.
v4. Operation Process Monitoring indicator
4.1. Monitoring elements:
- Visualize the elapsed time of the view
- T-SQL statement query time
- Memory consumption
- T-SQL statement query IO
4.2. Visualize the time of operation:
4.3.t-sql Statement Query time:
4.4. Memory consumption:
4.5.t-sql statement Query IO:
SQL Server aggregation function algorithm optimization techniques