Related reading: SQL Server common operators learning summary tutorial
Technical preparation
Based on SQL Server2008R2, Microsoft uses a simpler case Library (Northwind) for parsing.
I. Union operators
The so-called UNION operator is actually the most widely used two types: union all and UNION.
These two operators are easy to use, the former is to merge the results of the two datasets, the latter is to merge after deduplication operation, if there is a write T-SQL statement code is no stranger.
Let's analyze the display of these two operators in the execution plan. For example:
SELECT FirstName + N' + LastName, City, Country FROM Employees
UNION ALL
SELECT ContactName, City, Country FROM MERS
This is the icon above. This is the icon of the union all operator.
This union operator is a simple operation. After scanning the two data sets, the union operator summarizes the results.
Let's take a look at the UNION operator. The example is as follows:
Select City, Country from Employees
UNION
SELECT City, Country FROM MERS
We can see that the UNION operator performs a Distinct Sort sorting operation after the concatenation operator. After this operation, the duplicate values in the result set are removed.
We have always stressed that sorting large data tables is a resource-consuming action!
So here we have found an optimized option, removing the sorting, or changing the sorting method.
The replacement of the Distinct Sort sorting operation is ha-order aggregation. The memory required by the Distinct Sort sorting operation is proportional to the data volume in the data set before deduplication, while the memory required by hash aggregation is proportional to the result integration after deduplication!
Therefore, if there are many duplicate values in the data row, the memory consumed by hash aggregation will be less.
Let's give an example.
Select ShipCountry from Orders
UNION
SELECT ShipCountry FROM Orders
This example is actually useless. Here we are trying to demonstrate it. Let's take a look at the results.
We know that the ShipCountry in this table has a large number of duplicate values, so it is the best way to use hash matching to perform deduplication.
In fact, there is a more lightweight deduplication connection method than hash matching connections: merge connections
I have analyzed this connection method in the previous article and used it to connect two datasets. It is actually similar here. You must first sort the original result set before the application!
We know that the optimization method can use index creation to increase the sorting speed.
Let's reproduce this deduplication method. We create a new table and then create an index. The code is as follows:
-- Create a table
SELECT EmployeeID, FirstName + N' + LastName AS ContactName, City, Country
INTO NewEmployees
FROM Employees
GO
-- Add an index
Alter table NewEmployees add constraint PK_NewEmployees primary key (EmployeeID)
Create index ContactName ON NewEmployees (ContactName)
Create index ContactName on mers (ContactName)
GO
-- Create a new query. Only the displayed Order by must be added to deduplicate merged connections.
SELECT ContactName FROM NewEmployees
UNION ALL
SELECT ContactName FROM MERS
Order by ContactName
We use index scanning to avoid explicit sorting operations.
We change union all to UNION. This operation will deduplicate the two datasets.
-- Create a new query. Only the displayed Order by must be added to deduplicate merged connections.
SELECT ContactName FROM NewEmployees
UNION
SELECT ContactName FROM MERS
Order by ContactName
Here we know that the UNION operation will de-duplicate the results. The above applies the stream aggregation operation. The stream aggregation operation is generally used in group operations. Of course, we use it to perform group deduplication.
In our actual application environment, the most common method is to merge connections. However, there is a situation that best suits hash connections, that is, a small table and a large table for joint operations, this is especially suitable for large tables with a large number of duplicate values.
Hash algorithms are really good!
References
Logical operators and physical operators in Microsoft books online
Reference books: SQL. Server.2005. technical insider series
Conclusion
This article is a brief introduction to the union operators in the query plan. Next we will analyze the parallel operations in SQL Server, let's take a look at how SQL Server uses parallel operations to maximize the use of existing hardware resources to improve performance. If you are interested, please pay attention in advance. The performance tuning of SQL Server involves a wide range of content, the analysis will be carried out in subsequent articles in sequence.
Once the SQL Server software goes deep, you will find that it is really very deep and can be described in depth. If you want to study performance tuning, let's take a look at this series. Let's study it together!
Many people have misunderstandings about SQL Server, or their opinions need to be corrected, I have met customers before. When I talk about Shenma SQL Server importing more data, it will go down ....
Shenma SQL Server can only be used for applications with small data volumes... Shenma is inferior to Oracle cloud ....!!!
Some children's shoes simply think that SQL Server is a pediatrics, and there is no technical content... very simple ....
I don't want to talk about these ideas. I just want those who really know SQL Server to prove something for SQL Server.