Summary of SQL Server union operators

Source: Internet
Author: User
Tags create index hash logical operators sql server union

Related reading: SQL Server common operators learning summary tutorial

Technical preparation

Based on SQL Server2008R2, Microsoft uses a simpler case Library (Northwind) for parsing.

I. Union operators

The so-called UNION operator is actually the most widely used two types: union all and UNION.

These two operators are easy to use, the former is to merge the results of the two datasets, the latter is to merge after deduplication operation, if there is a write T-SQL statement code is no stranger.

Let's analyze the display of these two operators in the execution plan. For example:

SELECT FirstName + N' + LastName, City, Country FROM Employees

UNION ALL

SELECT ContactName, City, Country FROM MERS

 

 

This is the icon above. This is the icon of the union all operator.

 

 

This union operator is a simple operation. After scanning the two data sets, the union operator summarizes the results.

Let's take a look at the UNION operator. The example is as follows:

Select City, Country from Employees

UNION

SELECT City, Country FROM MERS

 

 

We can see that the UNION operator performs a Distinct Sort sorting operation after the concatenation operator. After this operation, the duplicate values in the result set are removed.

We have always stressed that sorting large data tables is a resource-consuming action!

So here we have found an optimized option, removing the sorting, or changing the sorting method.

The replacement of the Distinct Sort sorting operation is ha-order aggregation. The memory required by the Distinct Sort sorting operation is proportional to the data volume in the data set before deduplication, while the memory required by hash aggregation is proportional to the result integration after deduplication!

Therefore, if there are many duplicate values in the data row, the memory consumed by hash aggregation will be less.

Let's give an example.

Select ShipCountry from Orders

UNION

SELECT ShipCountry FROM Orders

This example is actually useless. Here we are trying to demonstrate it. Let's take a look at the results.

 

 

We know that the ShipCountry in this table has a large number of duplicate values, so it is the best way to use hash matching to perform deduplication.

In fact, there is a more lightweight deduplication connection method than hash matching connections: merge connections

I have analyzed this connection method in the previous article and used it to connect two datasets. It is actually similar here. You must first sort the original result set before the application!

We know that the optimization method can use index creation to increase the sorting speed.

Let's reproduce this deduplication method. We create a new table and then create an index. The code is as follows:

 

-- Create a table

SELECT EmployeeID, FirstName + N' + LastName AS ContactName, City, Country

INTO NewEmployees

FROM Employees

GO

-- Add an index

Alter table NewEmployees add constraint PK_NewEmployees primary key (EmployeeID)

Create index ContactName ON NewEmployees (ContactName)

Create index ContactName on mers (ContactName)

GO

-- Create a new query. Only the displayed Order by must be added to deduplicate merged connections.

SELECT ContactName FROM NewEmployees

UNION ALL

SELECT ContactName FROM MERS

Order by ContactName

 

 

 

We use index scanning to avoid explicit sorting operations.

We change union all to UNION. This operation will deduplicate the two datasets.

-- Create a new query. Only the displayed Order by must be added to deduplicate merged connections.

SELECT ContactName FROM NewEmployees

UNION

SELECT ContactName FROM MERS

Order by ContactName

 

 

Here we know that the UNION operation will de-duplicate the results. The above applies the stream aggregation operation. The stream aggregation operation is generally used in group operations. Of course, we use it to perform group deduplication.

In our actual application environment, the most common method is to merge connections. However, there is a situation that best suits hash connections, that is, a small table and a large table for joint operations, this is especially suitable for large tables with a large number of duplicate values.

Hash algorithms are really good!

References

Logical operators and physical operators in Microsoft books online

Reference books: SQL. Server.2005. technical insider series

Conclusion

This article is a brief introduction to the union operators in the query plan. Next we will analyze the parallel operations in SQL Server, let's take a look at how SQL Server uses parallel operations to maximize the use of existing hardware resources to improve performance. If you are interested, please pay attention in advance. The performance tuning of SQL Server involves a wide range of content, the analysis will be carried out in subsequent articles in sequence.

Once the SQL Server software goes deep, you will find that it is really very deep and can be described in depth. If you want to study performance tuning, let's take a look at this series. Let's study it together!

Many people have misunderstandings about SQL Server, or their opinions need to be corrected, I have met customers before. When I talk about Shenma SQL Server importing more data, it will go down ....

Shenma SQL Server can only be used for applications with small data volumes... Shenma is inferior to Oracle cloud ....!!!

Some children's shoes simply think that SQL Server is a pediatrics, and there is no technical content... very simple ....

I don't want to talk about these ideas. I just want those who really know SQL Server to prove something for SQL Server.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.