Begin
A while ago, there was an SQL query requirement in the project, with two tables of the same structure (Table_left & Table_right), as follows:
Figure 1.
Examine the groups (GROUPID) of the table table_left, and whether there is a set of (GROUPID) data in the table table_right that is exactly the same as its data.
As shown in Figure 1. You can see that there are two sets of data complete equality for table Table_left and Table_right:
Figure 2.
Analysis
From the two tables above, it is possible to know that they are storing a set of data; Then I use the enumeration of mathematical sets and operations to analyze.
The data of each group of two tables is described by the enumeration method of the collection first:
Figure 3.
There are only two cases where equality and inequality are the same. For inequality, it can be subdivided into partially equal, contained, and completely unequal. Using the set description, you can use intersections, subsets, and sets. As shown in Figure 4 below, I enumerate these common situations:
Figure 4.
Realize
In the database, there are a lot of ways to find the groups with the same data in table table_left and table Table_right, and here I list two common methods.
(The following SQL script is based on the data in Figure 4.)
Method 1:
Through the Select ... From ... The order by ... xml for Path (")" Links the data columns of each group (for example, Figure 4) by connecting the columns of the Table_left group #11 to "Data1-data2-data3"), and other groupings (including tables Table_ right) in this way to implement the data column string, and then by comparing the two tables after the string field is equal, if the equality indicates that this compares two sets of data equality, which can determine the table table_left which set of data in the table Table_ Right exists a group that is exactly equal to its data.
For method 1, you need to add a field datapath to the original table to store the results of the data column string, such as:
Copy Code code as follows:
ALTER TABLE table_left add datapath nvarchar (200)
ALTER TABLE table_right add datapath nvarchar (200)
Group the data columns and update to the newly added column datapath, such as:
Copy Code code as follows:
Update a
Set Datapath=b.datapath
From Table_left A
Cross Apply (SELECT '-' +x.data from Table_left x where x.groupid=a.groupid order by X.data for XML Path (') as Dat Apath) b
Update a
Set Datapath=b.datapath
From Table_right A
Cross Apply (SELECT '-' +x.data from Table_right x where x.groupid=a.groupid order by X.data for XML Path (') as Da Tapath) b
The following is the query, such as:
Copy Code code as follows:
SELECT DISTINCT A.groupid
From Table_left A
where exists (select 1 from Table_right x where X.datapath=a.datapath)
Complete code:
Copy Code code as follows:
View Code
Use tempdb
Go
If object_id (' Table_left ') is not null drop table Table_left
If object_id (' Table_right ') is not null drop table Table_right
Go
CREATE table Table_left (groupId nvarchar (5), Data nvarchar (10))
CREATE table Table_right (groupId nvarchar (5), Data nvarchar (10))
Go
ALTER TABLE table_left add datapath nvarchar (200)
ALTER TABLE table_right add datapath nvarchar (200)
Go
Create nonclustered index ix_left on table_left (datapath)
Create nonclustered index ix_right on table_right (datapath)
Go
SET NOCOUNT ON
Go
Insert into Table_right (groupid,data)
Select ' #1 ', ' data1 ' UNION ALL
Select ' #1 ', ' data2 ' UNION ALL
Select ' #1 ', ' data3 ' UNION ALL
Select ' #2 ', ' Data55 ' UNION ALL
Select ' #2 ', ' Data55 ' UNION ALL
Select ' #3 ', ' data91 ' UNION ALL
Select ' #3 ', ' data92 ' UNION ALL
Select ' #4 ', ' data65 ' UNION ALL
Select ' #4 ', ' Data66 ' UNION ALL
Select ' #4 ', ' data67 ' UNION ALL
Select ' #4 ', ' Data68 ' UNION ALL
Select ' #4 ', ' data69 ' UNION ALL
Select ' #5 ', ' data77 ' UNION ALL
Select ' #5 ', ' data79 '
Insert into Table_left (groupid,data)
Select ' #11 ', ' data1 ' UNION ALL
Select ' #11 ', ' data2 ' UNION ALL
Select ' #11 ', ' data3 ' UNION ALL
Select ' #22 ', ' Data55 ' UNION ALL
Select ' #22 ', ' data57 ' UNION ALL
Select ' #33 ', ' data99 ' UNION ALL
Select ' #33 ', ' data99 ' UNION ALL
Select ' #44 ', ' Data66 ' UNION ALL
Select ' #44 ', ' Data68 ' UNION ALL
Select ' #55 ', ' data77 ' UNION ALL
Select ' #55 ', ' data78 ' UNION ALL
Select ' #55 ', ' data79 '
Go
Update a
Set Datapath=b.datapath
From Table_left A
Cross Apply (SELECT '-' +x.data from Table_left x where x.groupid=a.groupid order by X.data for XML Path (') as Dat Apath) b
Update a
Set Datapath=b.datapath
From Table_right A
Cross Apply (SELECT '-' +x.data from Table_right x where x.groupid=a.groupid order by X.data for XML Path (') as Da Tapath) b
--
SELECT DISTINCT A.groupid
From Table_left A
where exists (select 1 from Table_right x where X.datapath=a.datapath)
Method 2:
Two sets of data that are not duplicated are judged by the set operator "Except" provided by SQL Sever. If both groups do not have duplicate data for each other, the two sets of data are exactly equal. For example, group #11 in table Table_left and group # in table Table_right, "Except" Set operations on column data, without either (#11 à #1) for Except set operations, or (#1 à #11) The except set operation returns null results, which shows that the data of the group # and #11 are exactly equal, such as:
Copy Code code as follows:
Select data from Table_left where groupid= ' #11 ' except select data from Table_right where groupid= ' #1 '
Select data from table_right where groupid= ' #1 ' except select data from Table_left where groupid= ' #11 '
In the same way, we put the group #11 in the table Table_left and the group in the table Table_right, to the column data "Except" set operations, such as:
Copy Code code as follows:
Select data from Table_left where groupid= ' #11 ' except select data from Table_right where groupid= ' #2 '
Select data from table_right where groupid= ' #2 ' except select data from Table_left where groupid= ' #11 '
As long as the result of the "Except" set of (#11 à #2) or (#2 à #11) is recorded, the data for the two groups is not equal.
All groups of the two tables are compared, and we need to implement them through the following SQL scripts, such as:
Copy Code code as follows:
SELECT DISTINCT A.groupid
From Table_left A
INNER JOIN Table_right B on B.data=a.data
Where NOT EXISTS (select X.data to Table_left x where x.groupid=a.groupid except select Y.data from Table_right y where y . Groupid=b.groupid)
And NOT EXISTS (select X.data to Table_right x where x.groupid=b.groupid except select Y.data from Table_left y where y.g ROUPID=A.GROUPID)
Complete code:
Copy Code code as follows:
View Code
Use tempdb
Go
If object_id (' Table_left ') is not null drop table Table_left
If object_id (' Table_right ') is not null drop table Table_right
Go
CREATE table Table_left (groupId nvarchar (5), Data nvarchar (10))
CREATE table Table_right (groupId nvarchar (5), Data nvarchar (10))
Go
Create nonclustered index ix_left on table_left (data)
Create nonclustered index ix_right on table_right (data)
Go
SET NOCOUNT ON
Go
Insert into Table_right (groupid,data)
Select ' #1 ', ' data1 ' UNION ALL
Select ' #1 ', ' data2 ' UNION ALL
Select ' #1 ', ' data3 ' UNION ALL
Select ' #2 ', ' Data55 ' UNION ALL
Select ' #2 ', ' Data55 ' UNION ALL
Select ' #3 ', ' data91 ' UNION ALL
Select ' #3 ', ' data92 ' UNION ALL
Select ' #4 ', ' data65 ' UNION ALL
Select ' #4 ', ' Data66 ' UNION ALL
Select ' #4 ', ' data67 ' UNION ALL
Select ' #4 ', ' Data68 ' UNION ALL
Select ' #4 ', ' data69 ' UNION ALL
Select ' #5 ', ' data77 ' UNION ALL
Select ' #5 ', ' data79 '
Insert into Table_left (groupid,data)
Select ' #11 ', ' data1 ' UNION ALL
Select ' #11 ', ' data2 ' UNION ALL
Select ' #11 ', ' data3 ' UNION ALL
Select ' #22 ', ' Data55 ' UNION ALL
Select ' #22 ', ' data57 ' UNION ALL
Select ' #33 ', ' data99 ' UNION ALL
Select ' #33 ', ' data99 ' UNION ALL
Select ' #44 ', ' Data66 ' UNION ALL
Select ' #44 ', ' Data68 ' UNION ALL
Select ' #55 ', ' data77 ' UNION ALL
Select ' #55 ', ' data78 ' UNION ALL
Select ' #55 ', ' data79 '
Go
--select
SELECT DISTINCT A.groupid
From Table_left A
INNER JOIN Table_right B on B.data=a.data
Where NOT EXISTS (select X.data to Table_left x where x.groupid=a.groupid except select Y.data from Table_right y where y . Groupid=b.groupid)
And NOT EXISTS (select X.data to Table_right x where x.groupid=b.groupid except select Y.data from Table_left y where y.g ROUPID=A.GROUPID)
Method 1 Vs. Method 2:
Both Method 1 and Method 2 can be used to find the group #11 in which the table Table_left exists exactly equal in table_right data. However, in terms of performance, Method 2 is more slightly than Method 1 to see the statistical information of their execution process:
Method 1:
Figure 5.
Method 2:
Figure 6.
If the data volume is large, then method 2 has more obvious advantages than Method 1. Because of Method 1, more than two update datapath parts, the amount of data as the increase, where the location of the update consumes a lot of resources; if the datapath column data size exceeds 900 bytes, it causes the index to not be created in datapath, affecting the subsequent select query performance.
Extended
Here's the extension, mainly for approach 2 above. When the data in the column is larger than 900 bytes, or contains multiple data columns to compare, see if there are two groups (groupId) of each corresponding column data one by one equal.
Figure 7.
In this case, you can create a hash index for the field DataSub1 & DataSub2, such as:
Copy Code code as follows:
ALTER TABLE Table_left add Datachecksum as checksum (DATASUB1,DATASUB2)
ALTER TABLE Table_right add Datachecksum as checksum (DATASUB1,DATASUB2)
Go
Create nonclustered index Ix_table_left_cs on table_right (datachecksum)
Create nonclustered index Table_right_cs on table_right (datachecksum)
The following SELECT query statement is slightly altered in the inner Join section, such as:
Copy Code code as follows:
SELECT DISTINCT A.groupid
From Table_left A
INNER JOIN Table_right B on b.datachecksum=a.datachecksum
and B.datasub1=a.datasub1
and B.DATASUB2=A.DATASUB2
Where NOT EXISTS (select X.datasub1,x.datasub2 from Table_left x where x.groupid=a.groupid except select Y.datasub1,y.data Sub2 from Table_right y where y.groupid=b.groupid)
And NOT EXISTS (select X.datasub1,x.datasub2 from Table_right x where x.groupid=b.groupid except select Y.datasub1,y.datas Ub2 from Table_left y where y.groupid=a.groupid)
Complete code:
Copy Code code as follows:
View Code
Use tempdb
Go
If object_id (' Table_left ') is not null drop table Table_left
If object_id (' Table_right ') is not null drop table Table_right
Go
CREATE table Table_left (groupId nvarchar (5), DataSub1 nvarchar (ten), DataSub2 nvarchar (10))
CREATE table Table_right (groupId nvarchar (5), DataSub1 nvarchar (ten), DataSub2 nvarchar (10))
Go
ALTER TABLE Table_left add Datachecksum as checksum (DATASUB1,DATASUB2)
ALTER TABLE Table_right add Datachecksum as checksum (DATASUB1,DATASUB2)
Go
Create nonclustered index Ix_table_left_cs on table_left (datachecksum)
Create nonclustered index Table_right_cs on table_right (datachecksum)
Go
SET NOCOUNT ON
Go
Insert into Table_right (GROUPID,DATASUB1,DATASUB2)
Select ' #1 ', ' data1 ', ' data7 ' UNION ALL
Select ' #1 ', ' data2 ', ' data8 ' UNION ALL
Select ' #1 ', ' data3 ', ' data9 ' UNION ALL
Select ' #2 ', ' data55 ', ' data4 ' UNION ALL
Select ' #2 ', ' data55 ', ' data5 '
Insert into Table_left (GROUPID,DATASUB1,DATASUB2)
Select ' #11 ', ' data1 ', ' data7 ' UNION ALL
Select ' #11 ', ' data2 ', ' data8 ' UNION ALL
Select ' #11 ', ' data3 ', ' data9 ' UNION ALL
Select ' #22 ', ' data55 ', ' data0 ' UNION ALL
Select ' #22 ', ' data57 ', ' data2 ' UNION ALL
Select ' #33 ', ' data99 ', ' data4 ' UNION ALL
Select ' #33 ', ' data99 ', ' data6 '
Go
--select
SELECT DISTINCT A.groupid
From Table_left A
INNER JOIN Table_right B on b.datachecksum=a.datachecksum
and B.datasub1=a.datasub1
and B.DATASUB2=A.DATASUB2
Where NOT EXISTS (select X.datasub1,x.datasub2 from Table_left x where x.groupid=a.groupid except select Y.datasub1,y.data Sub2 from Table_right y where y.groupid=b.groupid)
And NOT EXISTS (select X.datasub1,x.datasub2 from Table_right x where x.groupid=b.groupid except select Y.datasub1,y.datas Ub2 from Table_left y where y.groupid=a.groupid)
Summary
There may be other or better solutions to this problem. And in the actual production environment, may encounter the situation will be different, anyway, need more analysis, more hands-on experiments, find the best solution.