Cross join usage in T-SQL (semi-translation), t-sqlcross

Source: Internet
Author: User
Tags sql server query

Cross join usage in T-SQL (semi-translation), t-sqlcross

Suddenly I found a very hanging link. Let's take a look at what the database should do. be timid and cautious: DBA work content !!!!


Today I will translate an article on T-SQL, this article for Microsoft certification 70-461: QueryingMicrosoft SQL Server 2012 learning and practice. This article focuses on translation and introduces some thoughts in my work. For details, see the last part.

I will try to take the time to translate other articles in this series (see the source of the original article, this is a series of articles). In addition to reviewing some knowledge, I will focus on summing up my own income.

 

----------------------------------------------------------------------------

Source: http://www.sqlservercentral.com/articles/Stairway+Series/119933/

 

Cross join introduction:

The cross join operator is used to combine the data in a dataset with the data in another dataset. The result of combining two datasets using cross join is also called Cartesian product. The simplest cross join example:

SELECT * FROM A CROSS JOIN B

Note that when cross join is used, you do not need to add a JOIN field to the two tables, that is, the ON clause, as in INNER/outer join.

In addition, cross join can generate very large datasets. For example, if Table A in the preceding example has 10 rows and table B has 3 rows, then there will be 30 rows after the CROSS JOIN operation between the two tables. If table A has 10 million rows and table B has 3 million rows, there will be 30000000000000 rows, that is, 30 trillion rows. This may consume all SQL Server resources to generate result sets. Therefore, you must be careful when using CROSS JOIN to avoid unnecessary data.


Here are some examples to illustrate cross join.

 

Basic example of cross join:

In this example, we associate two simple tables. The following is a script. Make sure that the script runs in TempDB (note: the original text also requires that it is not created in the master database, but based on various considerations, in TempDB, we recommend that you restart SQL Server to restore operations without affecting the operations of other databases ).

USE tempdbgoCREATE TABLE Product(ID int,                      ProductNamevarchar(100),                      Costmoney);CREATE TABLE SalesItem(ID int,                        SalesDate datetime,                        ProductID int,                        Qty int,                        TotalSalesAmt money);INSERT INTO Product    VALUES (1,'Widget',21.99),          (2,'Thingamajig',5.38),          (3,'Watchamacallit',1.96);INSERT INTO SalesItem    VALUES (1,'2014-10-1',1,1,21.99),          (2,'2014-10-2',3,1,1.96),          (3,'2014-10-3',3,10,19.60),          (4,'2014-10-3',1,2,43.98),           (5,'2014-10-3',1,2,43.98);


 

Case 1: simple table CROSSJOIN:

SELECT * FROM Product CROSS JOIN SalesItem;

The result is as follows:

 


Check the above results and we can see that there are 15 rows of different records. Note that each column is different. The first five rows of records contain the first row from the Product table and are associated with the five rows in the SalesItem table. In the same sense, the rows 2 and 3 of the Product table are associated with the five rows of the SalesItem table. The final result is 3*5 = 15 rows.

One of the common examples of using dikaer products is to create test data. For example, if you want to create a Product list with different Product numbers from the Product and SalesItem tables, you can use cross join, as shown in the following code:


SELECT ROW_NUMBER() OVER(ORDER BY ProductName DESC) AS ID,       Product.ProductName       +CAST(SalesItem.ID as varchar(2)) AS ProductName,       (Product.Cost / SalesItem.ID) * 100 AS CostFROM Product CROSS JOIN SalesItem;


 

The result is as follows:

 

From the results, we can see that the ROW_NUMBER function generates a unique ID for each row. At the same time, a new ProductName and Cost columns are generated by combining SalesItem and ID columns.


So far, two tables in the example are CROSSJOIN, and cross join can be used for multi-Table operations, as shown below:


SELECT * FROM sys.tablesCROSS JOIN sys.objectsCROSS JOIN sys.sysusers;


 

In this example, three tables are used for the flute product. Because there are many result set columns, the results are not pasted. You can view the results by yourself.

 

When can cross join be equivalent to inner join?

As mentioned above, the cross join operation will generate a flute product. This is not always the case. When you use the WHERE clause in cross join, SQL Server will not generate a flute product. Instead, a normal JOIN operation is used. As follows:


SELECT * FROM Product P CROSS JOIN SalesItem SWHERE P.ID = S.ProductID; SELECT * FROM Product P INNER JOIN SalesItem SON P.ID = S.ProductID;


In the code, the first two SELECT statements use cross join with the WHERE clause, and the second SELECT statement uses the regular inner join with the ON clause. The SQL Server Query Optimizer analyzes the first statement and overwrites it to the second statement. Because it knows that when a cross join with a WHERE clause appears, it can be rewritten to the second statement, so opening the execution plan of the two can see the same result. However, if the WHERE clause is not used, SQL Server does not know how to associate the two tables of CROSS JOIN, so it can only generate the flute product.

 

Use cross join to find unsold products

The following example is used to help understand cross join operations and demonstrate how to use CROSSJOIN. One common scenario is to use the cross join operation to find "records that do not match the other table ". For example, suppose I want to query the total quantity and sales quantity of each Product in the Product table every day. Because not all products sell at least one product every day, there may be products that are not sold. For unsold products, use 0 as the quantity, and $0 as the sales. In this case, you can use the cross join combination left outer join for identification, as shown below:


SELECT S1.SalesDate, ProductName     , ISNULL(Sum(S2.Qty),0) AS TotalQty         , ISNULL(SUM(S2.TotalSalesAmt),0) AS TotalSalesFROM Product PCROSS JOIN  (SELECT DISTINCT SalesDate FROM SalesItem  ) S1LEFT OUTER JOIN SalesItem S2ON P.ID = S2.ProductIDAND S1.SalesDate = S2.SalesDateGROUP BY S1.SalesDate, P.ProductNameORDER BY S1.SalesDate;


 

Next, let's take a look at this code. First, create a subquery to query all the unique SalesData values. This subquery obtains all sales dates. Then, perform the cross join operation between the subquery and the Product table to generate a flute Product for each SalesDate and Product. This result set returns the final results except the Qty and TotalSalesAmt of each sold product. To obtain these summary values, you need to use left outer join to associate the SalesItem table with the Cartesian product generated by the previous cross join operation. The join conditions for the two result sets are the ProductID and SalesDate columns. By using left outer join, the values of SalesDate, Qty, and TotalSalesAmt that match ProductID and SalesDate are returned for each row in the Cartesian unit. The last step is to use the group by clause to summarize the Qty and TotalSalesAmount values based on SalesDate and ProductName.

Performance Considerations

Cross join operations may cause certain performance risks due to the formation of the Cartesian product. Because the SQL engine needs to match each row of the two tables one by one and generate new data, this result set may be very huge. If a table with 1 million rows joins a table with 0.1 million rows, the result set is 1000000*100000 = 100000000000 (1 trillion !). SQL Server requires a lot of time and resources to create this result set.

However, cross join can be used to analyze the possible results of two result sets, for example, all the sales of all the customers in each month, even if some customers do not have sales records in some months, you can also query the corresponding situation. When using the cross join operation, minimize the source data and result set of the cross join operation as much as possible to make the performance controllable. Assume that a table stores the sales records of the last two months. If I need to generate a report about which customers did not purchase records in the last month, I can use CROSS JOIN. The following is a demonstration:


Create two tables with 1000 customers and store the sales information for two months.


CREATE TABLECust (Id int, CustName varchar(20));CREATE TABLESales (Id int identity                   ,CustID int                                      ,SaleDate date                                      ,SalesAmt money);SET NOCOUNT ON;DECLARE @I int = 0;DECLARE @Date date;WHILE @I < 1000BEGIN          SET @I = @I + 1;        SET @Date = DATEADD(mm, -2, '2014-11-01');        INSERT INTO Cust        VALUES (@I,                'Customer #' + right(cast(@I+100000 as varchar(6)),5));        WHILE @Date < '2014-11-01'        BEGIN               IF @I%7 > 0                       INSERT INTO Sales (CustID, SaleDate, SalesAmt)                       VALUES (@I, @Date, 10.00);               SET @Date = DATEADD(DD, 1, @Date);        ENDEND


 

Add sales records to the seventh customer in the code. To demonstrate the volume problem caused by CROSSJOIN based on the result set, execute the following two codes:


SELECT CONVERT(CHAR(6),S1.SaleDate,112) AS SalesMonth, C.CustName,       ISNULL(SUM(S2.SalesAmt),0) AS TotalSalesFROM Cust CCROSS JOIN (SELECT SaleDate FROM Sales) AS S1LEFT OUTER JOINSales  S2ON C.ID = S2.CustIDAND S1.SaleDate = S2.SaleDateGROUP BY CONVERT(CHAR(6),S1.SaleDate,112),C.CustNameHAVING ISNULL(SUM(S2.SalesAmt),0) = 0ORDER BY CONVERT(CHAR(6),S1.SaleDate,112),C.CustName SELECT CONVERT(CHAR(6),S1.SaleDate,112) AS SalesMonth, C.CustName,       ISNULL(SUM(S2.SalesAmt),0) AS TotalSalesFROM Cust CCROSS JOIN (SELECT DISTINCT SaleDate FROMSales) AS S1LEFT OUTER JOINSales  S2ON C.ID = S2.CustIDAND S1.SaleDate = S2.SaleDateGROUP BY CONVERT(CHAR(6),S1.SaleDate,112),C.CustNameHAVING ISNULL(SUM(S2.SalesAmt),0) = 0ORDER BY CONVERT(CHAR(6),S1.SaleDate,112),C.CustName


 

In the first query, 1000 customers and 52338 sales records generate 52338000 rows of result sets. In the second query, only the data with a unique SalesData value is entered into the Cartesian product. In this case, only 61 rows of SalesData are generated. Therefore, the result set after cross join is only 61000 rows. By controlling the input size of cross join, the second query takes only one second, and the first query takes 19 seconds on the local machine. It can be seen that the size of the input set has a significant impact on performance. Check the execution plans of the two to see a significant difference.

Therefore, when writing cross join, you need to consider the size of the input set to avoid performance problems.

 

Conclusion:

The cross join operation performs the flute product operation in two record sets. This operation is useful when checking that "one table exists but not another table. However, you need to be very careful about the size of the input set used for cross join. Control the input set of cross join to the necessary scale to ensure operational efficiency.



----------------------------------------------------------------------------

By convention, you always have to talk about your own opinions. After all, it is published in original form and cannot be translated simply.

First, the subject of this article is translation, but not full translation. You can view the original text on your own.

Second, I have used cross join for so many years, because I know the problems it brings. For a long time, the only purpose of using this function is to generate a large result set, which is mainly used for testing. For example, when I want to test the writing and indexing performance of 0.1 billion rows of data, I will use a 10000 rows table to JOIN each other, and 0.1 billion rows of data will come out, and then I will do what I want.

Generally speaking, I do not recommend using CROSS JOIN in the formal environment, because this is unreasonable in most cases, but some systems are indeed seen during work, to generate some results (such as reports), you need to associate two tables that are not correlated, so that you can only use cross join. Of course, if possible, try to associate the two tables or switch to some data warehouse technologies.

Finally, we should look at the essence of the phenomenon. The difference between ON and cross join without ON is mentioned in this article. I have seen some people in the Forum say that INNER JOIN is the first CROSS JOIN before filtering, I thought it was impossible at the time, but there was no specific evidence to say anything. Relational databases have been around for nearly 40 years. At present, there is no much difference between RDBMS today. More is the performance of repairing, completing, and improving a little bit. Most of the time, we can't take it for granted. Instead, we should use actual operations to practice our own ideas. For example, in the past, if I used the examples mentioned above, I could tell that person, inner join is not simply a cross join operation before filtering. Otherwise, it is too anti-virus.

Do readers think about why this article exists? In fact, cross join is rarely used and dangerous. It is rarely used in real work. But it happened to have appeared in Microsoft certification (you should not be black first, think about why, cross join does not exist in SQL Server ). As mentioned above, in addition to generating test data, in many cases (this has also occurred during Forum reply), some business needs must indeed be generated using cross join, although most of these situations can be achieved by changing the design, the reality is that few systems have sufficient preliminary design, most of which are post-repair and makeup. Therefore, from this usage, we should think more about how to reduce these situations through design. The design requires knowledge and experience, and the reader (including himself) should also consider it and summarize it, and record it so that it can be used when design is needed one day.

In the past, I would only remember that cross join Associates each row of A/B tables, but I have never thought about the impact of the Order on both sides of CROSS JOIN. The focus is on the display of the result set. The table on the left will appear at the beginning, and each data on the left matches each row on the right, that is to say, it is the same row of N left tables and the separate row of the right table respectively. If the result is the inverse, then the order of the two sides of cross join will be reversed.

Speaking of this, there is nothing to add for the moment, and I will try again later. I try to record my thoughts and gains through each article. If an article does not leave anything for myself, it is meaningless to leave it for others. After all, you cannot predict the number of readers, but you know that there must be one reader-you are yourself. I hope that you will be able to recognize your ideas while learning through the article.


Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.