How to use four statements to improve SQL Server scalability

Source: Internet
Author: User
Tags server memory

This article explains how to use left join, cross join, and IDENTITY value retrieval to Improve the Performance of SQL Server-based applications or improve their scalability.

What you will encounter: SQL queries in applications cannot respond in the way you want. It either does not return data or takes a surprising amount of time. If it reduces the speed of enterprise applications, users must wait for a long time. Users want their applications to respond quickly and their reports can return analysis data instantly.

To solve these problems, it is important to find the root cause of the problem. So where should we start? The root cause is generally the database design and access to its query. I will describe four technologies that can be used to improve the performance or scalability of SQL Server-based applications. I will carefully describe the use of LEFT JOIN and CROSS JOIN and the retrieval of IDENTITY values. Remember, there is no magic solution at all. Adjusting your database and its query takes time and analysis, and requires a lot of tests. These technologies have proven to be effective, but some of them may be more suitable for your applications than others.

Return IDENTITY from INSERT

I decided to start with a lot of problems: how to retrieve the IDENTITY value after executing SQL INSERT. Generally, the question is not how to write a query value, but where and when to search. In SQL Server, the following statement can be used to retrieve the IDENTITY value created by the latest SQL statement running on the active database connection: SELECT @ IDENTITY.

This SQL statement is not complex, but it should be remembered that if the latest SQL statement is not INSERT, or you have run this SQL statement for other connections other than insert SQL, the expected value is not obtained. You must run the following code to retrieve the IDENTITY that follows the insert SQL statement and is located on the same connection, as shown below:


Insert into Products (ProductName) VALUES ('chalk ')
SELECT @ IDENTITY

Running These queries for the Northwind database on a connection will return the IDENTITY value of a new product named Chalk. Therefore, you can run the following statement in the ADOVisual Basic application:


Set oRs = oCn. Execute ("set nocount on; insert into Products _
(ProductName) VALUES ('chalk '); SELECT @ IDENTITY ")
LProductID = oRs (0)

This code tells SQL Server not to return the row count of the query, then execute the INSERT statement, and return the IDENTITY value created for the new row. The set nocount on statement indicates that the returned record SET contains one row and one column, which contains the new IDENTITY value. If this statement is not available, an empty record set is first returned (because the INSERT statement does not return any data), then the second record set is returned, and the second record set contains the IDENTITY value. This may be confusing, especially because you never expected INSERT to return a record set. This happens because SQL Server sees this row count (that is, one row is affected) and interprets it as a record set. Therefore, the real data is pushed back to the second record set. Of course, you can use the NextRecordset method in ADO to obtain the second record set. However, if you can always return the record set first and only return the record set, it is more convenient and efficient.

This method is valid, but additional code needs to be added to the SQL statement. Another way to get the same result is to use the set nocount on statement before INSERT, and place the SELECT @ IDENTITY statement in the for insert trigger in the table, as shown in the following code snippet. In this way, any INSERT statement entering the table will automatically return the IDENTITY value.


Create trigger trProducts_Insert ON Products FOR INSERT
SELECT @ IDENTITY
GO

The trigger is started only when INSERT is performed on the Products table. Therefore, it always returns an IDENTITY after successful INSERT. With this technology, you can always retrieve the IDENTITY value in the application in the same way.

  Embedded view and temporary table

In some cases, you need to connect data to other data that can only be collected BY executing group by and then Executing standard query. For example, to query the information about the latest five orders, you must first know which orders are used. This can be searched using the SQL query that returns the order ID. This data is stored in a temporary table (this is a common technology) and then joined with the Products table to return the number of Products sold in these orders:


Create table # Temp1 (OrderID int not null ,_
OrderDate datetime not null)
Insert into # Temp1 (OrderID, OrderDate)
Select top 5 o. OrderID, o. OrderDate
FROM Orders o order by o. OrderDate DESC
SELECT p. ProductName, SUM (od. Quantity) AS ProductQuantity
FROM # Temp1 t
Inner join [Order Details] od ON t. OrderID = od. OrderID
Inner join Products p ON od. ProductID = p. ProductID
Group by p. ProductName
Order by p. ProductName
Drop table # Temp1

These SQL statements create a temporary table, insert data into the table, join other data with the table, and then remove the temporary table. This results in a large number of I/O operations on this query. Therefore, you can rewrite the query and replace the temporary table with an embedded view. The nested view is just a query that can be joined to the FROM clause. Therefore, you do not need to spend a lot of I/O and disk access on temporary tables in tempdb, but you can use the embedded view to get the same result:


SELECT p. ProductName,
SUM (od. Quantity) AS ProductQuantity
FROM (
Select top 5 o. OrderID, o. OrderDate
FROM Orders o
Order by o. OrderDate DESC
) T
Inner join [Order Details] od ON t. OrderID = od. OrderID
Inner join Products p ON od. ProductID = p. ProductID
GROUP
P. ProductName
ORDER
P. ProductName

This query is not only more efficient than the previous query, but also shorter in length. Temporary tables consume a large amount of resources. If you only need to connect data to other queries, you can try using the embedded view to save resources.

  Avoid left join and NULL

Of course, you often need to execute left join and use NULL values. However, they do not apply to all situations. Changing the construction method of SQL queries may result in shortening a report that takes a few minutes to only a few seconds. Sometimes, you must adjust the data format in the query to adapt to the display method required by the application. Although the TABLE data type reduces resource usage, many areas can be optimized in the query. A valuable common function of SQL is LEFT JOIN. It can be used to retrieve all rows in the first table, all matched rows in the second table, and all rows in the second table that do not match the first table. For example, if you want to return each customer and their orders, use left join to display customers with and without orders.

This tool may be excessively used. Left join consumes a lot of resources because it contains data that matches NULL (nonexistent) data. In some cases, this is inevitable, but the cost may be very high. Left join consumes more resources than inner join, so if you can rewrite the query so that the query does not use any left join, you will get a considerable return.

One technique that accelerates query using left join involves creating a TABLE data type and inserting all rows in the first TABLE (the TABLE on the LEFT of left join, then, use the value in the second TABLE to update the TABLE data type. This technology is a two-step process, but it can save a lot of time compared with the standard left join. A good rule is to try different technologies and record the time required for each technology until you get the best query for the execution performance of your application.

When testing the query speed, it is necessary to run the query multiple times and then take an average value. Because queries (or stored procedures) may be stored in the SQL Server memory process cache, the first attempt may take a little longer, and all subsequent attempts will take a short time. In addition, you may be running other queries for the same table when running your query. When other queries lock and unlock these tables, your queries may have to wait in queue. For example, if someone is updating the data in this table when you perform a query, your query may take longer to execute during update submission.

The simplest way to avoid speed reduction when left join is to design databases around them as much as possible. For example, assume that a product may have a category or no category. If the Products table stores the category ID instead of the category of a specific product, you can store the NULL value in the field. Then you must execute left join to obtain all products and their categories. You can create a Category with a value of "No Category" to specify that the foreign key relationship does not allow NULL values. By performing the preceding operations, you can use inner join to retrieve all products and their categories. Although this seems to be a work und with excess data, it may be a very valuable technology because it can eliminate left join statements that consume more resources in SQL batch processing statements. Using this concept in databases can save a lot of processing time. Remember, even a few seconds is important for your users, because when many users are accessing the same online database application, the seconds actually mean a lot.

  Flexible Use of cartesian products

I will give a very detailed introduction to this technique and advocate the use of cartesian products in some cases. For some reason, cross join has been condemned and developers are often warned not to use them at all. In many cases, they consume too many resources and thus cannot be used efficiently. However, like any tool in SQL, they will be very valuable if used correctly. For example, if you want to run a query that returns monthly data (even if a customer has no orders for a specific month), you can easily use the Cartesian product.

Although this does not seem magical, consider if you perform standard inner join from the customer to the order (these orders are grouped by month and sales are subtotal, only the month in which the customer has an order is obtained. Therefore, you will not receive a value of 0 for the month when the customer has not subscribed to any product. If you want to draw a chart for each customer to show monthly and monthly sales, you may want this chart to include months with monthly sales of 0 to visually identify these months. If you use the SQL statement in Figure 2 (last page), the data skips the month with a sales volume of $0, because in the order table, zero sales does not contain any rows (assuming you only store the events that occur ).

Although the Code in Figure 3 (the last page) is long, it can achieve the goal of getting all sales data (or even months without sales. First, it extracts the list of all last year's months and puts them into the first TABLE data type TABLE (@ tblMonths. Next, this code retrieves the names of all customer companies that have sales within this period of time, and then puts them into another TABLE data type TABLE (@ tblCus-tomers. These two tables store all the basic data necessary to create a result set, except for the actual sales volume. The first table lists all months (12 rows), and the second table lists all customers with sales during this period (81 for me ). Not every customer has purchased the product every month in the past 12 months. Therefore, executing inner join or left join will not return each customer every month. These operations only return the customer and month of the purchased product.

The Cartesian product returns all customers of all months. Descartes multiply the first table and the second table to generate a row set, which contains the result of multiplying the number of rows in the first table and the number of rows in the second table. Therefore, the Cartesian Product Returns 972 rows to the Table @ tblFinal. The final step is to update the @ tblFinal table and select the final row set using the total monthly sales of each customer within this date range.

Cross join can be used with caution if there are many resources occupied by the Cartesian product without the real Cartesian product. For example, if cross join is performed on products and categories, and most rows are filtered using the WHERE clause, DISTINCT, or group by clause, the same result is obtained using inner join, and the efficiency is much higher. If you need to return data for all possibilities (for example, when you want to fill a chart with the monthly sales date), the Cartesian product may be very helpful. However, you should not use them for other purposes, because inner join is much more efficient in most scenarios.

 Zero missing

Here we will introduce other common technologies that can help improve SQL query efficiency. Assume that you group all sales people by region and subtotal their sales, but you only want those databases to be marked as active sales people. You can group sales people by region and use the HAVING clause to remove those who are not in the active status. You can also perform this operation in the WHERE clause. Executing this operation in the WHERE clause reduces the number of rows to be grouped, which is more efficient than executing this operation in the HAVING clause. Row-Based Filtering in the HAVING clause forces queries to group data that will be removed in the WHERE clause.

Another way to improve efficiency is to use the DISTINCT keyword to search for separate reports of data rows instead of using the group by clause. In this case, SQL statements using the DISTINCT keyword are more efficient. Use group by when you need to calculate Aggregate functions (SUM, COUNT, MAX, etc. In addition, if your query always returns a unique row, do not use the DISTINCT keyword. In this case, the DISTINCT keyword only increases the system overhead.

As you can see, there are a lot of technologies that can be used to optimize queries and implement specific business rules. The trick is to try and compare their performance. The most important thing is testing, testing, and re-testing.


Figure 2 Returning All MERs and Their Sales
Set nocount on
DECLARE @ dtStartDate DATETIME,
@ DtEndDate DATETIME,
@ DtDate DATETIME
SET @ dtEndDate = '2014/1/123'
SET @ dtEndDate = DATEADD (DD,-1,
CAST (MONTH (@ dtEndDate) + 1)
As varchar (2) + '/01/' +
CAST (YEAR (@ dtEndDate) as varchar (4) +'
23:59:59 'as datetime ))
SET @ dtStartDate = DATEADD (MM,-1*12, @ dtEndDate)
Select cast (YEAR (o. OrderDate) as varchar (4) + '-' +
CASE
When month (o. OrderDate) <10
THEN '0' + CAST (MONTH (o. OrderDate) as varchar (2 ))
Else cast (MONTH (o. OrderDate) as varchar (2 ))
End as sMonth,
C. CustomerID,
C. CompanyName,
C. ContactName,
SUM (od. Quantity * od. UnitPrice) AS mSales
FROM MERs c
Inner join Orders o ON
C. CustomerID = o. CustomerID
Inner join [Order Details]
Od ON o. OrderID = od. OrderID
WHERE o. OrderDate
@ DtStartDate AND @ dtEndDate
GROUP
CAST (YEAR (o. OrderDate) as varchar (4) + '-' +
CASE
When month (o. OrderDate) <10
THEN '0' + CAST (MONTH (o. OrderDate)

As varchar (2 ))
Else cast (MONTH (o. OrderDate)
As varchar (2 ))
END,
C. CustomerID,
C. CompanyName,
C. ContactName
ORDER
C. CompanyName,
SMonth
Figure 3 Cartesian Product at Work
DECLARE @ tblMonths
TABLE (sMonth VARCHAR (7 ))
DECLARE @ tblCustomers
TABLE (CustomerID CHAR (10 ),
CompanyName VARCHAR (50 ),
ContactName VARCHAR (50 ))
DECLARE @ tblFinal
TABLE (sMonth VARCHAR (7 ),
CustomerID CHAR (10 ),
CompanyName VARCHAR (50 ),
ContactName VARCHAR (50 ),
MSales MONEY)
DECLARE @ dtStartDate DATETIME,
@ DtEndDate DATETIME,
@ DtDate DATETIME,
@ I INTEGER
SET @ dtEndDate = '2014/1/123'
SET @ dtEndDate = DATEADD (DD,-1,
CAST (MONTH (@ dtEndDate) + 1)
VARCHAR (2) + '/01 /'
+ CAST (YEAR (@ dtEndDate) as varchar (4) +'
23:59:59 'as datetime ))
SET @ dtStartDate
= DATEADD (MM,-1*12, @ dtEndDate)
-Get all months into the first table
SET @ I = 0
WHILE (@ I <12)
BEGIN
SET @ dtDate = DATEADD (mm,-1 * @ I, @ dtEndDate)
Insert into @ tblMonths
Select cast (YEAR (@ dtDate) as varchar (4) + '-' +
CASE
When month (@ dtDate) <10
THEN '0' + CAST (MONTH (@ dtDate) as varchar (2 ))
Else cast (MONTH (@ dtDate) as varchar (2 ))
End as sMonth
SET @ I = @ I + 1
END

-Get all clients who had sales
During that period into the "y" table
Insert into @ tblCustomers
SELECT DISTINCT
C. CustomerID,
C. CompanyName,
C. ContactName
FROM MERs c
Inner join Orders o ON
C. CustomerID = o. CustomerID
WHERE o. OrderDate
@ DtStartDate AND @ dtEndDate
Insert into @ tblFinal
SELECT m. sMonth,
C. CustomerID,
C. CompanyName,
C. ContactName,
0
FROM @ tblMonths m cross join @ tblCustomers c
UPDATE @ tblFinal SET
MSales = mydata. mSales
FROM @ tblFinal f INNER JOIN
(
SELECT c. CustomerID,
CAST (YEAR (o. OrderDate)
As varchar (4) + '-' +
Case when month (o. OrderDate) <10
THEN '0' + CAST (MONTH (o. OrderDate)
As varchar (2 ))
Else cast (MONTH (o. OrderDate)
As varchar (2 ))
End as sMonth,
SUM (od. Quantity * od. UnitPrice)
AS mSales
FROM MERs c
Inner join Orders o ON
C. CustomerID = o. CustomerID
Inner join [Order Details] od ON
O. OrderID = od. OrderID
WHERE o. OrderDate
BETWEEN @ dtStartDate AND @ dtEndDate
GROUP
C. CustomerID,
CAST (YEAR (o. OrderDate)
As varchar (4) + '-' +
Case when month (o. OrderDate) <10
THEN '0' + CAST (MONTH (o. OrderDate) as varchar (2 ))
Else cast (MONTH (o. OrderDate) as varchar (2) END
) Mydata on f. CustomerID
= Mydata. CustomerID AND f. sMonth =
Mydata. sMonth
SELECT f. sMonth,
F. CustomerID,
F. CompanyName,
F. ContactName,
F. mSales
FROM @ tblFinal f
ORDER
F. CompanyName,
F. sMonth

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.