Five Ways to Improve SQL Performance

Source: Internet
Author: User

Released on: 4/1/2004 | updated on: 4/1/2004

Johnny Papa

Data Points Archive

Sometimes, in order to make the application run faster, all the work is to make small adjustments here or there. Ah, but the key is to determine how to adjust it! Sooner or later, you will encounter this situation: the SQL query in the application cannot respond as you want. It either does not return data or takes a surprising amount of time. If it reduces the speed of reporting or your enterprise applications, users will not be satisfied if they have to wait too long. Just as your parents don't want to explain why they came back in the middle of the night, they won't explain why the query took so long. ("Sorry, Mom, I have used too many LEFT JOIN statements .") Users want their applications to respond quickly and their reports can return analysis data instantly. As far as I am concerned, if it takes more than 10 seconds to load a page when surfing the Web (well, five seconds is more practical), I will also be impatient.

To solve these problems, it is important to find the root cause of the problem. So where should we start? The root cause is generally the database design and access to its query. In this month's column, I will talk about four technologies that can be used to improve the performance of SQL Server? Or improve the scalability of the application. I will carefully describe the use of LEFT JOIN and CROSS JOIN and the retrieval of IDENTITY values. Remember, there is no magic solution at all. Adjusting your database and its query takes time and analysis, and requires a lot of tests. These technologies have proven to be effective, but some of them may be more suitable for your applications than others.

Content on this page
Return IDENTITY from INSERT
"Hspace =" 4 "src =" http://www.microsoft.com/library/gallery/templates/MNP2.Common/images/arrow_px_down.gif "width =" 7 "vspace =" 2 "border =" 0 ">
Return IDENTITY from INSERT
Embedded view and temporary table
"Hspace =" 4 "src =" http://www.microsoft.com/library/gallery/templates/MNP2.Common/images/arrow_px_down.gif "width =" 7 "vspace =" 2 "border =" 0 ">
Embedded view and temporary table
Avoid left join and NULL
"Hspace =" 4 "src =" http://www.microsoft.com/library/gallery/templates/MNP2.Common/images/arrow_px_down.gif "width =" 7 "vspace =" 2 "border =" 0 ">
Avoid left join and NULL
Flexible Use of cartesian products
"Hspace =" 4 "src =" http://www.microsoft.com/library/gallery/templates/MNP2.Common/images/arrow_px_down.gif "width =" 7 "vspace =" 2 "border =" 0 ">
Flexible Use of cartesian products
Zero missing
"Hspace =" 4 "src =" http://www.microsoft.com/library/gallery/templates/MNP2.Common/images/arrow_px_down.gif "width =" 7 "vspace =" 2 "border =" 0 ">
Zero missing

Return IDENTITY from INSERT

I decided to start with a lot of problems: how to retrieve the IDENTITY value after executing SQL INSERT. Generally, the question is not how to write a query value, but where and when to search. In SQL Server, the following statement can be used to retrieve the IDENTITY value created by the latest SQL statement running on the active database connection:

SELECT @@IDENTITY

This SQL statement is not complex, but it should be remembered that if the latest SQL statement is not INSERT, or you have run this SQL statement for other connections other than insert SQL, the expected value is not obtained. You must run the following code to retrieve the IDENTITY that follows the insert SQL statement and is located on the same connection, as shown below:

INSERT INTO Products (ProductName) VALUES ('Chalk')SELECT @@IDENTITY

Running These queries for the Northwind database on a connection will return the IDENTITY value of a new product named Chalk. Therefore, when using Visual Basic of ADO? In the application, you can run the following statement:

Set oRs = oCn.Execute("SET NOCOUNT ON;INSERT INTO Products _(ProductName) VALUES ('Chalk');SELECT @@IDENTITY")lProductID = oRs(0)

This code tells SQL Server not to return the row count of the query, then execute the INSERT statement, and return the IDENTITY value created for the new row. The set nocount on statement indicates that the returned record SET contains one row and one column, which contains the new IDENTITY value. If this statement is not available, an empty record set is first returned (because the INSERT statement does not return any data), then the second record set is returned, and the second record set contains the IDENTITY value. This may be confusing, especially because you never expected INSERT to return a record set. This happens because SQL Server sees this row count (that is, one row is affected) and interprets it as a record set. Therefore, the real data is pushed back to the second record set. Of course, you can use the NextRecordset method in ADO to obtain the second record set. However, if you can always return the record set first and only return the record set, it is more convenient and efficient.

This method is valid, but additional code needs to be added to the SQL statement. Another way to get the same result is to use the set nocount on statement before INSERT, and place the SELECT @ IDENTITY statement in the for insert trigger in the table, as shown in the following code snippet. In this way, any INSERT statement entering the table will automatically return the IDENTITY value.

CREATE TRIGGER trProducts_Insert ON Products FOR INSERT AS   SELECT @@IDENTITY GO

The trigger is started only when INSERT is performed on the Products table. Therefore, it always returns an IDENTITY after successful INSERT. With this technology, you can always retrieve the IDENTITY value in the application in the same way.

Back to Top

Embedded view and temporary table

In some cases, you need to connect data to other data that can only be collected BY executing group by and then Executing standard query. For example, to query the information about the latest five orders, you must first know which orders are used. This can be searched using the SQL query that returns the order ID. This data is stored in a temporary table (this is a common technology) and then joined with the Products table to return the number of Products sold in these orders:

CREATE TABLE #Temp1 (OrderID INT NOT NULL, _           OrderDate DATETIME NOT NULL)INSERT INTO #Temp1 (OrderID, OrderDate)SELECT   TOP 5 o.OrderID, o.OrderDateFROM Orders o ORDER BY o.OrderDate DESCSELECT   p.ProductName, SUM(od.Quantity) AS ProductQuantityFROM   #Temp1 t   INNER JOIN [Order Details] od ON t.OrderID = od.OrderID  INNER JOIN Products p ON od.ProductID = p.ProductID GROUP BY p.ProductNameORDER BY p.ProductNameDROP TABLE #Temp1

These SQL statements create a temporary table, insert data into the table, join other data with the table, and then remove the temporary table. This results in a large number of I/O operations on this query. Therefore, you can rewrite the query and replace the temporary table with an embedded view. The nested view is just a query that can be joined to the FROM clause. Therefore, you do not need to spend a lot of I/O and disk access on temporary tables in tempdb, but you can use the embedded view to get the same result:

SELECT p.ProductName,   SUM(od.Quantity) AS ProductQuantityFROM   (  SELECT TOP 5 o.OrderID, o.OrderDate  FROM   Orders o   ORDER BY o.OrderDate DESC  ) t   INNER JOIN [Order Details] od ON t.OrderID = od.OrderID  INNER JOIN Products p ON od.ProductID = p.ProductID GROUP BY  p.ProductNameORDER BY  p.ProductName

This query is not only more efficient than the previous query, but also shorter in length. Temporary tables consume a large amount of resources. If you only need to connect data to other queries, you can try using the embedded view to save resources.

Back to Top

Avoid left join and NULL

Of course, you often need to execute left join and use NULL values. However, they do not apply to all situations. Changing the construction method of SQL queries may result in shortening a report that takes a few minutes to only a few seconds. Sometimes, you must adjust the data format in the query to adapt to the display method required by the application. Although the TABLE data type reduces resource usage, many areas can be optimized in the query. A valuable common function of SQL is LEFT JOIN. It can be used to retrieve all rows in the first table, all matched rows in the second table, and all rows in the second table that do not match the first table. For example, if you want to return each customer and their orders, use left join to display customers with and without orders.

This tool may be excessively used. Left join consumes a lot of resources because it contains data that matches NULL (nonexistent) data. In some cases, this is inevitable, but the cost may be very high. Left join consumes more resources than inner join. Therefore, if you can rewrite the query so that the query does not use any left join, this will produce a considerable return (see the figure in Figure 1 ).

Figure1:Query

One technique that accelerates query using left join involves creating a TABLE data type and inserting all rows in the first TABLE (the TABLE on the LEFT of left join, then, use the value in the second TABLE to update the TABLE data type. This technology is a two-step process, but it can save a lot of time compared with the standard left join. A good rule is to try different technologies and record the time required for each technology until you get the best query for the execution performance of your application.

When testing the query speed, it is necessary to run the query multiple times and then take an average value. Because queries (or stored procedures) may be stored in the SQL Server memory process cache, the first attempt may take a little longer, and all subsequent attempts will take a short time. In addition, you may be running other queries for the same table when running your query. When other queries lock and unlock these tables, your queries may have to wait in queue. For example, if someone is updating the data in this table when you perform a query, your query may take longer to execute during update submission.

The simplest way to avoid speed reduction when left join is to design databases around them as much as possible. For example, assume that a product may have a category or no category. If the Products table stores the category ID instead of the category of a specific product, you can store the NULL value in the field. Then you must execute left join to obtain all products and their categories. You can create a Category with a value of "No Category" to specify that the foreign key relationship does not allow NULL values. By performing the preceding operations, you can use inner join to retrieve all products and their categories. Although this seems to be a work und with excess data, it may be a very valuable technology because it can eliminate left join statements that consume more resources in SQL batch processing statements. Using this concept in databases can save a lot of processing time. Remember, even a few seconds is important for your users, because when many users are accessing the same online database application, the seconds actually mean a lot.

Back to Top

Flexible Use of cartesian products

I will give a very detailed introduction to this technique and advocate the use of cartesian products in some cases. For some reason, cross join has been condemned and developers are often warned not to use them at all. In many cases, they consume too many resources and thus cannot be used efficiently. However, like any tool in SQL, they will be very valuable if used correctly. For example, if you want to run a query that returns monthly data (even if a customer has no orders for a specific month), you can easily use the Cartesian product. The SQL statement in Figure 2 performs the preceding operations.

Although this does not seem magical, consider if you perform standard inner join from the customer to the order (these orders are grouped by month and sales are subtotal, only the month in which the customer has an order is obtained. Therefore, you will not receive a value of 0 for the month when the customer has not subscribed to any product. If you want to draw a chart for each customer to show monthly and monthly sales, you may want this chart to include months with monthly sales of 0 to visually identify these months. If you use the SQL statement in Figure 2, the data skips the month with a sales value of $0, because zero sales in the order table does not contain any rows (assuming you only store the event ).

Although the Code in Figure 3 is long, it can achieve the goal of obtaining all sales data (or even months without sales. First, it extracts the list of all last year's months and puts them into the first TABLE data type TABLE (@ tblMonths. Next, this code retrieves the names of all customer companies that have sales within this period of time, and then puts them into another TABLE data type TABLE (@ tblCus-tomers. These two tables store all the basic data necessary to create a result set, except for the actual sales volume. The first table lists all months (12 rows), and the second table lists all customers with sales during this period (81 for me ). Not every customer has purchased the product every month in the past 12 months. Therefore, executing inner join or left join will not return each customer every month. These operations only return the customer and month of the purchased product.

The Cartesian product returns all customers of all months. Descartes multiply the first table and the second table to generate a row set, which contains the result of multiplying the number of rows in the first table and the number of rows in the second table. Therefore, the Cartesian Product Returns 972 rows to the Table @ tblFinal. The final step is to update the @ tblFinal table and select the final row set using the total monthly sales of each customer within this date range.

Cross join can be used with caution if there are many resources occupied by the Cartesian product without the real Cartesian product. For example, if cross join is performed on products and categories, and most rows are filtered using the WHERE clause, DISTINCT, or group by clause, the same result is obtained using inner join, and the efficiency is much higher. If you need to return data for all possibilities (for example, when you want to fill a chart with the monthly sales date), the Cartesian product may be very helpful. However, you should not use them for other purposes, because inner join is much more efficient in most scenarios.

Back to Top

Zero missing

Here we will introduce other common technologies that can help improve SQL query efficiency. Assume that you group all sales people by region and subtotal their sales, but you only want those databases to be marked as active sales people. You can group sales people by region and use the HAVING clause to remove those who are not in the active status. You can also perform this operation in the WHERE clause. Executing this operation in the WHERE clause reduces the number of rows to be grouped, which is more efficient than executing this operation in the HAVING clause. Row-Based Filtering in the HAVING clause forces queries to group data that will be removed in the WHERE clause.

Another way to improve efficiency is to use the DISTINCT keyword to search for separate reports of data rows instead of using the group by clause. In this case, SQL statements using the DISTINCT keyword are more efficient. Use group by when you need to calculate Aggregate functions (SUM, COUNT, MAX, etc. In addition, if your query always returns a unique row, do not use the DISTINCT keyword. In this case, the DISTINCT keyword only increases the system overhead.

As you can see, there are a lot of technologies that can be used to optimize queries and implement specific business rules. The trick is to try and compare their performance. The most important thing is testing, testing, and re-testing. In the future sections of this column, I will continue to describe the concept of SQL Server in depth, including database design, good indexing practices, and SQL Server Security examples.

If you have questions and suggestions to Johnny, please email to mmdata@microsoft.com

Johnny Papa is vice president of information technology at MJM research in Raleigh, North Carolina? Professional ADO 25 RDS Programming with ASP 30 ?? (Wrox, 2000), and often give speeches in industry meetings. To contact him, please email to datapoints@lancelotweb.com

From the MSDN Magazine July 2002 issue. You can purchase this magazine at your local newsstand.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.