Five Ways to Improve SQL Performance
Release date: 4/1/2004 | Updated: 4/1/2004
Johnny Papa
Data Points Archive
Sometimes all it takes to make an application run faster is a minor adjustment here or there. Ah, but the key is determining what to adjust! Sooner or later, you will face a situation where a SQL query in your application does not respond the way you intended. Either it doesn't return the data you want, or it takes an unreasonably long time. If it slows down a report or your enterprise application, users who have to wait too long will not be happy. Just as your parents didn't want to hear why you came home in the middle of the night, users won't listen to you explain why a query takes so long. ("Sorry, Mom, I used too many LEFT JOINs.") Users want applications to respond quickly and their reports to return analytical data in an instant. For my own part, I get impatient when a page takes more than ten seconds to load on the Web (okay, more like five seconds).
To resolve these problems, it is important to get to the root of the issue. So where do you start? The root cause is usually the database design and the queries that access it. In this month's column, I'll demonstrate four techniques that can be used to improve the performance of your SQL Server-based applications or improve their scalability. I'll examine the use of LEFT JOINs, CROSS JOINs, and the retrieval of an IDENTITY value. Remember, there is no magic solution. Tuning your database and its queries takes time, analysis, and a lot of testing. These techniques have proven effective, but some of them may suit your application better than others.
Returning IDENTITY from INSERT
I decided to start with a question I get asked a lot: how to retrieve the IDENTITY value after performing a SQL INSERT. Often, the problem is not how to write the query that retrieves the value, but where and when to do it. In SQL Server, the following statement retrieves the IDENTITY value created by the most recent SQL statement run on the active database connection:
SELECT @@IDENTITY
This SQL is not complicated, but keep in mind that if the most recent SQL statement was not an INSERT, or if you run this SQL on a different connection from the one that ran the INSERT, you will not get the expected value. You must retrieve the IDENTITY immediately after the INSERT and on the same connection, as follows:
INSERT INTO Products (ProductName) VALUES ('Chalk')
SELECT @@IDENTITY
Running these statements on a single connection against the Northwind database returns the IDENTITY value of the new product named Chalk. So, in a Visual Basic application that uses ADO, you can run the following statement:
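The Visual Basic listing itself is not present in this copy of the article. Going only by the description that follows, the batch passed to the connection would look roughly like the sketch below. The column alias NewProductID is an illustrative addition, and the ADO call that wraps the batch is deliberately left out because it cannot be recovered from the text.

```sql
-- Sketch of the batch the next paragraph describes (Northwind assumed).
-- Suppress the "rows affected" message so that the only result
-- coming back to the client is the SELECT at the end.
SET NOCOUNT ON;

-- Insert the new row; ProductID is the IDENTITY column in Northwind.
INSERT INTO Products (ProductName) VALUES ('Chalk');

-- Return the IDENTITY value generated on this connection.
-- NewProductID is a hypothetical alias added for readability.
SELECT @@IDENTITY AS NewProductID;
```

Sent as a single command text, this batch should yield one result with one row and one column, which the client can read directly.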
This code tells SQL Server not to return a row count for the query, then executes the INSERT statement and returns the IDENTITY value that was just created for the new row. With SET NOCOUNT ON in place, the recordset that comes back has one row and one column containing the new IDENTITY value. Without this statement, an empty recordset is returned first (because the INSERT statement returns no data), followed by a second recordset that contains the IDENTITY value. This can be confusing, especially since you never expected the INSERT to return a recordset at all. It happens because SQL Server sees the row count (that is, one row affected) and interprets it as representing a recordset, so the real data is pushed back to the second recordset. You can of course use the NextRecordset method in ADO to get to this second recordset, but it is more convenient and more efficient if the recordset you want is always the first and only one returned.
Although this method works, it requires extra code in the SQL statement. Another way to get the same result is to use the SET NOCOUNT ON statement before the INSERT and to put the SELECT @@IDENTITY statement in a FOR INSERT trigger on the table, as shown in the following code fragment. That way, any INSERT statement into that table automatically returns the IDENTITY value.
CREATE TRIGGER trProducts_Insert ON Products FOR INSERT AS
    SELECT @@IDENTITY
GO
The trigger fires only when an INSERT occurs on the Products table, so it always returns an IDENTITY after a successful INSERT. With this technique, you can consistently retrieve IDENTITY values in the same way throughout your application.
Inline Views and Temporary Tables
Sometimes a query needs to join data to other data that can only be gathered by performing a GROUP BY and then a standard query. For example, if you want information about the five most recently placed orders, you first need to know which orders those are. This can be retrieved with a SQL query that returns the order IDs. That data could be stored in a temporary table, which is a common technique, and then joined to the Products table to return the quantity of products sold on those orders:
CREATE TABLE #Temp1 (OrderID INT NOT NULL,
                     OrderDate DATETIME NOT NULL)

INSERT INTO #Temp1 (OrderID, OrderDate)
SELECT TOP 5 o.OrderID, o.OrderDate
FROM Orders o
ORDER BY o.OrderDate DESC

SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM #Temp1 t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName

DROP TABLE #Temp1
This batch of SQL creates a temporary table, inserts data into it, joins other data to it, and then drops the temporary table. That is a lot of I/O for one query, which can be rewritten to use an inline view instead of a temporary table. An inline view is simply a query that can be joined to in the FROM clause. So instead of spending I/O and disk access in tempdb on a temporary table, you can get the same result with an inline view:
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM (
        SELECT TOP 5 o.OrderID, o.OrderDate
        FROM Orders o
        ORDER BY o.OrderDate DESC
     ) t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
This query is not only more efficient than the previous query, but also shorter in length. Temporary tables can consume a large amount of resources. If you only need to join data to other queries, you can try using inline views to save resources.
Avoid LEFT JOINs and NULLs
Of course, there are many times when you need to perform a LEFT JOIN and use NULL values. However, they are not a solution for every situation. Changing the way a SQL query is built can make the difference between a report that takes minutes to run and one that takes only seconds. Sometimes you have to reshape the data in a query to fit the way the application needs it to appear. While the TABLE data type reduces resource consumption, there are still plenty of areas in a query that can be optimized. One valuable and commonly used feature of SQL is the LEFT JOIN. It retrieves all rows from the first table together with the matching rows from the second table; where a row in the first table has no match, the columns from the second table come back as NULL. For example, if you want to return every customer and their orders, a LEFT JOIN shows the customers who have orders as well as those who have none.
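As a small illustration of the customer example, the query below lists every customer alongside any orders they have. It assumes the Northwind Customers and Orders tables; the choice of columns is only for demonstration.

```sql
-- Every customer appears at least once. Customers without orders
-- appear with NULL in the OrderID and OrderDate columns.
SELECT c.CustomerID,
       c.CompanyName,
       o.OrderID,
       o.OrderDate
FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
ORDER BY c.CompanyName, o.OrderDate;
```

Adding WHERE o.OrderID IS NULL to this query would narrow the result to customers who have never placed an order.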
This tool can be overused. LEFT JOINs are costly because they involve matching data against NULL (nonexistent) data. In some cases this is unavoidable, but the cost can be high. A LEFT JOIN consumes more resources than an INNER JOIN, so if you can rewrite a query so that it does not use any LEFT JOINs, the payoff can be substantial (see Figure 1).
Figure 1 Query
One technique for speeding up a query that uses a LEFT JOIN involves creating a TABLE data type, inserting all of the rows from the first table (the table on the left side of the LEFT JOIN), and then updating the TABLE data type with the values from the second table. This technique is a two-step process, but it can save a lot of time compared to a standard LEFT JOIN. A good rule is to try several different techniques and record the time each one takes until you find the best-performing query for your application.
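The two-step technique might be sketched as follows. This is not the listing from Figure 1, which is not available here; it assumes the Northwind Customers and Orders tables, and the table variable @tblCustomers and its OrderCount column are hypothetical names chosen for the illustration.

```sql
-- Step 1: load every row from the "left" table into a TABLE variable.
-- OrderCount starts at 0, standing in for the NULL a LEFT JOIN would give.
DECLARE @tblCustomers TABLE (
    CustomerID  NCHAR(5)     NOT NULL PRIMARY KEY,
    CompanyName NVARCHAR(40) NOT NULL,
    OrderCount  INT          NOT NULL DEFAULT 0
);

INSERT INTO @tblCustomers (CustomerID, CompanyName)
SELECT CustomerID, CompanyName
FROM Customers;

-- Step 2: update only the rows that have a match in the second table.
-- This is an INNER JOIN, so unmatched customers keep OrderCount = 0.
UPDATE t
SET OrderCount = o.OrderCount
FROM @tblCustomers t
    INNER JOIN (
        SELECT CustomerID, COUNT(*) AS OrderCount
        FROM Orders
        GROUP BY CustomerID
    ) o ON o.CustomerID = t.CustomerID;

-- Final result: all customers, with or without orders.
SELECT CustomerID, CompanyName, OrderCount
FROM @tblCustomers
ORDER BY CompanyName;
```

Whether this beats the equivalent LEFT JOIN depends on the data and the indexes, so it is worth timing both versions against your own tables.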
When testing the speed of a query, it is important to run it several times and take an average. Because a query (or stored procedure) may be stored in the procedure cache in SQL Server memory, the first attempt takes a little longer and all subsequent attempts are shorter. Also, other queries may be running against the same tables while your query runs. When other queries lock and unlock those tables, your query may have to wait in line. For example, if someone is updating data in a table while you are querying it, your query may take longer to execute while the update is committed.
The easiest way to avoid slowdowns from LEFT JOINs is to design the database around them as much as possible. For example, suppose a product may or may not have a category. If the Products table stores the ID of its category and a particular product has no category, you could store a NULL value in that field. You would then have to perform a LEFT JOIN to get all of the products and their categories. Instead, you could create a category with the value "No Category" and specify that the foreign key relationship does not allow NULL values. By doing this, you can retrieve all products and their categories using an INNER JOIN. While this may seem like a workaround with redundant data, it can be a valuable technique because it eliminates the more costly LEFT JOINs from batches of SQL. Applying this concept across a database can save a lot of processing time. Keep in mind that even a few seconds matter to your users, and those seconds add up when many users are accessing the same online database application.
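A minimal sketch of this design follows. It uses two small hypothetical tables, DemoCategory and DemoProduct, rather than the Northwind schema, so that the NOT NULL foreign key can be declared from the start; all names and sample rows are assumptions made for the illustration.

```sql
-- Hypothetical lookup table with an explicit "No Category" row.
CREATE TABLE DemoCategory (
    CategoryID   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    CategoryName NVARCHAR(50)       NOT NULL
);

-- The foreign key column does not allow NULL,
-- so every product must point at some category.
CREATE TABLE DemoProduct (
    ProductID   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    ProductName NVARCHAR(50)       NOT NULL,
    CategoryID  INT                NOT NULL
        REFERENCES DemoCategory (CategoryID)
);

INSERT INTO DemoCategory (CategoryName) VALUES ('No Category');
INSERT INTO DemoCategory (CategoryName) VALUES ('Stationery');

-- Look up category IDs by name instead of assuming identity values.
INSERT INTO DemoProduct (ProductName, CategoryID)
SELECT 'Chalk', CategoryID
FROM DemoCategory
WHERE CategoryName = 'Stationery';

INSERT INTO DemoProduct (ProductName, CategoryID)
SELECT 'Unsorted Item', CategoryID
FROM DemoCategory
WHERE CategoryName = 'No Category';

-- An INNER JOIN now returns every product, categorized or not.
SELECT p.ProductName, c.CategoryName
FROM DemoProduct p
    INNER JOIN DemoCategory c ON c.CategoryID = p.CategoryID
ORDER BY p.ProductName;
```

The uncategorized product still appears in the result, paired with the "No Category" row instead of a NULL.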
Use the Cartesian Product Wisely
For this tip, I will advocate the use of the Cartesian product in certain situations. For some reason, Cartesian products (CROSS JOINs) have a bad reputation, and developers are often warned not to use them at all. In many cases, they are too costly to use efficiently. But like any tool in SQL, they can be valuable when used properly. For example, a Cartesian product comes in handy if you want to run a query that returns data for every month, even for months in which a customer had no orders. The SQL in Figure 2 does just that.
While this may not seem like anything magical, consider that if you did a standard INNER JOIN from customers to orders, grouped by month and summed the sales, you would only get the months in which the customer had an order. You would not get back a value of 0 for the months in which the customer did not order any products. If you wanted to plot a graph per customer showing every month and its sales, you would want the graph to include the months with sales of 0 so that they can be identified visually. If you use the SQL in Figure 2, the data skips the months with $0 in sales, because the Orders table holds no rows for non-sales (assuming you store only events that actually occurred).
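To make the gap concrete, here is a rough sketch of the standard INNER JOIN approach, assuming the Northwind Customers, Orders, and [Order Details] tables. It is not the article's figure; the MonthStart and MonthlySales aliases are illustrative, and discounts are ignored to keep the arithmetic simple.

```sql
-- Monthly sales per customer using INNER JOINs only.
-- A (customer, month) pair with no orders produces no row at all,
-- so months with zero sales are simply missing from the result.
SELECT c.CompanyName,
       -- Truncate each order date to the first day of its month.
       DATEADD(MONTH, DATEDIFF(MONTH, 0, o.OrderDate), 0) AS MonthStart,
       SUM(od.UnitPrice * od.Quantity) AS MonthlySales
FROM Customers c
    INNER JOIN Orders o ON o.CustomerID = c.CustomerID
    INNER JOIN [Order Details] od ON od.OrderID = o.OrderID
GROUP BY c.CompanyName,
         DATEADD(MONTH, DATEDIFF(MONTH, 0, o.OrderDate), 0)
ORDER BY c.CompanyName, MonthStart;
```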
The code in Figure 3 is longer, but it achieves the goal of getting all of the sales data, even for months without sales. First, it builds a list of all of the months in the past year and puts them in a first TABLE data type table (@tblMonths). Next, the code gets a list of all of the customer companies that had sales during that time period and puts them in another TABLE data type table (@tblCustomers). These two tables store all of the basic data needed to create the result set, except for the actual sales figures. The first table lists all of the months (12 rows), and the second table lists all of the customers with sales in that time period (81 in my case). Not every customer purchased a product in each of the past 12 months, so performing an INNER JOIN or a LEFT JOIN will not return every customer for every month. Those operations return only the customers and months in which a purchase was made.
A Cartesian product can return all customers for all months. It essentially multiplies the first table by the second, generating a rowset whose row count is the number of rows in the first table times the number of rows in the second. Here that is 12 months times 81 customers, so the Cartesian product returns 972 rows into the table @tblFinal. The final steps are to update the @tblFinal table with the monthly sales totals for each customer in this date range and to select the final rowset.
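The steps just described could be sketched along these lines. This is a reconstruction from the prose, not the original Figure 3: it assumes the Northwind tables, reuses the table variable names mentioned in the text, and invents the column names, the sample end date, and the sales formula (which ignores discounts).

```sql
DECLARE @dtStart DATETIME, @dtEnd DATETIME, @dtDate DATETIME;
SET @dtEnd   = '19980501';                    -- hypothetical end of the range
SET @dtStart = DATEADD(MONTH, -12, @dtEnd);   -- twelve months earlier

-- 1. One row per month in the range (12 rows).
DECLARE @tblMonths TABLE (MonthStart DATETIME NOT NULL PRIMARY KEY);
SET @dtDate = @dtStart;
WHILE @dtDate < @dtEnd
BEGIN
    INSERT INTO @tblMonths (MonthStart) VALUES (@dtDate);
    SET @dtDate = DATEADD(MONTH, 1, @dtDate);
END;

-- 2. Every customer with at least one order in the range.
DECLARE @tblCustomers TABLE (
    CustomerID  NCHAR(5)     NOT NULL PRIMARY KEY,
    CompanyName NVARCHAR(40) NOT NULL
);
INSERT INTO @tblCustomers (CustomerID, CompanyName)
SELECT DISTINCT c.CustomerID, c.CompanyName
FROM Customers c
    INNER JOIN Orders o ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= @dtStart AND o.OrderDate < @dtEnd;

-- 3. Cartesian product: every customer paired with every month, sales = 0.
DECLARE @tblFinal TABLE (
    MonthStart   DATETIME     NOT NULL,
    CustomerID   NCHAR(5)     NOT NULL,
    CompanyName  NVARCHAR(40) NOT NULL,
    MonthlySales MONEY        NOT NULL DEFAULT 0,
    PRIMARY KEY (CustomerID, MonthStart)
);
INSERT INTO @tblFinal (MonthStart, CustomerID, CompanyName)
SELECT m.MonthStart, c.CustomerID, c.CompanyName
FROM @tblMonths m
    CROSS JOIN @tblCustomers c;

-- 4. Fill in the real totals where sales exist; other rows stay at 0.
UPDATE f
SET MonthlySales = s.Sales
FROM @tblFinal f
    INNER JOIN (
        SELECT o.CustomerID,
               DATEADD(MONTH, DATEDIFF(MONTH, 0, o.OrderDate), 0) AS MonthStart,
               SUM(od.UnitPrice * od.Quantity) AS Sales
        FROM Orders o
            INNER JOIN [Order Details] od ON od.OrderID = o.OrderID
        WHERE o.OrderDate >= @dtStart AND o.OrderDate < @dtEnd
        GROUP BY o.CustomerID,
                 DATEADD(MONTH, DATEDIFF(MONTH, 0, o.OrderDate), 0)
    ) s ON s.CustomerID = f.CustomerID
       AND s.MonthStart = f.MonthStart;

-- 5. Final rowset: one row per customer per month, zeros included.
SELECT CompanyName, MonthStart, MonthlySales
FROM @tblFinal
ORDER BY CompanyName, MonthStart;
```

The start date must fall on the first day of a month for the month keys in step 4 to line up with the rows generated in step 1.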
Because Cartesian products are very resource intensive, use CROSS JOINs with caution when you do not need a true Cartesian product. For example, if you perform a CROSS JOIN on products and categories and then filter out most of the rows with a WHERE clause, DISTINCT, or GROUP BY, you could have obtained the same result far more efficiently with an INNER JOIN. A Cartesian product can be very helpful when you need data returned for all possibilities (for example, when you want to populate a chart with monthly sales data). However, you should not use them for other purposes, because INNER JOINs are much more efficient in most scenarios.
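The products and categories case can be illustrated with the two queries below, which assume the Northwind Products and Categories tables and should return the same rows.

```sql
-- Cartesian product first, then a filter that discards most of it.
-- Conceptually this pairs every product with every category
-- before keeping only the pairs that belong together.
SELECT p.ProductName, c.CategoryName
FROM Products p
    CROSS JOIN Categories c
WHERE p.CategoryID = c.CategoryID;

-- The same result expressed directly as an INNER JOIN.
SELECT p.ProductName, c.CategoryName
FROM Products p
    INNER JOIN Categories c ON p.CategoryID = c.CategoryID;
```

Comparing the execution plans and timings of the two forms on your own server is the most reliable way to see how much the difference matters in practice.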
Odds and Ends
Here are a few other common techniques that can help improve the efficiency of SQL queries. Suppose you are grouping all of your salespeople by region and subtotaling their sales, but you only want the salespeople who are marked as active in the database. You could group the salespeople by region and eliminate the inactive ones with the HAVING clause, or you could do this in the WHERE clause. Doing it in the WHERE clause reduces the number of rows that need to be grouped, so it is more efficient than doing it in the HAVING clause. Filtering on row-based criteria in the HAVING clause forces the query to group data that could have been eliminated in the WHERE clause.
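The salesperson example could look like the sketch below. The @SalesPerson table variable, its columns, and the sample rows are all hypothetical, made up so that the batch can run on its own.

```sql
-- Hypothetical sample data for the illustration.
DECLARE @SalesPerson TABLE (
    SalesPersonName NVARCHAR(30) NOT NULL,
    Region          NVARCHAR(15) NOT NULL,
    IsActive        BIT          NOT NULL,
    SalesAmount     MONEY        NOT NULL
);

INSERT INTO @SalesPerson VALUES ('Ann',   'East', 1, 1200);
INSERT INTO @SalesPerson VALUES ('Bob',   'East', 0,  800);
INSERT INTO @SalesPerson VALUES ('Carla', 'West', 1,  500);
INSERT INTO @SalesPerson VALUES ('Dev',   'West', 1,  700);

-- Less efficient: inactive rows are grouped first and discarded afterward.
-- IsActive has to be part of the GROUP BY to be usable in HAVING.
SELECT Region, SUM(SalesAmount) AS TotalSales
FROM @SalesPerson
GROUP BY Region, IsActive
HAVING IsActive = 1
ORDER BY Region;

-- More efficient: inactive rows are removed before any grouping happens.
SELECT Region, SUM(SalesAmount) AS TotalSales
FROM @SalesPerson
WHERE IsActive = 1
GROUP BY Region
ORDER BY Region;
```

With this sample data both queries return a total of 1200 for East and 1200 for West; only the amount of work done before the filter differs.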
Another efficiency technique is to use the DISTINCT keyword to find a distinct list of data rows instead of using the GROUP BY clause. In that case, the SQL is more efficient with the DISTINCT keyword. Reserve GROUP BY for cases where you need to calculate an aggregate function (SUM, COUNT, MAX, and so on). Also, if your query will always return unique rows on its own, do not use the DISTINCT keyword. In that case, DISTINCT only adds overhead.
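As a brief illustration, the following queries assume the Northwind Customers table; the CustomerCount alias is an illustrative name.

```sql
-- A plain list of unique values: DISTINCT states the intent directly.
SELECT DISTINCT Country
FROM Customers;

-- Same rows, but written with GROUP BY even though nothing is aggregated.
SELECT Country
FROM Customers
GROUP BY Country;

-- GROUP BY is the right tool once an aggregate is involved.
SELECT Country, COUNT(*) AS CustomerCount
FROM Customers
GROUP BY Country;

-- CustomerID is the primary key, so rows are already unique;
-- adding DISTINCT here would only cost extra work.
SELECT CustomerID, CompanyName
FROM Customers;
```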
As you can see, there are many techniques available for optimizing queries and implementing specific business rules; the trick is to try several and compare their performance. The most important thing is to test, test, and test again. In future installments of this column, I will continue to explore SQL Server concepts, including database design, good indexing practices, and the SQL Server security paradigm.
Send questions and comments for Johnny to mmdata@microsoft.com.
Johnny Papa is Vice President of Information Technology at MJM Investigations in Raleigh, North Carolina. He is the author of Professional ADO 2.5 RDS Programming with ASP 3.0 (Wrox, 2000) and often speaks at industry conferences. To contact him, send e-mail to datapoints@lancelotweb.com.
From the July 2002 issue of MSDN Magazine. The magazine is available at your local newsstand or, better yet, by subscription.