Five Ways to Rev up Your SQL Performance

Document directory
  • Returning an IDENTITY From an INSERT
  • Inline Views Versus Temp Tables
  • Avoid LEFT JOINs and NULLs
  • Use Cartesian Products Wisely
  • Odds and Ends
Sometimes all it takes is a little tweak here or there to make your application run much faster. Ah, but the key is figuring out how to tweak it! Sooner or later you'll face a situation where a SQL query in your application isn't responding the way you intended. Either it doesn't return the data you want or it takes entirely too long to be reasonable. If it slows down a report or your enterprise application, users won't be pleased if they have to wait inordinate amounts of time. And just like your parents didn't want to hear why you were coming in past curfew, users don't want to hear why your query is taking so long. ("Sorry, Mom, I used too many LEFT JOINs.") Users want applications to respond quickly and their reports to return analytical data in a flash. I myself get impatient when I surf the Web and a page takes more than ten seconds to load (OK, more like five seconds).
To resolve these issues, it is important to get to the root of the problem. So where do you start? The root cause is usually in the database design and the queries that access it. In this month's column I'll demonstrate four techniques that can be used to either improve your SQL Server-based application's performance or improve its scalability. I'll examine the use of LEFT JOINs, CROSS JOINs, and retrieving an IDENTITY value. Keep in mind that there is no magic solution. Tuning your database and its queries takes time, analysis, and a lot of testing. While the techniques here are proven, some may work better than others in your application.
Returning an IDENTITY From an INSERT

I figured I would start with something I get a lot of questions about: how to retrieve an IDENTITY value after performing a SQL INSERT. Often, the problem is not how to write the query to retrieve the value, but rather where and when to do it. In SQL Server, the statement to retrieve the IDENTITY value created by the most recent SQL statement run on the active database connection is as follows:

SELECT @@IDENTITY

While this SQL is far from daunting, it is important to keep in mind that if the most recent SQL statement was not an INSERT, or if you run this SQL against a different connection than the INSERT, you will not get back the value you expect. You must run this code to retrieve the IDENTITY immediately following the INSERT and on the same connection, like this:

INSERT INTO Products (ProductName) VALUES ('Chalk')
SELECT @@IDENTITY

Running these queries on a single connection against the Northwind database will return the IDENTITY value for the new product called Chalk. So in your Visual Basic application using ADO, you could run the following statement:

Set oRs = oCn.Execute("SET NOCOUNT ON;" & _
    "INSERT INTO Products (ProductName) VALUES ('Chalk');" & _
    "SELECT @@IDENTITY")
lProductID = oRs(0)

This code tells SQL Server not to return a row count for the query, then executes the INSERT statement and returns the IDENTITY value just created for the new row. The SET NOCOUNT ON statement means the Recordset that is returned has one row and one column that contains the new IDENTITY value. Without this statement, an empty Recordset is returned (because the INSERT statement returns no data) and then a second Recordset is returned, which contains the IDENTITY value. This can be confusing, especially since you never intended the INSERT to return a Recordset. This situation occurs because SQL Server sees the row count (that is, one row affected) and interprets it as representing a Recordset, so the true data is pushed back into a second Recordset. While you can get to this second Recordset using the NextRecordset method in ADO, it is much easier (and more efficient) if you can always count on the Recordset being the first and only one returned.
While this technique gets the job done, it does require extra code in the SQL statement. Another way of getting the same result is to use the SET NOCOUNT ON statement preceding the INSERT and to put the SELECT @@IDENTITY statement in a FOR INSERT trigger on the table, as shown in the following code snippet. This way, any INSERT statement into that table will automatically return the IDENTITY value.

CREATE TRIGGER trProducts_Insert ON Products FOR INSERT AS
    SELECT @@IDENTITY
GO

The trigger only fires when an INSERT occurs on the Products table, so it will always return an IDENTITY after a successful INSERT. Using this technique, you can consistently retrieve IDENTITY values in the same manner across your application.
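The same-connection rule can be tried out without a SQL Server instance. The following is only a sketch using SQLite through Python's sqlite3 module as a stand-in: `lastrowid` and `last_insert_rowid()` are SQLite's rough counterparts to @@IDENTITY, and the `insert_product` helper and the sample rows are assumptions made for illustration.

```python
import sqlite3


def insert_product(conn, name):
    """Insert a product and return the key generated for it on this connection."""
    cur = conn.execute("INSERT INTO Products (ProductName) VALUES (?)", (name,))
    # lastrowid is read from the cursor that ran the INSERT (SQLite stand-in
    # for selecting the identity right after the INSERT).
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "ProductID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "ProductName TEXT NOT NULL)"
)

chalk_id = insert_product(conn, "Chalk")
eraser_id = insert_product(conn, "Eraser")

# Asking on the same connection returns the key of the latest INSERT.
same_conn_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]

# A different connection has not inserted anything, so it reports 0.
other_conn = sqlite3.connect(":memory:")
other_conn_id = other_conn.execute("SELECT last_insert_rowid()").fetchone()[0]
```

Reading the value on a connection other than the one that ran the INSERT gives a useless answer, which is the pitfall described above.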
Inline Views Versus Temp Tables

Queries sometimes need to join data to other data that may only be gathered by performing a GROUP BY and then a standard query. For example, if you want to return the information about the five most recently placed orders, you would first need to know which orders they are. This can be retrieved by using a SQL query that returns the orders' IDs. This data could be stored in a temporary table, a common technique, and then joined to the Products table to return the quantity of products sold on those orders:

CREATE TABLE #Temp1 (OrderID INT NOT NULL, OrderDate DATETIME NOT NULL)
INSERT INTO #Temp1 (OrderID, OrderDate)
SELECT TOP 5 o.OrderID, o.OrderDate
FROM Orders o
ORDER BY o.OrderDate DESC
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM #Temp1 t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
DROP TABLE #Temp1

This batch of SQL creates a temporary table, inserts the data into it, joins other data to it, and drops the temporary table. This is a lot of I/O for this query, which could be rewritten to use an inline view instead of a temporary table. An inline view is simply a query that can be joined to in the FROM clause. So instead of spending a lot of I/O and disk access in tempdb on a temporary table, you could use an inline view to get the same result:

SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM (
        SELECT TOP 5 o.OrderID, o.OrderDate
        FROM Orders o
        ORDER BY o.OrderDate DESC
    ) t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName

This query is not only more efficient than the previous one, it's shorter. Temporary tables consume a lot of resources. If you only need the data to join to other queries, you might want to try using an inline view to conserve resources.
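To check that the two forms really return the same rows, here is a small sketch that runs both against SQLite through Python's sqlite3 module. It is not the Northwind schema: the table OrderDetails, the three sample orders, and the use of LIMIT 2 in place of TOP 5 are all assumptions made to keep the example tiny.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT NOT NULL);
CREATE TABLE OrderDetails (OrderID INTEGER NOT NULL,
                           ProductID INTEGER NOT NULL,
                           Quantity INTEGER NOT NULL);
INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chalk'), (3, 'Tofu');
INSERT INTO Orders VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03');
INSERT INTO OrderDetails VALUES (1, 1, 10), (2, 2, 5), (3, 2, 7), (3, 3, 1);
""")

# Approach 1: materialize the most recent orders in a temporary table.
conn.executescript("""
CREATE TEMP TABLE Temp1 (OrderID INTEGER NOT NULL, OrderDate TEXT NOT NULL);
INSERT INTO Temp1 (OrderID, OrderDate)
SELECT OrderID, OrderDate FROM Orders ORDER BY OrderDate DESC LIMIT 2;
""")
temp_rows = conn.execute("""
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM Temp1 t
    INNER JOIN OrderDetails od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
""").fetchall()
conn.execute("DROP TABLE Temp1")

# Approach 2: the same subquery used directly as an inline view.
inline_rows = conn.execute("""
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM (SELECT OrderID, OrderDate
      FROM Orders ORDER BY OrderDate DESC LIMIT 2) t
    INNER JOIN OrderDetails od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
""").fetchall()
```

Both queries return the same rows; the inline view simply skips the create, insert, and drop steps.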
Avoid LEFT JOINs and NULLs

There are, of course, times when you need to perform a LEFT JOIN and use NULL values. But they are not a solution for all occasions. Changing the way you structure your SQL queries can mean the difference between a report that takes minutes to run and one that takes only seconds. Sometimes you have to morph the data in a query to look the way your application wants it to look. While the TABLE datatype reduces resource gluttony, there are still plenty of areas in a query that can be optimized. One valuable, commonly used feature of SQL is the LEFT JOIN. It can be used to retrieve all of the rows from a first table and all matching rows from a second table, plus the rows from the first table that have no match in the second, with NULLs in the second table's columns. For example, if you wanted to return every customer and their orders, a LEFT JOIN would show the customers who did and did not have orders.
This tool can be overused. LEFT JOINs are costly since they involve matching data against NULL (nonexistent) data. In some cases this is unavoidable, but the cost can be high. A LEFT JOIN is more costly than an INNER JOIN, so if you could rewrite a query so it doesn't use a LEFT JOIN, it could pay huge dividends (see Figure 1).


Figure 1 Query

One technique to speed up a query that uses a LEFT JOIN involves creating a TABLE datatype and inserting all of the rows from the first table (the one on the left-hand side of the LEFT JOIN), then updating the TABLE datatype with the values from the second table. This technique is a two-step process, but could save a lot of time compared to a standard LEFT JOIN. A good rule is to try out different techniques and time each of them until you get the best performing query for your application.
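The two-step idea can be sketched as follows. This is not the code from Figure 1: it uses SQLite through Python's sqlite3 module, a temporary table named Result stands in for the TABLE datatype, and the tables and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT NOT NULL);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER NOT NULL);
INSERT INTO Customers VALUES (1, 'Alfreds'), (2, 'Bolido'), (3, 'Comercio');
INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Baseline: a LEFT JOIN keeps customers that have no orders.
left_join_rows = conn.execute("""
SELECT c.CompanyName, COUNT(o.OrderID) AS OrderCount
FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.CompanyName
ORDER BY c.CompanyName
""").fetchall()

# Step 1: copy every row from the left-hand table into a working table.
# Step 2: update the working table with values from the second table.
conn.executescript("""
CREATE TEMP TABLE Result (CustomerID INTEGER PRIMARY KEY,
                          CompanyName TEXT NOT NULL,
                          OrderCount INTEGER NOT NULL DEFAULT 0);
INSERT INTO Result (CustomerID, CompanyName)
SELECT CustomerID, CompanyName FROM Customers;
UPDATE Result
SET OrderCount = (SELECT COUNT(*) FROM Orders o
                  WHERE o.CustomerID = Result.CustomerID);
""")
two_step_rows = conn.execute(
    "SELECT CompanyName, OrderCount FROM Result ORDER BY CompanyName"
).fetchall()
```

Both approaches list every customer, including the one with no orders. Whether the two-step version is actually faster depends on the engine and the data, so it is worth timing both.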
When you are testing your query's speed, it's important to run it several times and take an average. Your query (or stored procedure) could be stored in the procedure cache in SQL Server's memory and thus would appear to take longer the first time and shorter on all subsequent tries. In addition, other queries could be running against the same tables while your query runs. This could cause your query to stand in line while other queries lock and unlock tables. For example, if you are querying while someone is updating data in that table, your query may take longer to execute while the update commits.
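A minimal timing harness along these lines might look like the sketch below. The helper name `average_runtime`, the Numbers table, and the choice of five runs are assumptions; it runs against SQLite through Python's sqlite3 module, so the numbers it produces say nothing about SQL Server itself.

```python
import sqlite3
import statistics
import time


def average_runtime(conn, sql, runs=5):
    """Run a query several times; return (mean seconds, list of timings)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch everything so the full query runs
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), timings


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Numbers (n INTEGER)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(i,) for i in range(1000)])

average, timings = average_runtime(conn, "SELECT SUM(n) FROM Numbers", runs=5)
```

Keeping the individual timings alongside the mean makes it easy to spot a slow first run caused by caching.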
One of the easiest ways to avoid slowdowns with LEFT JOINs is to design the database around them as much as possible. For example, let's assume that a product may or may not have a category. If the product table stores the ID of its category and there was no category for a particular product, you could store a NULL value in the field. Then you would have to perform a LEFT JOIN to get all of the products and their categories. Instead, you could create a category with the value of "No Category" and specify the foreign key relationship to disallow NULL values. By doing this, you can now use an INNER JOIN to retrieve all products and their categories. While this may seem like a workaround with extra data, it can be a valuable technique because it can eliminate costly LEFT JOINs in SQL batches. Using this concept across the board in a database can save you lots of processing time. Remember, even a few seconds means a lot to your users, and those seconds really add up when you have many users accessing an online database application.
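Here is a small sketch of the "No Category" design, using SQLite through Python's sqlite3 module. The choice of CategoryID 0 for the placeholder row and the sample products are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY,
                         CategoryName TEXT NOT NULL);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY,
                       ProductName TEXT NOT NULL,
                       CategoryID INTEGER NOT NULL
                           REFERENCES Categories (CategoryID));
-- Placeholder row used instead of a NULL foreign key.
INSERT INTO Categories VALUES (0, 'No Category'), (1, 'Beverages');
INSERT INTO Products VALUES (1, 'Chai', 1), (2, 'Chalk', 0);
""")

# An INNER JOIN now returns every product, categorized or not.
rows = conn.execute("""
SELECT p.ProductName, c.CategoryName
FROM Products p
    INNER JOIN Categories c ON p.CategoryID = c.CategoryID
ORDER BY p.ProductID
""").fetchall()

# The NOT NULL constraint refuses a product without a category.
try:
    conn.execute("INSERT INTO Products VALUES (3, 'Pencil', NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

The uncategorized product shows up with the placeholder name, so reports still need to decide how to display it.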
Use Cartesian Products Wisely

For this tip, I will go against the grain and advocate the use of Cartesian products in certain situations. For some reason, Cartesian products (CROSS JOINs) got a bad rap and developers are often cautioned not to use them at all. In many cases, they are too costly to use effectively. But like any tool in SQL, they can be valuable if used properly. For example, if you want to run a query that will return data for every month, even on customers that had no orders that particular month, you could use a Cartesian product quite handily. The SQL in Figure 2 does just that.
While this may not seem like magic, consider that if you did a standard INNER JOIN from Customers to Orders, grouped by the month and summed the sales, you would only get the months where the customer had an order. Thus, you would not get back a 0 value for the months in which the customer didn't order any products. If you wanted to plot a graph per customer showing every month and its sales, you would want the graph to include the months with 0 sales so that those months can be identified visually. If you use the SQL in Figure 2, the data skips over the months that had $0 in sales because there are no rows in the Orders table for nonsales (it is assumed that you do not store what did not occur).
The code in Figure 3 is longer, but can achieve the same goal of getting all the sales data, even for months without sales. First, it grabs a list of all of the months in the past year and puts them in the first TABLE datatype table (@tblMonths). Next, the code gets a list of all customers' company names who had sales during that time period and puts them in another TABLE datatype table (@tblCustomers). These two tables store all of the basic data required to create the resultset except the actual sales numbers.
All of the months are listed in the first table (12 rows) and all of the customers who had sales in that time frame are listed in the second table (81 for me). Not every customer purchased a product in each of the past 12 months, so performing an INNER or LEFT JOIN won't return every customer for every month. These operations will only return the customers and the months when they did purchase something.
A Cartesian product can return all customers for all months. A Cartesian product basically multiplies the first table by the second table and results in a rowset that contains the number of rows in the first table times the number of rows in the second table. Thus, the Cartesian product returns 972 rows (12 months times 81 customers) into the table @tblFinal. The last steps are to update the table @tblFinal with the monthly sales totals for each customer during the date range and to select the final rowset.
Use CROSS JOINs with caution if you do not need a true Cartesian product because they can be very resource intensive. For example, if you do a CROSS JOIN on products and categories and then use a WHERE clause, DISTINCT, or GROUP BY to filter out most of the rows, you could have gotten to the same result in a much more efficient manner by using an INNER JOIN. Cartesian products can be very useful when you need the data returned for all possibilities, as in the case when you want to load a graph with monthly sales dates. But you should not use them for other purposes, as INNER JOINs are much more efficient in most scenarios.
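The months-times-customers pattern can be sketched on a small scale as follows. This is not the code from Figure 3: it uses SQLite through Python's sqlite3 module, ordinary and temporary tables stand in for the TABLE datatype tables, and the three months, two customers, and order amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Months (SalesMonth TEXT PRIMARY KEY);
CREATE TABLE Customers (CompanyName TEXT PRIMARY KEY);
CREATE TABLE Orders (CompanyName TEXT NOT NULL,
                     SalesMonth TEXT NOT NULL,
                     Amount REAL NOT NULL);
INSERT INTO Months VALUES ('2024-01'), ('2024-02'), ('2024-03');
INSERT INTO Customers VALUES ('Alfreds'), ('Bolido');
INSERT INTO Orders VALUES ('Alfreds', '2024-01', 100.0),
                          ('Alfreds', '2024-01', 50.0),
                          ('Bolido', '2024-03', 20.0);
""")

# Grouping the orders alone only yields months that actually had sales.
inner_rows = conn.execute("""
SELECT CompanyName, SalesMonth, SUM(Amount)
FROM Orders
GROUP BY CompanyName, SalesMonth
ORDER BY CompanyName, SalesMonth
""").fetchall()

# The CROSS JOIN builds one row per customer per month (2 x 3 = 6 rows),
# and the UPDATE then fills in the monthly totals, leaving 0 elsewhere.
conn.executescript("""
CREATE TEMP TABLE Final (CompanyName TEXT NOT NULL,
                         SalesMonth TEXT NOT NULL,
                         Sales REAL NOT NULL DEFAULT 0);
INSERT INTO Final (CompanyName, SalesMonth)
SELECT c.CompanyName, m.SalesMonth
FROM Customers c CROSS JOIN Months m;
UPDATE Final
SET Sales = (SELECT COALESCE(SUM(o.Amount), 0)
             FROM Orders o
             WHERE o.CompanyName = Final.CompanyName
               AND o.SalesMonth = Final.SalesMonth);
""")
final_rows = conn.execute(
    "SELECT CompanyName, SalesMonth, Sales FROM Final "
    "ORDER BY CompanyName, SalesMonth"
).fetchall()
```

The grouped query returns two rows, while the cross-joined table returns six, including the zero-sales months a graph would need.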
Odds and Ends

Here are a few other common techniques that can help improve the efficiency of your SQL querying. Let's assume you are going to group all of your salespeople by region and sum their sales, but you only want salespeople who were marked active in your database. You could group the salespeople by region and use a HAVING clause to eliminate the salespeople who are not active, or you could do this in the WHERE clause. Doing this in the WHERE clause reduces the number of rows that need to be grouped, so it is more efficient than doing it in the HAVING clause. Filtering row-based criteria in the HAVING clause forces the query to group data that could have been eliminated in the WHERE clause.
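The two forms of the salespeople query can be compared in a quick sketch. It uses SQLite through Python's sqlite3 module, and the Salespeople table, its columns, and the rows are assumptions made for illustration; note that the HAVING version has to group by the Active flag as well before it can filter on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salespeople (Name TEXT NOT NULL,
                          Region TEXT NOT NULL,
                          Active INTEGER NOT NULL,
                          Sales INTEGER NOT NULL);
INSERT INTO Salespeople VALUES ('Ann', 'East', 1, 100),
                               ('Bob', 'East', 0, 40),
                               ('Cy', 'West', 1, 70),
                               ('Di', 'West', 1, 30),
                               ('Ed', 'North', 0, 60);
""")

# Preferred: inactive rows are discarded before any grouping happens.
where_rows = conn.execute("""
SELECT Region, SUM(Sales)
FROM Salespeople
WHERE Active = 1
GROUP BY Region
ORDER BY Region
""").fetchall()

# Same result, but every row is grouped first and filtered afterwards.
having_rows = conn.execute("""
SELECT Region, SUM(Sales)
FROM Salespeople
GROUP BY Region, Active
HAVING Active = 1
ORDER BY Region
""").fetchall()
```

The results match; the difference is only in how many rows reach the grouping step.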
Another efficiency trick is to use the DISTINCT keyword to find a distinct list of data rows instead of using the GROUP BY clause. In this case, the SQL using the DISTINCT keyword will be more efficient. Reserve use of the GROUP BY for occasions when you need to calculate an aggregate function (SUM, COUNT, MAX, and so on). Also, avoid using the DISTINCT keyword if your query will always return a unique row on its own. In that case, the DISTINCT keyword will only add overhead.
You've seen that numerous techniques can be employed to optimize queries and implement specific business rules; the trick is to try a few and compare their performance. Most important is to test, test, and test again. In future installments of this column, I'll continue to explore SQL Server concepts including database design, good indexing practices, and SQL Server security paradigms.
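The two ways of getting a distinct list can be put side by side in a short sketch. It uses SQLite through Python's sqlite3 module with an invented Customers table, and it only shows that the results are identical; which form is cheaper depends on the database engine, so measure on your own system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CompanyName TEXT NOT NULL, Country TEXT NOT NULL);
INSERT INTO Customers VALUES ('Alfreds', 'Germany'),
                             ('Blauer', 'Germany'),
                             ('Bolido', 'Spain'),
                             ('Comercio', 'Brazil');
""")

# DISTINCT states the intent directly: a unique list of values.
distinct_rows = conn.execute(
    "SELECT DISTINCT Country FROM Customers ORDER BY Country"
).fetchall()

# GROUP BY gives the same list but is meant for aggregate calculations.
group_by_rows = conn.execute(
    "SELECT Country FROM Customers GROUP BY Country ORDER BY Country"
).fetchall()
```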
