Introduction:
In the course of the interview, the interviewer always asked: Have you ever done SQL performance optimization? In response to this, my answer is no. There is no mistake at one time, two times is not, but if it is many times? Today, make a decision to summarize the relevant knowledge of SQL performance optimization, so that in the near future, my answer is not "no", can always say something. It's a progress. When it comes to performance optimization, I feel the need to understand the order in which SQL statements are executed, because it helps to optimize more or less.
SQL statement Execution Order:
The biggest difference between SQL statements and other related programming languages should be the order of execution. For most programming languages, execution is performed sequentially, but for SQL statements, although select is the first to appear, it is almost always the last to execute, often starting with the FROM clause. Each step produces a virtual table, which is not available to the caller, but only to the next step, and only the final query result table can be used by the caller. Skips this step when there are no steps to appear. The following code:
(8) SELECT (9) DISTINCT ( One) <top Num> <SelectList>(1) from [left_table] (3) <join_type> Join <right_table>(2) on <join_condition>(4) WHERE <where_condition>(5) GROUP by <group_by_list>(6) with <cube | Rollup>(7) Having (Ten) ORDER by <order_by_list>
Introduction to the Logical query processing phase:
1) From: Performs a Cartesian product (Cartesian product) (cross join) on the first two tables in the FROM clause, generating the virtual table VT1
2) On: Apply the on filter to the VT1. Only those lines that make <join_condition> true are inserted into the VT2
3) Outer (join): If outer join is specified (relative to cross join or (INNER join), the reserved table (preserved table: Left outer join marks the left table as a reserved table, right outer join marks the right table as a reserved table, A full outer join that marks two tables as a reserved table) does not find a matching row to be added to VT2 as an outer row, generating VT3. If the FROM clause contains more than two tables, repeat steps 1 through 3 for the result table and the next table that were generated for the previous join until all the tables have been processed.
4) Where: apply where filter to VT3. Only rows that make <where_condition> true are inserted into the VT4.
5) Group BY: Groups the rows in VT4 by the list of columns in the GROUP BY clause, generating VT5.
6) Cube|roolup: Insert the Super Group (suppergroups) into VT5, generate VT6.
7) Having: Apply a having filter to VT6. Only groups that make
8) Select: Processes the select list, generating VT8.
9) Distinct: Remove duplicate rows from VT8, resulting in VT9.
ORDER BY: The rows in VT9 are sorted by the list of columns in the ORDER BY clause, and the cursor is generated (VC10)
Top: Selects the specified number or scale of rows from the beginning of the VC10, generates the table VT11, and returns the caller.
Note:
Step 10, sort the rows returned by the column list in the ORDER BY clause, and return the cursor VC10. This step is the first and only step you can use the column aliases in the select list. This step differs from the other step in that it does not return a valid table, but instead returns a cursor. SQL is based on the set theory. The collection does not pre-order its rows, it is only a logical collection of members, and the order of the members is irrelevant. A query that sorts a table can return an object that contains rows organized in a specific physical order. ANSI calls this object a cursor. Understanding this step is the basis for a proper understanding of SQL.
Because this step does not return a table (instead of returning a cursor), a query that uses the ORDER BY clause cannot be used as a table expression. Table expressions include: views, inline table-valued functions, subqueries, derived tables, and common expressions. Its result must be returned to the client application that expects to get the physical record.
In SQL, a query with an ORDER BY clause is not allowed in a table expression, but there is an exception in T-SQL (apply top option). So remember, don't assume any particular order for the rows in the table. In other words, do not specify an ORDER BY clause unless you are sure you want to order rows. Sorting is a cost, and SQL Server needs to perform an ordered index scan or use the sort runner.
when SQL statements are executed, the table names in the FROM clause are processed in a right-to-left order, and the last table in the FROM clause is the first to be processed, so if you have more than one table in the FROM clause, select the table with the lowest number of record bars as the underlying table . In a way, it will greatly improve its performance. If there are more than 3 tables, select the crosstab table as the underlying table. This is very important for performance tuning.
Execution Plan:
After the execution sequence is finished, the execution plan is discussed:
An execution plan is a query scheme that the database makes based on the statistics of SQL statements and related tables, which are generated automatically by the query optimizer, such as an SQL statement that searches for 1 records from a 100,000-record table, and the query optimizer chooses the "Index lookup" method. If the table is archived and there are currently only 5,000 records left, the query optimizer will change the scenario with a "full table scan" approach.
As can be seen, the execution plan is not fixed, it is "personalized". It is important to produce a correct "execution plan" with two points:
(1) Does the SQL statement clearly tell the query optimizer what it wants to do?
(2) is the database statistics obtained by the query optimizer up-to-date and correct?
Optimized Inspection tools:
Basic knowledge is complete, start performance optimization, but how can we know that the system of those SQL statements should be performance optimization, whether the statement should be system optimization, view relevant data, for SQL Server, find SQL Server database corresponding to a Profiler, the tool can be used to find out what kind of operational behavior has lowered its performance for a database table.
Open the System main Menu--sqlserver several---performance tools--->>sql server Profiler;
Then the file--new trace--Displays the Tracking Properties window;
First of all, the select% is a screening and monitoring textdata. That percent is a wildcard, and what he means is to filter the statements of the select openings. Of course you can define it yourself, like update%,delete% ...
Take the line that excludes the value, and then OK, run. Then run a select in the database. You'll find him detected.
1. Find the longest-lasting query
In general, a query statement that has the longest query time is the one that most affects performance. It not only occupies a large amount of time in the database engine, but also wastes system resources and affects the interaction speed of the database application system. When the data is optimized with the application system, we first find him, optimize it, and when the tracking is created, tick the tsql-sql:batchcompleted. With stored procedures-rpc:completed. This makes it possible to find the maximum time to query and then analyze and optimize it.
Select Textdata,duration,cpu from < tracked table >where eventclass=12-equals 12 for BatchCompleted event and cpu< (0.4*duration) --If the CPU time is less than 40% of the time the SQL statement was executed, it indicates that the statement waits too long
2. Queries that most Occupy system resources
Is the amount of CPU time, and the number of read/write Io. Recommended events include Connect, Disconnect, ExistingConnection, sql:batchcompleted, rpc:completed, and columns that contain WRITES,READS,CPU.
3. Detect Deadlocks
In a database with a large number of accesses and concurrency, if the design is slightly unreasonable, it is possible to create deadlocks that can have an impact on system performance. Events include: Rpc:starting, Sql:batchstarting, Lock:deadlock (Deadlock event), lock:deadlockchaining (sequence of events that are deadlocked).
Consult SQL Server performance detection and optimization tools using detailed
Database Engine Tuning Advisor
With SQL Server Profiler, there is a Database Engine Tuning Advisor, which is also a tool for performance optimization, and can take time to understand. Let me know and add.
Common experience with SQL performance optimization:
Here's a summary of what each of the companies on the web think SQL optimization should do:
1, fuzzy query like.
Use like for fuzzy query should pay special attention, this is very basic, basically we all know. Oh
Select*fromwhere username like '%yue% '
Keyword%yue%, because Yue front used "%", so the query must go full table scan, unless necessary, otherwise do not add% before the keyword.
2. Where Condition query
Avoid using In,not in,having as much as possible, and you can use exist and not exist instead of in and not. Do not declare numbers in character format, you want to declare character values in numeric format.
3. When there are more than one table in the FROM clause mentioned earlier for the associated query
When you include more than one table in the FROM clause, selecting the table with the fewest number of record bars as the base table will greatly improve its performance in some way. If there are more than 3 tables, select the crosstab as the underlying table
4. Select * Query
Try not to use
Select from TableName
Instead, they are:
Select from TableName
5. Sort operations
Avoid the use of resource-intensive operations, SQL statements with Distinct,union,minus,intersect,order by will start the SQL engine execution, resource-intensive sorting (sort) function. Distinct requires a sort operation, while the others need to perform at least two sorting.
6. Index Table operation
For this, the individual has not figured out, first of all, for the index is not clear, so performance optimization is far from. Anyway, many Daniel are operating the index table, need special attention. Later to understand, add it.
...
7. Unified Specification SQL statement
Writing a canonical SQL statement is the most important point, both for the system and for the individual.
Non-standard are:
Very complex SQL statement, for the writer himself is dizzy.
Writing at will is a small problem for the system.
Certainly also, is usually more attention on the OK.
Reference:
SQL Server performance detection and optimization tools use detailedexpert detailed SQL performance Optimization 10 experienceOptimizing SQL queries: How to write high-performance SQL statements
SQL Performance Tuning