SQL Server SQL statement optimization (MSSQL)

Source: Internet
Author: User
Tags: commit, joins, mathematical functions, SQL Server, query, mssql, rollback, server memory

MS SQL Server Query optimization method
There are many common reasons for slow queries:
1. No indexes, or the indexes are not used (the most common cause of slow queries; a programming flaw)
2. Low I/O throughput, creating a bottleneck
3. No computed columns created, so the query cannot be optimized
4. Insufficient memory
5. Slow network speed
6. The query returns too much data (use multiple queries or other methods to reduce the data volume)
7. Locks or deadlocks (also a very common cause of slow queries; a programming flaw)
8. Contention for read/write resources (inspect with sp_lock, sp_who, and the active-user views)
9. Unnecessary rows and columns are returned
10. The query statement itself is poorly written and unoptimized

You can improve your queries with the following methods:

1. Put data, logs, and indexes on separate I/O devices to increase read speed. (Previously you could also put tempdb on RAID 0, which SQL 2000 no longer supports.) The larger the data volume, the more important I/O becomes.
2. Split tables vertically and horizontally to reduce table size (see sp_spaceused).
3. Upgrade the hardware.
4. Build indexes on appropriate fields, optimize them, and improve access paths, limiting the result set size with the query conditions. Keep the fill factor appropriate (the default of 0 is usually fine). Indexes should be as small as possible: a key column with few bytes makes a better index (see the guidelines on index creation), and do not build a single index on a low-cardinality column such as a gender field.
5. Improve network speed.
6. Expand server memory. Windows 2000 and SQL Server 2000 can support 4-8 GB of memory. Configure virtual memory according to the services running concurrently on the machine. When running Microsoft SQL Server 2000, consider setting the virtual memory size to 1.5 times the installed physical memory. If you installed the Full-Text Search feature and intend to run the Microsoft Search service for full-text indexing and querying, consider configuring virtual memory to at least 3 times the physical memory, and configure the SQL Server max server memory option to 1.5 times physical memory (half of the virtual memory setting).
7. Increase the number of server CPUs, but understand that parallel processing needs more resources, such as memory, than serial processing. Whether to use a parallel or a serial plan is evaluated automatically by MSSQL. A single task can be broken into multiple subtasks that run on different processors: sorts, joins, scans, and GROUP BY operations can all run in parallel, and SQL Server determines the optimal degree of parallelism based on system load. Complex queries that consume a lot of CPU benefit most from parallel processing. UPDATE, INSERT, and DELETE operations, however, cannot be processed in parallel.
8. If you query with LIKE, a plain index may not help; a full-text index does, at the cost of space. LIKE 'a%' uses an index; LIKE '%a' does not; and with LIKE '%a%' the query time is proportional to the total length of the field values, so use varchar rather than char. Build a full-text index for long field values.
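As a quick illustration, a minimal sketch (the Members table and LastName column are hypothetical) of which LIKE patterns can use an index:

-- Only the leading-prefix pattern can seek an index on LastName.
SELECT MemberID FROM Members WHERE LastName LIKE 'a%'   -- index seek possible
SELECT MemberID FROM Members WHERE LastName LIKE '%a'   -- cannot use the index
SELECT MemberID FROM Members WHERE LastName LIKE '%a%'  -- scan; time grows with value length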
9. Separate the DB server from the application server; separate OLTP from OLAP.
10. Distributed partitioned views can implement a federation of database servers. A federation is a set of separately managed servers that cooperate to share the system's processing load. This mechanism of partitioning data across a federation lets a group of servers scale out to support the processing needs of large, multi-tiered Web sites. For more information, see Designing Federated Database Servers and the 'Partitioned View' topic in the SQL Server help.

A. Before you implement a partitioned view, you must first partition the table horizontally.
B. After the member tables are created, define a distributed partitioned view on each member server, each view having the same name. Queries that reference the view name can then run on any member server: the system behaves as if each member server held a copy of the original table, although in fact each server has only one member table and one distributed partitioned view. The location of the data is transparent to the application.
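A minimal sketch of the idea, assuming three linked member servers and a Customers table range-partitioned by CustomerID (all names hypothetical):

-- On each member server, a member table holds one range, enforced by a
-- CHECK constraint (e.g. CustomerID BETWEEN 1 AND 32999 on Server1).
-- The same view is then defined on every member server:
CREATE VIEW Customers AS
SELECT * FROM Server1.CompanyDB.dbo.Customers_33
UNION ALL
SELECT * FROM Server2.CompanyDB.dbo.Customers_66
UNION ALL
SELECT * FROM Server3.CompanyDB.dbo.Customers_99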

11. Rebuild indexes (DBCC DBREINDEX, DBCC INDEXDEFRAG) and shrink data and log files (DBCC SHRINKDATABASE, DBCC SHRINKFILE). Set up automatic log shrinking. For large databases, do not enable automatic database growth, which can degrade server performance. There is a lot to be said about how T-SQL is written; here is a list of the common points. First, this is how the DBMS processes a query plan:

1. Lexical and syntax checking of the query statement
2. Submitting the statement to the DBMS query optimizer
3. Algebraic optimization and access-path optimization by the optimizer
4. Generating a query plan from the precompiled module
5. Submitting it to the system for execution at the appropriate time
6. Finally returning the results to the user

Second, look at the structure of SQL Server data storage: a page is 8 KB (8060 bytes usable), 8 pages make an extent, and data is stored as a B-tree.

12. The difference between COMMIT and ROLLBACK: ROLLBACK rolls back the whole transaction; COMMIT commits the current transaction. There is no need to write transaction control inside dynamic SQL; if you need it, wrap it on the outside, like: BEGIN TRAN EXEC(@s) COMMIT TRAN, or write the dynamic SQL as a function or stored procedure.
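For example, a sketch of wrapping dynamic SQL in an outer transaction as described (the statement itself is hypothetical):

DECLARE @s nvarchar(200)
SET @s = N'UPDATE ChineseResume SET title = ''male'' WHERE name = ''XYZ'''
BEGIN TRAN
EXEC(@s)   -- the dynamic SQL contains no transaction control of its own
COMMIT TRAN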

13. In a SELECT statement, use the WHERE clause to limit the number of rows returned and avoid table scans. Returning unnecessary data wastes the server's I/O resources, adds to the network burden, and degrades performance. If the table is large, a table scan locks the table and blocks other connections from accessing it, with serious consequences.

14. SQL comments have no effect on execution.

15. Avoid cursors as far as possible; they hold a large amount of resources. If row-by-row processing is required, prefer non-cursor techniques: loop on the client, use a temporary table or table variable, use a subquery, use a CASE statement, and so on. Cursors can be classified by the fetch options they support: forward-only cursors must fetch rows in order from the first row to the last, with FETCH NEXT the only allowed operation (the default); scrollable cursors can fetch any row at random. Cursor techniques became more powerful under SQL 2000, but their purpose is still to support loops.
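As an illustration of replacing a cursor with set-based or table-variable techniques, a hedged sketch (table and column names hypothetical):

-- Instead of a cursor that raises each price row by row:
UPDATE Products SET Price = Price * 1.10 WHERE CategoryID = 5

-- Or stage the rows in a table variable when per-row work is unavoidable:
DECLARE @work TABLE (ProductID int, Price money)
INSERT INTO @work SELECT ProductID, Price FROM Products WHERE CategoryID = 5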

There are four concurrency options:

READ_ONLY: updates through the cursor are not allowed, and no locks are taken on the rows that make up the result set.

OPTIMISTIC WITH VALUES: optimistic concurrency control is a standard part of transaction-control theory. It is used when there is only a small chance that a second user will update a row in the interval between opening the cursor and updating the row. When a cursor is opened with this option, no locks are held on its rows, which helps maximize throughput. If the user attempts to modify a row, the row's current values are compared with the values fetched when the row was last read. If any value changed, the server knows someone else has updated the row and returns an error. If the values are the same, the server performs the modification.

OPTIMISTIC WITH ROW VERSIONING: this optimistic concurrency option is based on row versioning. With row versioning, the table must have a version identifier that the server can use to determine whether the row has changed since it was read into the cursor.
In SQL Server this capability is provided by the timestamp data type: a binary number indicating the relative order of changes in the database. Each database has a global current timestamp value, @@DBTS. Every time a row with a timestamp column is changed in any way, SQL Server stores the current @@DBTS value in the timestamp column and then increments @@DBTS. If a table has a timestamp column, the timestamps are recorded at the row level. The server can then compare a row's current timestamp value with the timestamp value stored at the last fetch to determine whether the row has been updated; it does not have to compare the values of all columns, only the timestamp column. If an application requires optimistic concurrency on a table without a timestamp column, the cursor defaults to optimistic concurrency control based on values.
SCROLL LOCKS: this option implements pessimistic concurrency control, in which the application attempts to lock database rows as they are read into the cursor result set. With a server cursor, an update lock is placed on each row as it is read into the cursor. If the cursor is opened within a transaction, the transaction update lock is held until the transaction is committed or rolled back, although the cursor lock is dropped when the next row is fetched. If the cursor is opened outside a transaction, the lock is dropped when the next row is fetched. Therefore, whenever full pessimistic concurrency control is needed, the cursor should be opened within a transaction. An update lock prevents any other task from acquiring an update or exclusive lock, which prevents other tasks from updating the row.
An update lock does not block shared locks, however, so it does not prevent other tasks from reading the row unless the second task also requests a read with an update lock. Scroll locks: depending on the lock hints in the cursor's SELECT statement, these cursor concurrency options can generate scroll locks. A scroll lock is acquired on each row during a fetch and held until the next fetch or until the cursor is closed, whichever comes first. At the next fetch, the server acquires scroll locks for the newly fetched rows and releases the scroll locks on the previously fetched rows. Scroll locks are independent of transaction locks and can persist after a commit or rollback. If the option to close cursors at commit is off, a COMMIT statement does not close any open cursors, and the scroll locks persist past the commit to maintain isolation of the fetched data. The type of scroll lock acquired depends on the cursor concurrency option and the lock hint in the cursor's SELECT statement.
Lock hint    Read-only   Optimistic with values   Optimistic row versioning   Scroll locks
No hint      Unlocked    Unlocked                 Unlocked                    Update
NOLOCK       Unlocked    Unlocked                 Unlocked                    Unlocked
HOLDLOCK     Shared      Shared                   Shared                      Update
UPDLOCK      Error       Update                   Update                      Update
TABLOCKX     Error       Unlocked                 Unlocked                    Update
Other        Unlocked    Unlocked                 Unlocked                    Update

*Specifying the NOLOCK hint makes the table to which it applies read-only within the cursor.

16. Use Profiler to trace queries, obtain the time a query takes, and locate problem SQL; use the index optimizer (Index Tuning Wizard) to optimize indexes.

17. Note the difference between UNION and UNION ALL: UNION ALL is generally better, since it skips the duplicate-removing sort.

18. Be careful with DISTINCT; do not use it when unnecessary, as it slows the query just as UNION does. Duplicate records are not a problem for many queries.

19. Do not return unneeded rows and columns from queries.

20. Use sp_configure 'query governor cost limit' or SET QUERY_GOVERNOR_COST_LIMIT to limit the resources a query may consume. When the estimated cost of a query exceeds the limit, the server cancels the query before it runs. Use SET LOCK_TIMEOUT to set how long to wait for locks.
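For instance (the 30-second and 5000-millisecond limits are arbitrary examples):

EXEC sp_configure 'query governor cost limit', 30   -- server-wide cap (needs 'show advanced options' = 1)
RECONFIGURE
SET QUERY_GOVERNOR_COST_LIMIT 30                    -- same cap, current connection only
SET LOCK_TIMEOUT 5000                               -- wait at most 5000 ms for a lock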


21. Use SELECT TOP n (or TOP n PERCENT) to limit the number of rows returned, or SET ROWCOUNT to limit the rows an operation affects.
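For example, using the ChineseResume table that appears later in this article:

SELECT TOP 100 * FROM ChineseResume            -- first 100 rows only
SELECT TOP 10 PERCENT * FROM ChineseResume     -- first 10 percent of rows
SET ROWCOUNT 100                               -- older style: stop after 100 rows
SELECT * FROM ChineseResume
SET ROWCOUNT 0                                 -- restore the default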

22. Before SQL 2000, generally avoid the following: "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'", because they do not use the index and force full table scans. Also, do not apply a function to a column name in the WHERE clause, such as CONVERT or SUBSTRING; if you must use a function, create a computed column and index it instead. Sometimes you can work around it: change WHERE SUBSTRING(FirstName,1,1) = 'm' to WHERE FirstName LIKE 'm%' (an index scan); always keep functions away from the column name. And do not build too many or too large indexes. NOT IN scans the table multiple times; use EXISTS, NOT EXISTS, IN, or LEFT OUTER JOIN instead, especially the left join, and EXISTS is faster than IN, with NOT the slowest. If a column's value can be NULL, its index formerly did not work, although the 2000 optimizer can now handle that. The same goes for IS NULL, "NOT", "NOT EXISTS", and "NOT IN": they can now be optimized, whereas "<>" and the like still cannot and will not use the index.
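A sketch of the NOT IN rewrites described above, using the emp and dept tables from the Oracle examples later in this article (finding employees whose department no longer exists):

-- Slow: NOT IN rescans the subquery
SELECT e.emp_no FROM emp e
WHERE e.dept_no NOT IN (SELECT d.dept_no FROM dept d)

-- Better: NOT EXISTS
SELECT e.emp_no FROM emp e
WHERE NOT EXISTS (SELECT 1 FROM dept d WHERE d.dept_no = e.dept_no)

-- Or a LEFT OUTER JOIN probing for the missing match
SELECT e.emp_no FROM emp e
LEFT OUTER JOIN dept d ON d.dept_no = e.dept_no
WHERE d.dept_no IS NULL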

23. Use Query Analyzer to view the query plan of an SQL statement and evaluate whether it is well optimized. In general, a small portion of the code consumes most of the resources; those slow spots are the focus of our optimization.

24. If a query that uses IN or OR turns out not to use an index, specify the index with an explicit hint: SELECT * FROM PersonMember (INDEX = IX_Title) WHERE ProcessID IN ('male', 'female')

25. Precompute results a query will need, store them in the table, and SELECT them at query time. This was the most important technique before SQL 7.0, for example when calculating hospital inpatient charges.

26. MIN() and MAX() can use an appropriate index.

27. A database principle is that the closer the code is to the data, the better. So prefer Default first, then Rules, Triggers, Constraint (constraints such as primary keys, foreign keys, CHECK, UNIQUE, the maximum length of a data type, and so on), then Procedure. This minimizes maintenance work, raises program quality, and executes fast.

28. If you insert large binary values into an image column, use a stored procedure; never use an inline INSERT (I don't know whether Java behaves the same). The application would first convert the binary value to a string (doubling its size), and the server would then convert the characters back to binary. A stored procedure has none of those steps. Method: CREATE PROCEDURE p_insert @image image AS INSERT INTO table(fimage) VALUES (@image). Call this stored procedure from the front end and pass in the binary parameter; processing speed improves significantly.

29. BETWEEN is sometimes faster than IN, and BETWEEN can quickly locate a range using an index. The query optimizer may show a difference between SELECT * FROM ChineseResume WHERE title IN ('male', 'female') and SELECT * FROM ChineseResume WHERE title BETWEEN 'male' AND 'female', even though the result is the same. Because IN may be evaluated more than once, it is sometimes slower.

30. Creating indexes on global or local temporary tables can sometimes improve speed when necessary, but not always, because indexes also consume a lot of resources; they are created just as on real tables.

31. Do not begin transactions that do no useful work, for example when generating reports; it wastes resources. Start a transaction only when it is genuinely needed.

32. An OR clause can be decomposed into multiple queries connected by UNION. Their speed depends only on whether indexes are used; if the query needs a composite index, UNION ALL executes more efficiently. Multiple OR clauses often do not use the index, so rewrite them as a UNION and then try to match the index. The key question is whether indexes are used.
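For example, a hedged sketch (assuming separate indexes on title and workyear in ChineseResume):

-- OR across two differently indexed columns often forces a scan:
SELECT * FROM ChineseResume WHERE title = 'male' OR workyear = 5

-- Decomposed so each branch can seek its own index (use UNION ALL
-- instead when the branches are known to be disjoint, to skip the
-- duplicate-removing sort):
SELECT * FROM ChineseResume WHERE title = 'male'
UNION
SELECT * FROM ChineseResume WHERE workyear = 5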

33. Minimize the use of views; they are inefficient. Operating on a view is slower than operating directly on the tables; replace the view with a stored procedure. In particular, do not nest views: nesting makes it harder to find the source data. Consider the nature of a view: it is optimized SQL stored on the server with a query plan already generated. When retrieving data from a single table, do not use a view that joins multiple tables; retrieve directly from the table or use a view containing only that table, otherwise you add unnecessary overhead and interfere with the query. To speed up view queries, MSSQL added indexed views.

34. Do not use DISTINCT and ORDER BY unless necessary; those operations can run on the client instead. They add extra overhead, for the same reason as UNION versus UNION ALL. For example:

SELECT TOP 100 ad.companyname, comid, position, ad.referenceid, worklocation,
CONVERT(varchar(10), ad.postdate, 120) AS postdate1, workyear, degreedescription
FROM jobcn_query.dbo.COMPANYAD_query ad
WHERE referenceid IN ('JCNAD00329667', 'JCNAD132168', 'JCNAD00337748', 'JCNAD00338345',
'JCNAD00333138', 'JCNAD00303570', 'JCNAD00303569', 'JCNAD00303568', 'JCNAD00306698',
'JCNAD00231935', 'JCNAD00231933', 'JCNAD00254567', 'JCNAD00254585', 'JCNAD00254608',
'JCNAD00254607', 'JCNAD00258524', 'JCNAD00332133', 'JCNAD00268618', 'JCNAD00279196',
'JCNAD00268613')
ORDER BY postdate DESC

35. In the list of values after IN, put the most frequent values at the front and the least frequent at the end, to reduce the number of comparisons.

36. SELECT INTO locks system tables (sysobjects, sysindexes, and so on), blocking access from other connections. When creating a temporary table, use an explicit CREATE TABLE statement rather than SELECT INTO. For example: DROP TABLE t_lxh BEGIN TRAN SELECT * INTO t_lxh FROM ChineseResume WHERE name = 'XYZ' --COMMIT; while this runs, SELECT * FROM sysobjects in another connection shows that the SELECT INTO locks up the system tables (CREATE TABLE also locks system tables, whether the table is temporary or permanent). So never do this inside a transaction! If it is a temporary table you use often, use a real table or a table variable instead.

37. You can generally eliminate extra rows before GROUP BY runs, so try not to use GROUP BY or HAVING for that elimination. The optimal order of execution is: the WHERE clause selects all the qualifying rows, GROUP BY groups them for aggregation, and the HAVING clause removes unwanted groups. That way GROUP BY and HAVING cost little and the query is quick; grouping a large row set and filtering it with HAVING is very expensive. If the purpose of the GROUP BY is not aggregation but only grouping, DISTINCT is faster.

38. One UPDATE of many records is faster than many single-record UPDATEs; batch processing is good.

39. Use temporary tables sparingly; try to replace them with result sets and table variables. A table variable is better than a temporary table.

40. Under SQL 2000, computed fields can be indexed when the following conditions are met (see the sketch after this list):

A. The expression of the computed field is deterministic.
B. Text, ntext, and image data types cannot be used.
C. The required SET options must be in effect: ANSI_NULLS = ON, ANSI_PADDING = ON, and so on.
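The sketch mentioned above, assuming a Members table with a FirstName column; it moves the SUBSTRING of point 22 into an indexable computed column:

SET ANSI_NULLS ON
SET ANSI_PADDING ON
-- Add a deterministic computed column and index it (other SET options
-- such as QUOTED_IDENTIFIER must also be ON when creating the index):
ALTER TABLE Members ADD FirstInitial AS SUBSTRING(FirstName, 1, 1)
CREATE INDEX IX_Members_FirstInitial ON Members (FirstInitial)
-- WHERE FirstInitial = 'm' can now use the index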

41. Try to put data processing on the server to reduce network overhead, for example by using stored procedures. A stored procedure is SQL that has been compiled, optimized, organized into an execution plan, and stored in the database; as a body of control-flow language it is naturally fast. Dynamic SQL that executes repeatedly can use temporary stored procedures, which (like temporary tables) are placed in tempdb. Because SQL Server formerly did not support complex math, such work had to be done in other layers, increasing network overhead. SQL 2000 supports UDFs and now supports complex mathematical calculations, but a function's return value should not be too large, as that is costly. User-defined functions consume a lot of resources, like cursors; if large results are returned, use stored procedures instead.

42. Do not call the same function repeatedly within one statement; it wastes resources. Put the result in a variable and reference that, which is faster.

43. SELECT COUNT(*) is relatively inefficient; try to adapt how you write it, since EXISTS is fast. Also note the difference: SELECT COUNT(field) FROM table counts only rows where the field is NOT NULL, so it can return a different value from SELECT COUNT(*) FROM table.
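A small example of the difference (assuming title is nullable in ChineseResume):

SELECT COUNT(*) FROM ChineseResume        -- counts every row
SELECT COUNT(title) FROM ChineseResume    -- skips rows where title IS NULL

-- For a pure existence test, EXISTS stops at the first hit:
IF EXISTS (SELECT 1 FROM ChineseResume WHERE name = 'XYZ') PRINT 'found'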

44. When the server has enough memory, configure the number of threads = maximum number of connections + 5 for best efficiency; otherwise use a thread count below the maximum number of connections and let SQL Server's thread pooling handle it. If you still set threads = maximum connections + 5 without enough memory, server performance is severely damaged.

45. Access your tables in a fixed order. If you lock table A first and then table B, lock them in that order in all stored procedures. If a stored procedure (inadvertently) locks table B first and then table A, this can cause a deadlock. Deadlocks are hard to find if the locking order is not carefully designed in advance.

46. Monitor the load on the corresponding hardware through SQL Server Performance Monitor. Memory: the Page Faults/sec counter, if occasionally high, indicates threads were competing for memory; if it stays high continuously, memory may be the bottleneck. Process:

1. % DPC Time is the percentage of processor time spent receiving and servicing deferred procedure calls (DPCs) during the sample interval. (DPCs run at a lower priority than standard intervals.) Because DPCs execute in privileged mode, the DPC time percentage is part of the privileged time percentage. These times are counted separately and are not part of the total interval count. This total shows average busy time as a percentage of instance time.
2. % Processor Time: if this counter continuously exceeds 95%, the bottleneck is the CPU; consider adding a processor or switching to a faster one.
3. % Privileged Time is the percentage of non-idle processor time spent in privileged mode. (Privileged mode is a processing mode designed for operating-system components and hardware drivers; it allows direct access to hardware and all memory. The other mode is user mode, a restricted mode designed for applications, environment subsystems, and integral subsystems. The operating system switches application threads into privileged mode to access operating-system services.) % Privileged Time includes time spent servicing interrupts and DPCs. A high privileged-time ratio may be caused by a large number of interrupts from a failing device. This counter shows average busy time as a fraction of sample time.
4. % User Time reflects CPU-consuming database operations such as sorting and executing aggregate functions. If this value is high, consider adding indexes, using simple table joins, and horizontally partitioning large tables to reduce it. PhysicalDisk: the Current Disk Queue Length counter should not exceed 1.5 to 2 times the number of disks; to improve performance, add disks. SQLServer: the Buffer Cache Hit Ratio counter, the higher the better; if it stays below 90%, consider adding memory. Note that this value is cumulative since SQL Server started, so after a period of running it no longer reflects the system's current state.
47. Analyze SELECT emp_name FROM employee WHERE salary > 3000. If salary is of type float, the optimizer converts the literal, CONVERT(float, 3000), because 3000 is an integer. We should write 3000.0 in the program rather than leaving the conversion to the DBMS at run time. The same applies to conversions between character and integer data.
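That is:

-- salary is float: write the literal as float so no run-time CONVERT is needed
SELECT emp_name FROM employee WHERE salary > 3000.0   -- not: salary > 3000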





======================================================================================================


We should not only write SQL, but write SQL that performs well. What follows is material the author has studied, excerpted, and summarized, shared here with you!


(1) Select the most efficient table name order (valid only in the Rule-based optimizer):


The ORACLE parser processes the table names in the FROM clause in right-to-left order, so the table written last in the FROM clause (the driving table) is processed first. When the FROM clause contains multiple tables, choose the table with the fewest records as the base table. If more than 3 tables are joined, choose the crosstab (intersection) table, that is, the table referenced by the other tables, as the base table.


(2) The order of joins in the WHERE clause:


Oracle parses the WHERE clause in bottom-up order. By this principle, joins between tables must be written before other WHERE conditions, and the conditions that filter out the greatest number of records must be written at the end of the WHERE clause.


(3) Avoid the use of ' * ' in the SELECT clause:


During parsing, Oracle converts '*' into all the column names by querying the data dictionary, which means extra time is spent.


(4) Reduce the number of accesses to the database:


For each access, Oracle does a lot of work internally: parsing the SQL statement, estimating index utilization, binding variables, reading data blocks, and so on.


(5) Reset the ARRAYSIZE parameter in SQL*Plus, SQL*Forms, and Pro*C to increase the amount of data retrieved per database access; the recommended value is 200.


(6) Use the Decode function to reduce processing time:


Use the DECODE function to avoid repeatedly scanning the same records or repeatedly joining the same table.
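For example, a classic sketch (emp and dept_no as in the examples below): one pass over emp produces counts for two departments that would otherwise need two scans:

SELECT COUNT(DECODE(dept_no, 20, 'X', NULL)) dept20_count,
       COUNT(DECODE(dept_no, 30, 'X', NULL)) dept30_count
FROM emp;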


(7) Integrate simple, unrelated database accesses:


If you have a few simple database query statements, you can integrate them into a single query (even if there is no relationship between them).


(8) Delete duplicate records:


The most efficient way to delete duplicate records (it uses ROWID). Example:


DELETE FROM EMP E WHERE E.ROWID > (SELECT MIN(X.ROWID)
FROM EMP X WHERE X.EMP_NO = E.EMP_NO);


(9) Replace Delete with truncate:


When you delete records from a table, a rollback segment is ordinarily used to hold the information needed for recovery. If you have not committed the transaction, Oracle restores the data to its state before the delete (precisely, to the state before the delete statement executed). With TRUNCATE, the rollback segment holds no recoverable information: once the command runs, the data cannot be recovered. Therefore few resources are invoked and execution is fast. (Translator's note: TRUNCATE applies only when deleting the whole table; TRUNCATE is DDL, not DML.)


(10) Use commit as much as possible:


Whenever possible, use COMMIT in your programs: performance improves and demands decrease thanks to the resources released by COMMIT.


Resources released by commit:


A. The information used to recover data on the rollback segment.


B. Locks obtained by program statements


C. Space in the Redo log buffer


D. Oracle's internal overhead of managing the three resources above


(11) Replace the HAVING clause with the WHERE clause:


Avoid the HAVING clause where possible: HAVING filters the result set only after all records have been retrieved, and that processing needs sorting, totals, and so on. If you can limit the record count with the WHERE clause instead, that overhead is reduced. Among ON, WHERE, and HAVING, the three clauses that can carry conditions, ON executes first, WHERE second, and HAVING last. ON filters out non-matching rows before the join produces its intermediate data, so it reduces the data to be processed the most and ought to be fastest; WHERE should be faster than HAVING because it filters before aggregation; and ON applies only to joins of two tables, so on a single table only WHERE and HAVING compete. For a single-table aggregate query, if the filter condition involves no computed field, WHERE and HAVING give the same result, but WHERE can use Rushmore optimization and HAVING cannot, so the latter is slower. If the condition does involve a computed field, that field's value is indeterminate before the calculation; by the workflow above, WHERE acts before the calculation and HAVING only after it, so in that case the results differ. On a multi-table join query, ON takes effect earlier than WHERE: the system first joins the tables into an intermediate table according to the join conditions, then filters with WHERE, then aggregates, and finally filters with HAVING. For a filter condition to play its correct role, first understand where that condition should take effect, and then decide where to put it.


(12) Minimize queries against the same table:


In SQL statements that contain subqueries, pay special attention to reducing the number of queries against the table. Example:


SELECT TAB_NAME FROM TABLES WHERE (TAB_NAME, DB_VER) =
(SELECT TAB_NAME, DB_VER FROM TAB_COLUMNS WHERE VERSION = 604)


(13) Improve SQL efficiency through internal functions:


Complex SQL often sacrifices execution efficiency. Mastering the use of the functions above to solve problems is very meaningful in practical work.


(14) Using table aliases (alias):


When you join multiple tables in an SQL statement, use table aliases and prefix each column with its alias. This reduces parsing time and reduces syntax errors caused by column ambiguity.


(15) Replace IN with EXISTS, and NOT IN with NOT EXISTS:


In many queries based on the underlying table, it is often necessary to join another table in order to satisfy one condition. In this case, using EXISTS (or not EXISTS) usually increases the efficiency of the query. In a subquery, the NOT IN clause performs an internal sort and merge. In either case, not in is the least efficient (because it performs a full table traversal of the table in the subquery). In order to avoid using not in, we can rewrite it as an outer join (Outer joins) or not EXISTS.


Example:


(efficient) SELECT * FROM EMP (base table) WHERE EMPNO > 0 AND EXISTS (SELECT 'X' FROM DEPT WHERE DEPT.DEPTNO = EMP.DEPTNO AND LOC = 'MELB')


(inefficient) SELECT * FROM EMP (base table) WHERE EMPNO > 0 AND DEPTNO IN (SELECT DEPTNO FROM DEPT WHERE LOC = 'MELB')


(16) Identify the ' inefficient execution ' of the SQL statement:


While there are many graphical tools for SQL optimization at the moment, writing your own SQL tools to solve problems is always the best approach:


SELECT EXECUTIONS, DISK_READS, BUFFER_GETS,
ROUND((BUFFER_GETS - DISK_READS) / BUFFER_GETS, 2) HIT_RATIO,
ROUND(DISK_READS / EXECUTIONS, 2) READS_PER_RUN,
SQL_TEXT
FROM V$SQLAREA
WHERE EXECUTIONS > 0
AND BUFFER_GETS > 0
AND (BUFFER_GETS - DISK_READS) / BUFFER_GETS < 0.8
ORDER BY 4 DESC;

(17) Use Index to improve efficiency:


An index is a conceptual part of a table used to improve data-retrieval efficiency; Oracle uses a complex B-tree structure. In general, querying data through an index is faster than a full table scan. The Oracle optimizer uses indexes when finding the best path for queries and UPDATE statements, and indexes also increase efficiency when joining multiple tables. Another advantage of an index is that it provides uniqueness validation for the primary key. Apart from the LONG and LONG RAW data types, you can index almost any column. Using indexes on large tables is particularly effective, though you will find that indexes can also improve efficiency when scanning small tables. While indexes improve query efficiency, mind their cost: indexes need storage space and regular maintenance, and the index itself is modified whenever a record is inserted into or deleted from the table or an indexed column is modified. This means every INSERT, DELETE, and UPDATE pays roughly 4 or 5 extra disk I/Os. Because indexes require extra storage and processing, unnecessary indexes slow down query response times. Periodically rebuilding indexes is necessary:


ALTER INDEX <INDEXNAME> REBUILD <TABLESPACENAME>


(18) Replace DISTINCT with EXISTS:


When submitting a query that joins one-to-many table information, such as a department table and an employee table, avoid using DISTINCT in the SELECT clause. You can generally replace it with EXISTS, which makes the query faster, because the RDBMS core module returns the result as soon as the subquery's condition is satisfied. Example:


(inefficient):


SELECT DISTINCT DEPT_NO, DEPT_NAME FROM DEPT D, EMP E
WHERE D.DEPT_NO = E.DEPT_NO


(efficient):


SELECT DEPT_NO, DEPT_NAME FROM DEPT D WHERE EXISTS (SELECT 'X'
FROM EMP E WHERE E.DEPT_NO = D.DEPT_NO);


(19) Write SQL statements in uppercase, because Oracle always parses the SQL statement first, converting lowercase letters to uppercase, before executing it.


(20) In Java code, use the "+" connector to concatenate strings as little as possible!


(21) Avoid using NOT on indexed columns:


Avoid using NOT on indexed columns; NOT has the same effect as using a function on an indexed column. When Oracle "encounters" NOT, it stops using the index and performs a full table scan instead.


(22) Avoid using calculations on indexed columns.


In the WHERE clause, if the indexed column is part of a function, the optimizer uses a full table scan rather than the index.


Example:


Low efficiency:


SELECT ... FROM DEPT WHERE SAL * 12 > 25000;


Efficient:


SELECT ... FROM DEPT WHERE SAL > 25000/12;


(23) Replace > with >=:


Efficient:


SELECT * FROM EMP WHERE DEPTNO >= 4


Low efficiency:


SELECT * FROM EMP WHERE DEPTNO > 3


The difference is that the former lets the DBMS jump directly to the first record with DEPTNO equal to 4, while the latter first positions on the DEPTNO = 3 record and scans forward to the first record with DEPTNO greater than 3.


(24) Replace OR with UNION (applies to indexed columns):


In general, replacing OR in the WHERE clause with UNION gives a better result. Using OR on indexed columns causes a full table scan. Note that this rule is valid only when all the columns involved are indexed; if a column is not indexed, query efficiency may actually drop because you avoided the OR. In the following example, indexes exist on both LOC_ID and REGION.


Efficient:


SELECT LOC_ID, LOC_DESC, REGION
FROM LOCATION
WHERE LOC_ID = 10
UNION
SELECT LOC_ID, LOC_DESC, REGION
FROM LOCATION
WHERE REGION = 'MELBOURNE'


Low efficiency:


SELECT LOC_ID, LOC_DESC, REGION
FROM LOCATION
WHERE LOC_ID = 10 OR REGION = 'MELBOURNE'


If you insist on using OR, put the indexed column expected to return the fewest records first.


(25) Replace OR with IN:


This is a simple, easy-to-remember rule, but the actual execution effect still needs to be tested; under Oracle8i, the two execution paths seem to be the same.


Low efficiency:


SELECT ... FROM LOCATION WHERE LOC_ID = 10 OR LOC_ID = 20 OR LOC_ID = 30


Efficient


SELECT ... FROM LOCATION WHERE LOC_ID IN (10, 20, 30);


(26) Avoid using is null and is not NULL on indexed columns


Avoid using any nullable column in an index, or Oracle will be unable to use the index. For single-column indexes, a record with a null value in that column does not exist in the index. For composite indexes, if every column is null, the record likewise does not exist in the index; if at least one column is not null, the record is in the index. For example, if a uniqueness index is built on columns A and B of a table, and a record with A,B values (123, null) exists in the table, Oracle will not accept inserting another record with the same A,B values (123, null). However, if all indexed columns are null, Oracle considers the whole key empty, and empty is never equal to empty, so you can insert 1000 records with the same all-null key; of course they are all "empty"! Because null values do not exist in indexed columns, a NULL comparison on an indexed column in the WHERE clause makes Oracle deactivate that index.


Inefficient: (Index invalidated)


SELECT ... FROM DEPARTMENT WHERE DEPT_CODE IS NOT NULL;


Efficient: (Index valid)


SELECT ... FROM DEPARTMENT WHERE DEPT_CODE >= 0;


(27) Always use the first column of the index:


If an index is built on multiple columns, the optimizer chooses the index only when its first column (the leading column) is referenced by the WHERE clause. This is a simple but important rule: when only the second column of the index is referenced, the optimizer uses a full table scan and ignores the index.


(28) Replace UNION with UNION ALL (when possible):


When an SQL statement requires the union of two query result sets, the two result sets are merged and then sorted before the final result is output, to remove duplicates. If you use UNION ALL instead of UNION, that sort is unnecessary and efficiency improves accordingly. Note that UNION ALL outputs records that appear in both result sets repeatedly, so you still need to analyze the feasibility of using UNION ALL from the business requirements. UNION sorts the result set using memory sized by SORT_AREA_SIZE, and tuning that memory is also important. The following SQL illustrates the sorting cost:


Low efficiency:

SELECT ACCT_NUM, BALANCE_AMT
FROM DEBIT_TRANSACTIONS
WHERE TRAN_DATE = '31-DEC-95'
UNION
SELECT ACCT_NUM, BALANCE_AMT
FROM DEBIT_TRANSACTIONS
WHERE TRAN_DATE = '31-DEC-95'

Efficient:

SELECT ACCT_NUM, BALANCE_AMT
FROM DEBIT_TRANSACTIONS
WHERE TRAN_DATE = '31-DEC-95'
UNION ALL
SELECT ACCT_NUM, BALANCE_AMT
FROM DEBIT_TRANSACTIONS
WHERE TRAN_DATE = '31-DEC-95'


(29) Use WHERE instead of ORDER BY:


The ORDER BY clause uses an index only under strict conditions:


All columns in the ORDER BY must be contained in the same index and keep the index's column order.


All columns in the ORDER BY must be defined as non-null.


The WHERE clause and the ORDER BY clause cannot use two different indexes.


For example:


The table DEPT contains the following columns:

DEPT_CODE    PK    NOT NULL
DEPT_DESC          NOT NULL
DEPT_TYPE          NULL


Inefficient: (index not used)


SELECT DEPT_CODE FROM DEPT ORDER BY DEPT_TYPE


Efficient: (using index)


SELECT DEPT_CODE FROM DEPT WHERE DEPT_TYPE > 0


(30) Avoid changing the type of indexed columns:


Oracle automatically makes simple type conversions to columns when comparing data of different data types.


Suppose EMPNO is an indexed column of numeric type.


SELECT ... FROM EMP WHERE EMPNO = '123'


In fact, after Oracle type conversion, the statement translates to:


SELECT ... FROM EMP WHERE EMPNO = TO_NUMBER('123')


Fortunately, the type conversion does not occur on the indexed column, so the purpose of the index is not defeated.


Now suppose EMP_TYPE is an indexed column of character type.


SELECT ... FROM EMP WHERE EMP_TYPE = 123


This statement is converted by Oracle to:


SELECT ... FROM EMP WHERE TO_NUMBER(EMP_TYPE) = 123


This index will not be used because the type conversion happens internally on the column! To prevent Oracle from implicitly converting your SQL, it is best to make the type conversion explicit. Note that when characters and numbers are compared, Oracle preferentially converts the numeric type to the character type.


(31) Be careful with the WHERE clause:


The WHERE clause in some SELECT statements does not use an index. Here are some examples.


In the following cases: (1) '!=' will not use the index; remember, an index can only tell you what exists in the table, not what does not exist in it. (2) '||' is the character concatenation function; like other functions, it deactivates the index. (3) '+' is a mathematical operator; like other mathematical functions, it deactivates the index. (4) The same indexed columns cannot be compared with each other, or a full table scan results.


(32) Know when a full table scan beats the index:

A. If a query retrieves more than 30% of the records in a table, using an index gives no significant efficiency gain.

B. In certain situations, using an index may be slower than a full table scan, but the difference is within the same order of magnitude, whereas in the usual case an index is several times or even thousands of times faster than a full table scan!


(33) Avoid using resource-consuming operations:


SQL statements with DISTINCT, UNION, MINUS, INTERSECT, or ORDER BY start resource-intensive sort operations in the SQL engine. DISTINCT needs one sort, and the others need at least two. Usually, SQL statements with UNION, MINUS, or INTERSECT can be rewritten in other ways. If your database's SORT_AREA_SIZE is well tuned, using UNION, MINUS, and INTERSECT can still be considered; after all, their readability is very strong.


(34) Optimize GROUP BY:


Increase the efficiency of the GROUP BY statement by filtering out unwanted records before the GROUP BY. The following two queries return the same result, but the second is obviously much faster.


Low efficiency:

SELECT JOB, AVG(SAL)
FROM EMP
GROUP BY JOB
HAVING JOB = 'PRESIDENT' OR JOB = 'MANAGER'


Efficient:


SELECT JOB, AVG(SAL)
FROM EMP
WHERE JOB = 'PRESIDENT' OR JOB = 'MANAGER'
GROUP BY JOB


Optimizing SQL queries: How to write High-performance SQL statements



1. First, understand what an execution plan is



The execution plan is a query scheme the database makes based on the SQL statement and statistics on the related tables, generated by the query optimizer's automatic analysis. For example, if an SQL statement looks up 1 record from a table of 100,000 records, the query optimizer chooses an "index seek"; if the table has been archived and only 5,000 records remain, the query optimizer changes the plan to a "full table scan".



Clearly, the execution plan is not fixed; it is "personalized". Producing a correct execution plan depends on two important points:



(1) Does the SQL statement clearly tell the query optimizer what it wants to do?
(2) Are the database statistics obtained by the query optimizer up to date and correct?



2. Unify the wording of SQL statements



Programmers consider the following two SQL statements the same; the database query optimizer considers them different.






select * from dual
SELECT * FROM dual





The letter case differs, so the query analyzer treats them as two different SQL statements and must parse twice, generating 2 execution plans. So as a programmer, make sure the same query statement is written identically everywhere: not even one extra space!



3. Do not write overly complex SQL statements



I often see an SQL statement captured from a database that runs as long as 2 printed A4 pages. Generally, such complex statements are problematic. I took that 2-page SQL statement to the original author, and he said that after so much time he could not understand it himself at the moment. It is conceivable that if even the original author gets confused reading the SQL statement, the database will be confused too.



Generally, taking the result of a SELECT statement as a subset and then querying from that subset is common; but in my experience, beyond 3 levels of nesting the query optimizer easily gives a wrong execution plan, because it gets dizzy. Such artificial intelligence is, after all, inferior to human judgment; if people get dizzy, I can guarantee the database will too.



In addition, execution plans can be reused, and the simpler the SQL statement, the more likely its plan is reused. As soon as a single character of a complex SQL statement changes, it must be re-parsed, and then that pile of garbage fills memory. It is easy to imagine how low the database's efficiency becomes.



4. Use "temporary tables" to stage intermediate results



An important way to simplify SQL statements is to use temporary tables to stage intermediate results. But the benefits go well beyond that: with intermediate results staged in temporary tables, subsequent queries run in tempdb, which avoids scanning the main table multiple times in the program, and it also greatly reduces the "shared lock" blocking against "update lock" during execution, reducing contention and improving concurrency.



5. OLTP system SQL statements must use bind variables






SELECT * from OrderHeader where changetime > '2010-10-20 00:00:01'
SELECT * from OrderHeader where changetime > '2010-09-22 00:00:01'





The query optimizer considers the above two statements different SQL statements that must be parsed twice. With a bind variable:



SELECT * from OrderHeader where changetime > @chgtime — the @chgtime variable can take any value, so a large number of similar queries reuse the same execution plan, greatly reducing the burden of parsing SQL statements on the database. Parse once, reuse many times: that is the principle of improving database efficiency.
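In SQL Server this is typically done with sp_executesql; a minimal sketch:

DECLARE @chgtime datetime
SET @chgtime = '2010-10-20 00:00:01'
EXEC sp_executesql
    N'SELECT * FROM OrderHeader WHERE changetime > @chgtime',  -- one plan,
    N'@chgtime datetime',                                      -- reused for
    @chgtime                                                   -- any value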



6. Bind variable peeking



Everything has two sides; bind variables suit most OLTP processing, but there are exceptions, for example when a column in the WHERE condition is a "skewed column".



"Italic field" means that most of the values in the column are the same, such as a population survey, in which the list of "nationalities", more than 90%, is Han. So if an SQL statement is to inquire about the number of the 30-Year-old Han population, the "nation" list is bound to be placed in a where condition. This time if the binding variable @nation will have a big problem.



Imagine the first value passed for @nation is "Han": the whole execution plan will inevitably choose a table scan. Then the second value passed in is "Buyi"; the proportion of "Buyi" is perhaps only one in ten thousand and should use an index seek. But because the plan parsed for the first "Han" is reused, the second query also uses a table scan. This problem is known as "bind variable peeking"; it is recommended not to use bind variables for skewed columns.



7. Use the BEGIN TRAN only if necessary



In SQL Server, a single SQL statement is a transaction by default and is committed by default after the statement executes. This is actually a minimized form of BEGIN TRAN, as if a BEGIN TRAN were implied at the start of each statement and a COMMIT at the end.



In some cases we need to declare BEGIN TRAN explicitly, for example when an insert/update/delete operation must modify several tables at once and requires that either all of the tables are modified or none are. BEGIN TRAN does this: several SQL statements execute together and commit together. The benefit is data consistency, but nothing is perfect: the cost of BEGIN TRAN is that all resources locked by the SQL statements are held until the commit and not released earlier.



As you can see, if too many SQL statements are nested inside BEGIN TRAN, database performance suffers. Before a large transaction commits, it inevitably blocks other statements and causes a lot of blocking.



The principle for using BEGIN TRAN is: while guaranteeing data consistency, the fewer SQL statements BEGIN TRAN holds, the better! In some cases triggers can synchronize data, and BEGIN TRAN is not necessarily needed.
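A hedged sketch of the shape to aim for (the OrderHeader columns and the OrderLog table are hypothetical): gather inputs first, keep only the modifications inside the transaction:

DECLARE @id int
SET @id = 100                      -- inputs prepared before the transaction
BEGIN TRAN
UPDATE OrderHeader SET status = 2 WHERE orderid = @id
INSERT INTO OrderLog (orderid, action) VALUES (@id, 'ship')
COMMIT TRAN                        -- locks held only for these two statements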



8. Some SQL query statements should have NOLOCK added



Adding NOLOCK to SQL statements is an important way to improve SQL Server concurrency. This is not needed in Oracle, whose structure is more reasonable: it has the undo tablespace to hold "data shadows", so if data is modified but not yet committed, what you read is the copy from before the modification, placed in the undo tablespace. That way, reads and writes in Oracle do not block each other, which is widely praised. SQL Server reads and writes do block each other; to improve concurrency, some queries can add NOLOCK so reads can proceed during writes, but the drawback is that you may read uncommitted dirty data. There are 3 principles for using NOLOCK:



(1) If the query result is used for "insert, delete, or update", do not add NOLOCK!
(2) If the queried table frequently has page splits, use NOLOCK with caution!
(3) Where a temporary table can save a "before" snapshot of the data, playing a role similar to Oracle's undo tablespace, prefer the temporary table to improve concurrency; do not use NOLOCK.
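For example, a read-only report query that tolerates dirty data (and whose result feeds no insert/update/delete):

SELECT orderid, status
FROM OrderHeader WITH (NOLOCK)     -- readers no longer wait on writers
WHERE changetime > '2010-10-20'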



9. If the clustered index is not built on the table's sequential field, the table is prone to page splits



Take an order table with order number OrderID and customer number ContactID: which field should the clustered index go on? The order number is added sequentially, so if the clustered index is on OrderID, new rows are appended at the end and page splits rarely occur. However, since most queries are by customer number, it would seem to make sense to put the clustered index on ContactID; but for an order table, ContactID is not a sequential field.



For example, "John" of "ContactID" is 001, then "John" order information must be placed on the first page of this table, if today "John" a new order, then the order information can not be placed on the last page of the table, but the first page! What if the first page is full? I'm sorry, all the data on the table will be moved back to the record space.



SQL Server's indexing differs from Oracle's: a SQL Server clustered index actually sorts the table in the order of the clustered index fields, the equivalent of Oracle's index-organized table. A SQL Server clustered index is an organizational form of the table itself, so it is very efficient. Precisely because of this, an inserted record's position is not arbitrary but determined by the ordering, and if the target data page has no space, a page split occurs. So clearly, if the clustered index is not built on the table's sequential field, the table is prone to page splits.
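A minimal sketch of the resulting layout (the Orders table is hypothetical): cluster on the sequential key and cover the query key with a nonclustered index:

CREATE CLUSTERED INDEX IX_Orders_OrderID ON Orders (OrderID)        -- appends, few splits
CREATE NONCLUSTERED INDEX IX_Orders_ContactID ON Orders (ContactID) -- serves customer queries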



There was once a case where a friend found that insert performance dropped sharply after rebuilding a table's index. Presumably this was the scenario: the table's clustered index was not built on a sequential field, and the table is frequently archived, so the table's data exists in a sparse state. For example, John has placed 20 orders but only 5 in the last 3 months; with an archiving policy that keeps 3 months of data, John's 15 past orders are archived, leaving 15 vacancies that can be reused when an INSERT occurs. In that case no page split occurs, because vacancies are available; but query performance is low, because queries must scan those empty spaces that hold no data.



After rebuilding the clustered index, the situation changes: the rebuild rearranges the table's data, the original vacancies disappear, the page fill rate becomes very high, and inserting data constantly causes page splits, so performance drops sharply.



For tables whose clustered index is not built on a sequential field, should you use a lower page fill rate? Should you avoid rebuilding the clustered index? These are questions worth considering!



10. Querying with NOLOCK on a table that often page-splits can easily produce skipped or repeated reads



With NOLOCK, queries can run while inserts, updates, and deletes occur. But because those modifications run concurrently, in some situations, once a data page becomes full, a page split is inevitable; and if a NOLOCK query is underway at the same time, records already read on page 100 may be split onto page 101, so when the NOLOCK query reaches page 101 it may read the same data again, producing a "repeated read". Similarly, if data on page 100 moves back to page 99 before being read, the NOLOCK query may miss that record, producing a "skipped read".



The friend mentioned above got errors on some operations after adding NOLOCK; most likely a NOLOCK query produced a repeated read, and inserting 2 identical records into another table naturally caused a primary key conflict.



11. Take care when using LIKE for fuzzy queries



Sometimes you need to do fuzzy queries, like:
SELECT * from contacts where username like '%yue%'



Because the keyword %yue% has a "%" in front of yue, the query is bound to do a full table scan. Unless necessary, do not put a % before the keyword.



12. The effect of implicit conversion of data types on query efficiency



On a SQL Server 2000 database, if our program does not submit the value of a field with its strong type, SQL Server 2000 converts the data type automatically, which can make the incoming parameter inconsistent with the primary key field's type; in that case SQL Server 2000 may use a full table scan. This problem was not found on SQL 2005, but it is still worth noting.
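For example, assuming OrderHeader.orderid is varchar:

SELECT * FROM OrderHeader WHERE orderid = '12345'   -- literal matches the column type
-- SELECT * FROM OrderHeader WHERE orderid = 12345  -- int literal: the column is
-- converted, the index may be ignored, and a full table scan can result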



13. SQL Server's three table-join methods



(1) Merge Join
(2) Nested Loop Join
(3) Hash Join



SQL Server 2000 has only one join method: the nested loop join. If result set A is small, it is used as the outer table by default; every record in A scans B, and the actual number of rows scanned is roughly rows(A) x rows(B). So if both result sets are large, the join performs badly.



SQL Server 2005 adds the merge join. If table A's and table B's join fields are exactly the ones their clustered indexes are on, the rows are already in order, and the two sides only need to be zipped together; the cost of this join is rows(A) + rows(B): addition where the nested loop was multiplication. The merge join performs far better than the nested loop join.



If the joined fields have no index, SQL 2000's efficiency is quite low, whereas SQL 2005 provides the hash join, which in effect builds a temporary index on the result sets of tables A and B. So SQL 2005's efficiency is markedly higher than SQL 2000's; I think this is an important reason.



In summary, note the following when joining tables:



(1) For join fields, try to choose the fields the clustered index is on
(2) Carefully consider the WHERE conditions to minimize the result sets of tables A and B
(3) If many join fields are missing indexes and you are still using SQL Server 2000, upgrade quickly

