Ways to improve query speed on tables with millions of rows:
1. Try to avoid the != or <> operator in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan.
2. To optimize a query, avoid full table scans as far as possible, and first consider creating indexes on the columns used in WHERE and ORDER BY clauses.
3. Avoid testing fields for NULL in the WHERE clause, which can cause the engine to abandon the index and perform a full table scan, for example:
Select ID from t where num is null
You can give num a default value of 0, make sure the num column never contains NULL, and then query:
Select ID from t where num=0
4. Try to avoid combining conditions with OR in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan, for example:
Select ID from t where num=10 or num=20
You can rewrite it as:
Select ID from t where num=10
UNION ALL
Select ID from t where num=20
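The UNION ALL rewrite returns the same rows as the OR only when no row can satisfy both branches. The sketch below checks this with Python's standard sqlite3 module. An in-memory SQLite database stands in for SQL Server, and the table t and its data are made up for illustration. SQLite's optimizer handles OR differently, so the sketch compares results, not speed.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; t(id, num) holds illustrative data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_num ON t (num)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(n % 50,) for n in range(1000)])


def ids(sql):
    """Run a query and return its ids sorted, keeping any duplicates."""
    return sorted(row[0] for row in conn.execute(sql))


# Disjoint branches: the OR and the UNION ALL rewrite return the same rows.
with_or = ids("SELECT id FROM t WHERE num = 10 OR num = 20")
with_union_all = ids(
    "SELECT id FROM t WHERE num = 10 UNION ALL SELECT id FROM t WHERE num = 20"
)

# Overlapping branches: rows with num = 10 satisfy both, so UNION ALL repeats them.
overlap_or = ids("SELECT id FROM t WHERE num = 10 OR num <= 10")
overlap_union_all = ids(
    "SELECT id FROM t WHERE num = 10 UNION ALL SELECT id FROM t WHERE num <= 10"
)
overlap_union = ids(
    "SELECT id FROM t WHERE num = 10 UNION SELECT id FROM t WHERE num <= 10"
)

print(len(with_or), len(with_union_all))                            # 40 40
print(len(overlap_or), len(overlap_union_all), len(overlap_union))  # 220 240 220
```

When the branches can overlap, plain UNION is the safe rewrite because it removes duplicates, but that costs an extra de-duplication step.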
5. The following query will also result in a full table scan, because the pattern begins with a percent sign:
Select ID from t where name like '%abc%'
To make such searches efficient, consider full-text indexing.
6. IN and NOT IN should also be used with caution, because they can lead to full table scans, for example:
Select ID from t where num in (1,2,3)
For consecutive values, you can use BETWEEN instead of IN:
Select ID from t where num between 1 and 3
8. Try to avoid expressions on fields in the WHERE clause, which cause the engine to abandon the index and perform a full table scan. For example:
Select ID from t where num/2=100
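should be rewritten so that the column stands alone:
Select ID from t where num=100*2
If num is an integer column, num/2 uses integer division and also matches num=201, so the exact equivalent is:
Select ID from t where num between 200 and 201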
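Here is a minimal sketch of the difference. It assumes an in-memory SQLite database (through Python's sqlite3 module) as a stand-in for SQL Server, and a made-up table t with an index idx_num on num. SQLite's EXPLAIN QUERY PLAN reports SEARCH when it seeks into an index and SCAN when it reads every entry. The sketch also shows why num=100*2 is exact only for non-integer columns.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; t and idx_num are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_num ON t (num)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(n,) for n in range(1000)])  # ids 1..1000


def plan(sql):
    """Join the detail column of every EXPLAIN QUERY PLAN row into one string."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))


expr_sql = "SELECT id FROM t WHERE num / 2 = 100"             # expression on the column
naive_sql = "SELECT id FROM t WHERE num = 100 * 2"            # column alone, but misses 201
exact_sql = "SELECT id FROM t WHERE num BETWEEN 200 AND 201"  # column alone, same rows

print(plan(expr_sql))   # contains SCAN: every entry is read
print(plan(exact_sql))  # contains SEARCH: the index is used to seek
print(ids(expr_sql), ids(naive_sql), ids(exact_sql))  # [201, 202] [201] [201, 202]
```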
9. Try to avoid functions on fields in the WHERE clause, which cause the engine to abandon the index and perform a full table scan. For example:
Select ID from t where substring(name,1,3)='abc' -- IDs whose name starts with abc
Select ID from t where datediff(day,createdate,'2005-11-30')=0 -- IDs created on 2005-11-30
should be written as:
Select ID from t where name like 'abc%'
Select ID from t where createdate>='2005-11-30' and createdate<'2005-12-1'
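You can check the date rewrite the same way. This sketch uses SQLite as a stand-in for SQL Server and stores createdate as ISO-8601 text, because SQLite has no datetime column type. SQLite's date() function takes the place of DATEDIFF, and the table and rows are invented.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; dates are ISO-8601 text (SQLite has no datetime type).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, createdate TEXT)")
conn.execute("CREATE INDEX idx_createdate ON t (createdate)")
conn.executemany(
    "INSERT INTO t (createdate) VALUES (?)",
    [("2005-11-29 23:59:59",), ("2005-11-30 00:00:00",),
     ("2005-11-30 13:45:00",), ("2005-12-01 00:00:00",)],
)  # ids 1..4


def plan(sql):
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))


# Function on the column: SQLite's date() plays the role of DATEDIFF(day, ...) = 0.
func_sql = "SELECT id FROM t WHERE date(createdate) = '2005-11-30'"
# Half-open range on the bare column; the upper bound is zero-padded ('2005-12-01')
# because text is compared character by character.
range_sql = ("SELECT id FROM t WHERE createdate >= '2005-11-30' "
             "AND createdate < '2005-12-01'")

print(ids(func_sql), ids(range_sql))  # [2, 3] [2, 3]
print(plan(func_sql))                 # contains SCAN
print(plan(range_sql))                # contains SEARCH
```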
10. Do not apply functions, arithmetic, or other expressions to the left side of "=" in the WHERE clause, or the system may not be able to use the index correctly.
11. When a condition uses a column of a composite index, the first column of the index must appear in the condition to guarantee that the system uses the index; otherwise the index will not be used. Keep the column order consistent with the index order as far as possible.
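This small sketch shows the leading-column rule. It uses SQLite (through Python's sqlite3 module) as a stand-in for SQL Server and a hypothetical composite index idx_a_b on (a, b). A filter on a alone, or on a and b together, can seek into the index. A filter on b alone cannot, so the engine falls back to a scan.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; idx_a_b is a hypothetical composite index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")
conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [(n % 100, n % 7) for n in range(1000)])


def plan(sql):
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


leading_only = plan("SELECT id FROM t WHERE a = 5")            # SEARCH on (a=?)
both_columns = plan("SELECT id FROM t WHERE a = 5 AND b = 3")  # SEARCH on (a=? AND b=?)
second_only = plan("SELECT id FROM t WHERE b = 3")             # SCAN: b is not the leading column

for p in (leading_only, both_columns, second_only):
    print(p)
```

SQLite can sometimes use a skip-scan on a non-leading column after ANALYZE has collected statistics. The sketch does not run ANALYZE, so it shows the basic rule only.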
12. Do not write meaningless queries. For example, to generate an empty table structure:
Select col1,col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources; change it to:
CREATE TABLE #t (...)
13. In many cases it is a good choice to replace IN with EXISTS:
Select num from a where num in (select num from b)
Replace it with:
Select num from a where exists (select 1 from b where num=a.num)
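IN and EXISTS return the same rows for this kind of query, even when the subquery produces duplicates. A plain join does not. The sketch below uses an in-memory SQLite database as a stand-in for SQL Server, with made-up tables a and b.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; tables a and b are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (num INTEGER)")
conn.execute("CREATE TABLE b (num INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(n,) for n in range(10)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (4,), (4,), (7,)])  # 4 appears twice


def nums(sql):
    return sorted(row[0] for row in conn.execute(sql))


in_rows = nums("SELECT num FROM a WHERE num IN (SELECT num FROM b)")
exists_rows = nums("SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)")
join_rows = nums("SELECT a.num FROM a JOIN b ON b.num = a.num")  # not equivalent: repeats 4

print(in_rows, exists_rows, join_rows)  # [2, 4, 7] [2, 4, 7] [2, 4, 4, 7]
```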
14. Not every index helps a query. SQL Server optimizes queries based on the data in the table, and when an indexed column contains a large number of duplicate values, the query may not use the index. For example, if a table has a sex column that is roughly half male and half female, an index on sex does little for query efficiency.
15. More indexes are not always better. Indexes improve the efficiency of the corresponding SELECT statements but reduce the efficiency of INSERT and UPDATE, because the indexes may have to be rebuilt or updated during an INSERT or UPDATE. Decide carefully, case by case, which indexes to build. A table should generally have no more than 6 indexes; if it has more, consider whether the rarely used ones are really necessary.
16. Avoid updating clustered index key columns as far as possible, because the order of the clustered index key is the physical storage order of the table's rows. When a key value changes, the order of the rows in the whole table must be adjusted, which can consume considerable resources. If your application needs to update clustered index key columns frequently, reconsider whether that index should be clustered.
17. Use numeric fields where possible. A field that holds only numeric information should not be designed as a character type, because that reduces query and join performance and increases storage overhead. When processing queries and joins, the engine compares strings one character at a time, whereas a numeric value needs only a single comparison.
18. Use varchar/nvarchar instead of char/nchar where possible. First, variable-length fields take less space, which saves storage; second, for queries, searching within a relatively small field is clearly more efficient.
19. Do not use SELECT * FROM t anywhere; replace "*" with an explicit column list, and do not return any columns that are not used.
20. Prefer table variables to temporary tables. If a table variable holds a large amount of data, be aware that its indexing is very limited (only the primary key index).
21. Avoid frequently creating and dropping temporary tables, to reduce the consumption of system table resources.
22. Temporary tables are not forbidden. Used appropriately, they can make some routines more efficient, for example when you need to reference a data set from a large table or a frequently used table repeatedly. For one-off operations, however, it is better to use an export table.
23. When creating a temporary table and inserting a large amount of data at once, you can use SELECT INTO instead of CREATE TABLE to avoid generating a large volume of log records and so improve speed. If the amount of data is small, use CREATE TABLE followed by INSERT, to ease the load on the system tables.
24. If you use temporary tables, explicitly delete all of them at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids locking the system tables for a long time.
25. Avoid cursors as far as possible, because they are inefficient; if a cursor operates on more than 10,000 rows, consider rewriting it.
26. Before using a cursor-based or temporary-table-based method, first look for a set-based solution to the problem; the set-based approach is generally more efficient.
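To illustrate the set-based alternative, this sketch compares a cursor-style loop with a single UPDATE that uses CASE. The loop fetches every row and then issues one UPDATE per row. The sketch uses an in-memory SQLite database, accessed through Python's sqlite3 module, as a stand-in for SQL Server. The orders table and the tier rule are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the orders table and tier rule are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, tier TEXT)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(float(n),) for n in range(1, 2001)])


def snapshot():
    return conn.execute("SELECT id, tier FROM orders ORDER BY id").fetchall()


# Row-by-row, as a cursor loop would do it: one UPDATE statement per row.
for row_id, amount in conn.execute("SELECT id, amount FROM orders").fetchall():
    tier = "large" if amount >= 1000 else "small"
    conn.execute("UPDATE orders SET tier = ? WHERE id = ?", (tier, row_id))
row_by_row = snapshot()

# Set-based: reset, then one UPDATE with CASE lets the engine process the whole set.
conn.execute("UPDATE orders SET tier = NULL")
conn.execute("UPDATE orders SET tier = CASE WHEN amount >= 1000 THEN 'large' ELSE 'small' END")
set_based = snapshot()

print(row_by_row == set_based)                             # True
print(sum(1 for _, tier in set_based if tier == "large"))  # 1001
```

Both approaches produce the same rows. The set-based form sends one statement instead of 2,000, which is where its advantage usually comes from.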
27. As with temporary tables, cursors are not forbidden. Using FAST_FORWARD cursors on small data sets is often better than other row-by-row processing methods, especially when you must reference several tables to obtain the required data. Routines that include "totals" in the result set are usually faster than those using cursors. If development time permits, try both a cursor-based and a set-based approach and see which works better.
28. Put SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. SQL Server then does not need to send a DONE_IN_PROC message to the client after each statement in the stored procedure or trigger.
29. Avoid returning large amounts of data to the client; if the data volume is too large, consider whether the corresponding requirement is reasonable.
30. Avoid large transactions, to improve the system's concurrency.
The reasons for slow queries:
1. No index, or the index is not used (this is the most common cause of slow queries and a flaw in program design).
2. Low I/O throughput, which creates a bottleneck.
3. No computed columns were created, so queries cannot be optimized.
4. Insufficient memory.
5. Slow network speed.
6. Too much data is queried (use multiple queries or other methods to reduce the amount of data).
7. Locks or deadlocks (this is also one of the most common causes of slow queries and a flaw in program design).
8. Competition for read and write resources (use sp_lock and sp_who to view active users).
9. Returning unnecessary rows and columns.
10. Poorly written, unoptimized query statements.
You can optimize queries with the following methods:
1. Put data, logs, and indexes on different I/O devices to increase read speed. tempdb could previously be placed on RAID 0; SQL Server 2000 no longer supports this. The larger the amount of data, the more important it is to improve I/O.
2. Partition tables vertically and horizontally to reduce table size (check sizes with sp_spaceused).
3. Upgrade hardware.
4. Build and optimize indexes according to the query conditions, optimize access paths, and limit the size of the result set. Use an appropriate fill factor (preferably the default value of 0). Keep indexes as small as possible: build them on columns with few bytes (see the guidelines on creating indexes), and do not build a single-column index on a field with a limited number of values, such as gender.
5. Improve network speed.
6. Expand the server's memory; Windows 2000 and SQL Server 2000 can support 4-8 GB. Configure virtual memory according to the services running concurrently on the computer. When running Microsoft SQL Server 2000, consider setting the virtual memory size to 1.5 times the physical memory installed in the computer. If you have also installed full-text search and plan to run the Microsoft Search service for full-text indexing and querying, consider configuring the virtual memory size to at least 3 times the physical memory, and set the SQL Server max server memory configuration option to 1.5 times the physical memory (half of the virtual memory setting).
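The sizing rules in item 6 are simple ratios. The hypothetical helper below only applies them. It does arithmetic on the figures quoted above and does not replace measuring the actual workload.

```python
def memory_plan(physical_gb, full_text=False):
    """Hypothetical helper: apply the SQL Server 2000 era ratios quoted above (sizes in GB)."""
    if full_text:
        virtual_gb = 3 * physical_gb  # at least 3x physical memory
        return {
            "virtual_memory_gb": virtual_gb,
            "max_server_memory_gb": virtual_gb / 2,  # 1.5x physical = half the virtual setting
        }
    return {"virtual_memory_gb": 1.5 * physical_gb}  # 1.5x physical memory


print(memory_plan(4))                  # {'virtual_memory_gb': 6.0}
print(memory_plan(4, full_text=True))  # {'virtual_memory_gb': 12, 'max_server_memory_gb': 6.0}
```

For a server with 4 GB of physical memory, this gives 6 GB of virtual memory. With full-text search installed, it gives 12 GB of virtual memory and a 6 GB max server memory setting.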
7. Add server CPUs, but understand that parallel processing needs more resources, such as memory, than serial processing. SQL Server automatically evaluates whether to use a parallel or a serial plan. A single task is broken into multiple subtasks that can run on different processors; for example, the sorts, joins, scans, and GROUP BY clauses of a query can execute simultaneously. SQL Server determines the optimal degree of parallelism based on system load, and complex queries that consume a lot of CPU benefit most from parallel processing. However, update operations (UPDATE, INSERT, and DELETE) cannot be processed in parallel.
8. For LIKE queries, an ordinary index is often not enough, while a full-text index consumes space. LIKE 'a%' uses the index and LIKE '%a' does not. With LIKE '%a%', the query time is proportional to the total length of the field values, so use varchar rather than char for such columns. For fields with long values, build a full-text index.
9. Separate the database server from the application server, and separate OLTP from OLAP.
10. Distributed partitioned views can be used to build a federation of database servers. A federation is a group of servers that are managed separately but cooperate to share the system's processing load. Forming a federation of database servers by partitioning data lets a group of servers scale out to support the processing needs of large, multi-tier Web sites. For more information, see "Designing Federated Database Servers" (and "partitioned views" in the SQL Server help).
a. Before implementing a partitioned view, you must first partition the table horizontally.
b. After creating the member tables, define a distributed partitioned view with the same name on each member server, so that queries referencing the view name can run on any member server. The system behaves as if each member server had a copy of the original table, even though each server holds only one member table and one distributed partitioned view. The location of the data is transparent to the application.
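This single-server sketch shows the horizontal partitioning behind a partitioned view, using SQLite as a stand-in. Two hypothetical member tables each hold one year, and a CHECK constraint on the partitioning column enforces the split. A UNION ALL view presents them under one name. In a distributed partitioned view, the member tables live on different servers. SQLite can only show the local shape of the idea, and it does not perform SQL Server's partition elimination.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; table and view names are hypothetical.
conn = sqlite3.connect(":memory:")
# Member tables: each holds one horizontal range, enforced by a CHECK constraint.
conn.execute("CREATE TABLE orders_2004 (id INTEGER PRIMARY KEY, "
             "yr INTEGER CHECK (yr = 2004), amount REAL)")
conn.execute("CREATE TABLE orders_2005 (id INTEGER PRIMARY KEY, "
             "yr INTEGER CHECK (yr = 2005), amount REAL)")
# The view unions the members, so callers query one name without knowing where rows live.
conn.execute("""
    CREATE VIEW orders AS
    SELECT id, yr, amount FROM orders_2004
    UNION ALL
    SELECT id, yr, amount FROM orders_2005
""")
conn.executemany("INSERT INTO orders_2004 VALUES (?, 2004, ?)", [(1, 10.0), (2, 20.0)])
conn.executemany("INSERT INTO orders_2005 VALUES (?, 2005, ?)", [(3, 30.0), (4, 40.0), (5, 50.0)])

total_2005 = conn.execute("SELECT SUM(amount) FROM orders WHERE yr = 2005").fetchone()[0]
all_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# The CHECK constraint rejects a row that belongs in another partition.
try:
    conn.execute("INSERT INTO orders_2004 VALUES (6, 2005, 60.0)")
    misplaced_rejected = False
except sqlite3.IntegrityError:
    misplaced_rejected = True

print(total_2005, all_rows, misplaced_rejected)  # 120.0 5 True
```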
11. Rebuild indexes with DBCC DBREINDEX and DBCC INDEXDEFRAG, and shrink data and log files with DBCC SHRINKDATABASE and DBCC SHRINKFILE. Set the log to shrink automatically. For large databases, do not set the database to grow automatically, as it degrades server performance.
How T-SQL is written matters a great deal; the common points are listed below. First, the DBMS processes a query plan as follows:
1. Lexical and syntax checking of the query statement.
2. The statement is submitted to the DBMS query optimizer.
3. The optimizer performs algebraic optimization and access path optimization.
4. The precompiler module generates the query plan.
5. The plan is then submitted to the system for execution at the appropriate time.
6. Finally, the execution result is returned to the user.
Next, consider SQL Server's data storage structure: a page is 8 KB (8,060 bytes of it available for row data), 8 pages form an extent, and data is stored in B-trees.
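The page and extent sizes allow a rough size estimate for a large table. This sketch is a simplification. It treats all 8,060 bytes as usable for fixed-size rows and ignores row headers, the row-offset array, and the fill factor, so real tables take more pages. The helper name is hypothetical.

```python
import math

ROW_BYTES_PER_PAGE = 8060  # usable bytes for row data on an 8 KB page, as quoted above
PAGES_PER_EXTENT = 8
PAGE_KB = 8


def estimate_storage(row_count, row_bytes):
    """Hypothetical helper: rough (rows_per_page, pages, extents) for fixed-size rows.

    Simplification: ignores row headers, the row-offset array, and the fill factor.
    """
    if not 0 < row_bytes <= ROW_BYTES_PER_PAGE:
        raise ValueError("row size must be between 1 and 8060 bytes")
    rows_per_page = ROW_BYTES_PER_PAGE // row_bytes
    pages = math.ceil(row_count / rows_per_page)
    extents = math.ceil(pages / PAGES_PER_EXTENT)
    return rows_per_page, pages, extents


rows_per_page, pages, extents = estimate_storage(1_000_000, 100)
print(rows_per_page, pages, extents, pages * PAGE_KB / 1024)  # 80 12500 1563 97.65625
```

For example, one million 100-byte rows fit 80 to a page. That gives 12,500 pages (about 98 MB) in 1,563 extents.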
12. The difference between COMMIT and ROLLBACK: ROLLBACK rolls back the entire transaction; COMMIT commits the current transaction. There is no need to write a transaction inside dynamic SQL. If you need one, write it outside, for example BEGIN TRAN EXEC(@s) COMMIT TRAN, or write the dynamic SQL as a function or stored procedure.
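This is a minimal illustration of COMMIT versus ROLLBACK. It also shows the transaction wrapping a dynamically built statement rather than living inside it. The sketch uses an in-memory SQLite database opened through Python's sqlite3 module with isolation_level=None, so that BEGIN, COMMIT, and ROLLBACK are issued explicitly. The table is made up.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; isolation_level=None makes transactions explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")


def row_count():
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


conn.execute("BEGIN")
conn.execute("INSERT INTO t (v) VALUES ('kept')")
conn.execute("COMMIT")    # the insert becomes permanent
after_commit = row_count()

conn.execute("BEGIN")
conn.execute("INSERT INTO t (v) VALUES ('discarded')")
conn.execute("INSERT INTO t (v) VALUES ('discarded too')")
conn.execute("ROLLBACK")  # every change since BEGIN is undone
after_rollback = row_count()

# The transaction wraps the dynamically built statement instead of living inside it.
# (A fixed string here; never format untrusted input into SQL.)
dynamic_sql = "INSERT INTO t (v) VALUES ('%s')" % "built at run time"
conn.execute("BEGIN")
conn.execute(dynamic_sql)
conn.execute("COMMIT")
after_dynamic = row_count()

print(after_commit, after_rollback, after_dynamic)  # 1 1 2
```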
13. In SELECT queries, use a WHERE clause to limit the number of rows returned and avoid table scans. Returning unnecessary data wastes the server's I/O resources and increases the network load, reducing performance. If the table is large, a table scan locks the table and prevents other connections from accessing it, with serious consequences.
14. SQL comments have no effect on execution.
15. Avoid cursors as far as possible; they consume a lot of resources. If you need row-by-row execution, try non-cursor techniques such as looping on the client, temporary tables, table variables, subqueries, and CASE statements. Cursors can be classified by the fetch options they support. Forward-only cursors must fetch rows in order from the first row to the last, and FETCH NEXT is the only fetch operation allowed (this is the default). Scrollable cursors can fetch any row anywhere in the cursor at random. Cursor capabilities became very powerful in SQL Server 2000; their purpose is to support loops.
There are four concurrency options:
READ_ONLY: positioned updates through the cursor are not allowed, and no locks are held on the rows that make up the result set.
OPTIMISTIC WITH VALUES: Optimistic concurrency control is a standard part of transaction control theory. It is used when there is only a small chance that a second user will update a row between the time the cursor is opened and the time the row is updated. When a cursor is opened with this option, no locks are held on its rows, which helps maximize throughput. If the user tries to modify a row, the row's current values are compared with the values obtained when the row was last fetched. If any value has changed, the server knows that someone else has updated the row and returns an error. If the values are the same, the server performs the modification.
OPTIMISTIC WITH ROW VERSIONING: This optimistic concurrency option is based on row versioning. With row versioning, the table must have some version identifier that the server can use to determine whether the row has changed since it was read into the cursor.
In SQL Server, this capability is provided by the timestamp data type, a binary number that indicates the relative order in which changes were made in the database. Each database has a global current timestamp value, @@DBTS. Every time a row with a timestamp column is changed in any way, SQL Server stores the current @@DBTS value in the timestamp column and then increments @@DBTS. If a table has a timestamp column, timestamps are recorded at the row level. The server can then compare a row's current timestamp value with the value stored at the last fetch to determine whether the row has been updated; it does not have to compare all column values, only the timestamp column. If an application requests optimistic concurrency based on row versioning on a table that has no timestamp column, the cursor defaults to value-based optimistic concurrency control.
SCROLL_LOCKS: This option implements pessimistic concurrency control. In pessimistic concurrency control, the application attempts to lock database rows as they are read into the cursor result set. With server cursors, an update lock is placed on a row when it is read into the cursor. If the cursor is opened within a transaction, the transaction's update lock is held until the transaction is committed or rolled back, and the cursor lock is dropped when the next row is fetched. If the cursor is opened outside a transaction, the lock is dropped when the next row is fetched. Therefore, whenever a user needs full pessimistic concurrency control, the cursor should be opened within a transaction. An update lock prevents any other task from acquiring an update or exclusive lock, and thus prevents other tasks from updating the row.
However, an update lock does not block shared locks, so it does not prevent other tasks from reading the row unless the second task also requests a read with an update lock. Scroll locks: depending on the lock hints specified in the cursor's SELECT statement, these cursor concurrency options can generate scroll locks. A scroll lock is acquired on each row as it is fetched and held until the next fetch or until the cursor is closed, whichever comes first. At the next fetch, the server acquires scroll locks for the rows in the new fetch and releases the scroll locks on the rows of the previous fetch. Scroll locks are independent of transaction locks and can persist past a commit or rollback. If the option to close cursors on commit is off, a COMMIT statement does not close any open cursors, and scroll locks are held past the commit to maintain isolation of the fetched data. The type of scroll lock acquired depends on the cursor concurrency option and the lock hints in the cursor's SELECT statement.
Scroll locks acquired, by lock hint and cursor concurrency option:
Lock hint  | Read-only  | Optimistic with values | Optimistic with row versioning | Locking
No hint    | Not locked | Not locked             | Not locked                     | Update
NOLOCK*    | Not locked | Not locked             | Not locked                     | Not locked
HOLDLOCK   | Shared     | Shared                 | Shared                         | Update
UPDLOCK    | Error      | Update                 | Update                         | Update
TABLOCKX   | Error      | Not locked             | Not locked                     | Update
All others | Not locked | Not locked             | Not locked                     | Update
* Specifying the NOLOCK hint makes the table for which it is specified read-only within the cursor.
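Applications can also implement optimistic concurrency with row versioning, as described above, in their own code. This sketch uses an in-memory SQLite database as a stand-in. A hypothetical integer version column plays the role of the timestamp column. An update succeeds only if the version read earlier is still current. A stale writer gets zero affected rows instead of overwriting the other user's change.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; 'version' is a hypothetical stand-in for a timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, "
    "version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")


def read(acc_id):
    """Fetch the balance and the version the row had at read time."""
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (acc_id,)
    ).fetchone()


def optimistic_update(acc_id, new_balance, seen_version):
    """Write only if the row still carries the version we read; bump the version on success."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acc_id, seen_version),
    )
    return cur.rowcount == 1


# Two users read the same row before either of them writes.
balance_a, version_a = read(1)
balance_b, version_b = read(1)

first = optimistic_update(1, balance_a + 50, version_a)   # succeeds; version becomes 1
second = optimistic_update(1, balance_b - 30, version_b)  # fails; version 0 is stale

print(first, second, read(1))  # True False (150, 1)
```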
16. Use SQL Profiler to trace queries, measure how long each one takes, and find the problem SQL; use the Index Tuning Wizard to optimize indexes.
17. Note the difference between UNION and UNION ALL: UNION ALL is faster, because UNION must also remove duplicate rows.
18. Use DISTINCT with care and omit it when it is not necessary; like UNION, it makes queries slower. Duplicate records are often not a problem in a query result.
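This small sketch shows the semantic difference behind items 17 and 18. It uses an in-memory SQLite database as a stand-in for SQL Server, with made-up tables x and y. UNION and DISTINCT remove duplicates, which requires extra sorting or hashing work. UNION ALL returns every row as is.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; x and y are illustrative tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (v INTEGER)")
conn.execute("CREATE TABLE y (v INTEGER)")
conn.executemany("INSERT INTO x VALUES (?)", [(1,), (2,), (2,)])
conn.executemany("INSERT INTO y VALUES (?)", [(2,), (3,)])


def values(sql):
    return sorted(row[0] for row in conn.execute(sql))


union_all = values("SELECT v FROM x UNION ALL SELECT v FROM y")  # every row, no extra work
union = values("SELECT v FROM x UNION SELECT v FROM y")          # duplicates removed
distinct = values("SELECT DISTINCT v FROM x")                    # duplicates removed

print(union_all, union, distinct)  # [1, 2, 2, 2, 3] [1, 2, 3] [1, 2]
```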
19. Do not return rows or columns that are not needed.
20. Use sp_configure 'query governor cost limit' or SET QUERY_GOVERNOR_COST_LIMIT to limit the resources a query may consume. When a query's estimated cost exceeds the limit, the server automatically cancels it before it runs. Use SET LOCK_TIMEOUT to set the lock wait time.
21. Use SELECT TOP 100 or SELECT TOP 10 PERCENT to limit the number of rows returned to the user, or SET ROWCOUNT to limit the number of rows an operation affects.
22. Before SQL Server 2000, it was generally best not to use the following in the WHERE clause: "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'", because they could not use an index and always caused a table scan.
Also, do not apply functions such as CONVERT or SUBSTRING to columns in the WHERE clause. If you must use a function, create a computed column and index it instead. You can also work around the function: change WHERE SUBSTRING(firstname,1,1) = 'm' to WHERE firstname LIKE 'm%' (which can use the index). Always keep functions separate from column names. Also, do not create too many indexes, or indexes that are too large.
NOT IN scans the table multiple times; replace it with EXISTS, NOT EXISTS, IN, or LEFT OUTER JOIN, especially a left join. EXISTS is faster than IN, and NOT operations are the slowest. If a column contains NULL values, its index previously could not be used, but the SQL Server 2000 optimizer can now handle this. Likewise, it can optimize "IS NULL", "NOT", "NOT EXISTS", and "NOT IN", whereas "<>" and similar operators still cannot be optimized and do not use an index.
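Besides speed, NOT IN has a correctness trap. If the subquery returns a NULL, NOT IN returns no rows at all, while NOT EXISTS and the LEFT OUTER JOIN form behave as usually intended. SQL Server behaves the same way with ANSI_NULLS ON. This sketch uses an in-memory SQLite database, which follows standard SQL NULL semantics here, as a stand-in for SQL Server. The tables a and b are made up.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; tables a and b are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (num INTEGER)")
conn.execute("CREATE TABLE b (num INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (None,)])  # b contains a NULL


def nums(sql):
    return sorted(row[0] for row in conn.execute(sql))


# x NOT IN (2, NULL) is never true, so this returns nothing.
not_in = nums("SELECT num FROM a WHERE num NOT IN (SELECT num FROM b)")
not_exists = nums("SELECT num FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.num = a.num)")
left_join = nums(
    "SELECT a.num FROM a LEFT OUTER JOIN b ON b.num = a.num WHERE b.num IS NULL"
)

print(not_in, not_exists, left_join)  # [] [1, 3] [1, 3]
```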
23. Use Query Analyzer to view the execution plan of an SQL statement and evaluate whether it is optimized. Typically 20% of the code consumes 80% of the resources, and those slow spots are where optimization should focus.
24. If a query using IN, OR, and the like does not use an index, specify the index explicitly with a hint: SELECT * FROM personmember (INDEX = ix_title) WHERE ProcessID IN ('Male', 'Female')
25. Precompute the results that queries need and store them in a table, then simply select them at query time. This was the most important technique before SQL Server 7.0; for example, calculating hospital inpatient fees.
26. MIN() and MAX() can take advantage of an appropriate index.
27. A database principle is that the closer the code is to the data, the better. So prefer, in order: defaults, rules, triggers, constraints (foreign keys, primary keys, CHECK, UNIQUE, the maximum length of a data type, and so on are all constraints), and procedures. This not only reduces maintenance work but also produces higher-quality programs that execute faster.
28. To insert a large binary value into an image column, use a stored procedure rather than an inline INSERT statement (it is not clear whether this applies to Java). With an inline statement, the application first converts the binary value to a string (twice its size), and the server then receives the characters and converts them back to binary. A stored procedure avoids these steps: CREATE PROCEDURE p_insert @image image AS INSERT INTO Table (Fimage) VALUES (@image). Call this stored procedure from the front end, passing the binary value as a parameter, and processing speed improves significantly.
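Any client library shows the size overhead of sending binary data as text. This sketch uses SQLite (through Python's sqlite3 module) as a stand-in. SQLite has no stored procedures, so the sketch contrasts an inline hex literal with a bound BLOB parameter. Binding the value as a parameter is the part of the stored-procedure approach that avoids the string conversion. The images table is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the images table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, fimage BLOB)")

data = bytes(range(256)) * 4  # 1,024 bytes of sample binary data

# Inline literal: the value travels inside the SQL text as hex, twice its size.
hex_literal = "X'" + data.hex() + "'"
conn.execute("INSERT INTO images (fimage) VALUES (" + hex_literal + ")")

# Bound parameter: the driver passes the bytes as a BLOB with no text encoding step.
conn.execute("INSERT INTO images (fimage) VALUES (?)", (data,))

stored = [row[0] for row in conn.execute("SELECT fimage FROM images ORDER BY id")]
print(len(data), len(hex_literal), stored == [data, data])  # 1024 2051 True
```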
29. BETWEEN is sometimes faster than IN, because BETWEEN can quickly locate a range using the index; the query optimizer shows the difference. SELECT * FROM chineseresume WHERE title IN ('Male', 'Female') and SELECT * FROM chineseresume WHERE title BETWEEN 'Female' AND 'Male' return the same rows when title holds only these two values (BETWEEN requires the lower bound first). Because IN compares several times, it is sometimes slower.
30. Creating indexes on global or local temporary tables can sometimes improve speed, but not always, because indexes also consume considerable resources. They are created the same way as on ordinary tables.
31. Do not create transactions that serve no purpose, for example when generating reports; this wastes resources. Use transactions only when they are necessary.
32. A WHERE clause with OR can be decomposed into multiple queries joined with UNION. Their speed depends only on whether indexes are used; if the queries need a composite index, UNION ALL executes more efficiently. Multiple ORs may not use an index at all, whereas the UNION form tries to match an index for each branch. The key question is whether an index is used.
33. Minimize the use of views, which are inefficient. Operating on a view is slower than operating directly on the table, and views can be replaced with stored procedures. In particular, do not nest views; nested views make it harder to trace the source data. Consider what a view essentially is: optimized SQL whose query plan has already been generated on the server. When retrieving data from a single table, do not use a view that references multiple tables. Read directly from the table, or from a view that contains only that table; otherwise you add unnecessary overhead and interfere with the query. To speed up queries on views, SQL Server added indexed views.
34. Do not use DISTINCT and ORDER BY unless necessary; these operations can be performed on the client instead. They add extra overhead, just as UNION does compared with UNION ALL. For example:
SELECT TOP ad.companyname, comid, position, ad.referenceid, worklocation, CONVERT(varchar(10), ad.postdate, 120) AS postdate1, workyear, degreedescription FROM jobcn_query.dbo.COMPANYAD_query ad WHERE referenceid IN ('JCNAD00329667', 'JCNAD132168', 'JCNAD00337748', 'JCNAD00338345', 'JCNAD00333138', 'JCNAD00303570', 'JCNAD00303569', 'JCNAD00303568', 'JCNAD00306698', 'JCNAD00231935', 'JCNAD00231933', 'JCNAD00254567', 'JCNAD00254585', 'JCNAD00254608', 'JCNAD00254607', 'JCNAD00258524', 'JCNAD00332133', 'JCNAD00268618', 'JCNAD00279196', 'JCNAD00268613') ORDER BY postdate DESC
35. In a list of literal values (such as an IN list), put the most frequent values first and the least frequent last, to reduce the number of comparisons.
36. SELECT INTO locks system tables (sysobjects, sysindexes, and so on) and blocks access from other connections. When creating a temporary table, use an explicit CREATE TABLE statement instead of SELECT INTO. For example, run drop table t_lxh begin tran select * into t_lxh from chineseresume where name = 'XYZ' --commit, then in another connection run select * from sysobjects. You will see that SELECT INTO locks the system tables, and that CREATE TABLE also locks them (whether it is a temporary table or a system table). So do not use them inside transactions! If you need a temporary table frequently, use a real table or a table variable instead.
37. Usually a WHERE clause can eliminate unneeded rows before GROUP BY and HAVING, so avoid using GROUP BY and HAVING to do the work of filtering rows. The optimal order of execution is: the WHERE clause selects all qualifying rows, GROUP BY groups them for aggregation, and HAVING eliminates unneeded groups. This keeps the cost of GROUP BY low and the query fast. Grouping and HAVING over large numbers of rows consume a lot of resources. If GROUP BY is used only for grouping, without any calculations, DISTINCT is faster.
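This sketch shows filtering before grouping. It uses an in-memory SQLite database as a stand-in for SQL Server, with a made-up sales table. Both filter queries return the same groups, but the WHERE form discards rows before they are aggregated. HAVING remains the right place for conditions on aggregates. Some optimizers can push a simple HAVING condition down automatically, so the measured difference may be small.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the sales table is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5), ("north", 7), ("north", 8)],
)


def rows(sql):
    return conn.execute(sql).fetchall()


# Preferred: WHERE removes unwanted rows before they are grouped and summed.
where_first = rows("SELECT region, SUM(amount) FROM sales WHERE region <> 'west' "
                   "GROUP BY region ORDER BY region")
# Same answer, but as written the condition is applied after grouping.
having_late = rows("SELECT region, SUM(amount) FROM sales GROUP BY region "
                   "HAVING region <> 'west' ORDER BY region")
# HAVING is still needed when the condition is on an aggregate.
big_groups = rows("SELECT region, SUM(amount) FROM sales GROUP BY region "
                  "HAVING SUM(amount) > 20 ORDER BY region")
# Grouping with no aggregate returns the same rows as DISTINCT.
grouped = rows("SELECT region FROM sales GROUP BY region ORDER BY region")
distinct = rows("SELECT DISTINCT region FROM sales ORDER BY region")

print(where_first)  # [('east', 30), ('north', 15)]
print(big_groups)   # [('east', 30)]
```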
38. Updating multiple records in one statement is faster than updating them one at a time; in other words, batch processing is better.
39. Where possible, replace temporary tables with result sets and table variables; table variables are better than temporary tables.
40. In SQL Server 2000, computed columns can be indexed, provided the following conditions are met:
a. The computed column's expression is deterministic.
b. It cannot be used with the text, ntext, or image data types.
c. The following options must be set: ANSI_NULLS = ON, ANSI_PADDING = ON, ....
41. Try to do data processing on the server to reduce network overhead, for example by using stored procedures. Stored procedures are SQL statements that have been compiled, optimized, organized into an execution plan, and stored in the database. They are collections of control-flow statements and are naturally fast. Dynamic SQL that is executed repeatedly can use temporary stored procedures, which are placed in tempdb (like temporary tables). In the past, because SQL Server did not support complex mathematical calculations, this work had to be moved to other tiers, increasing network overhead. SQL Server 2000 supports UDFs and now supports complex mathematical calculations, but a function's return value should not be too large, as that is expensive. User-defined functions that execute like cursors consume a lot of resources; if a large result must be returned, use a stored procedure.
42. Do not call the same function repeatedly in one statement; it wastes resources. Store the result in a variable and reference the variable, which is faster.
43. SELECT COUNT(*) is relatively inefficient; where you only need to know whether rows exist, rewrite it, and EXISTS is faster. Also note the difference: SELECT COUNT(nullable_field) FROM table and SELECT COUNT(not_null_field) FROM table return different values.
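COUNT(*), COUNT on a nullable column, and an EXISTS check answer different questions. This sketch uses an in-memory SQLite database as a stand-in for SQL Server, with a made-up table t that has a nullable note column.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; t is illustrative and note is nullable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO t (note) VALUES (?)", [("a",), (None,), ("b",), (None,)])


def scalar(sql):
    return conn.execute(sql).fetchone()[0]


count_star = scalar("SELECT COUNT(*) FROM t")     # counts rows
count_id = scalar("SELECT COUNT(id) FROM t")      # NOT NULL column: same as COUNT(*)
count_note = scalar("SELECT COUNT(note) FROM t")  # skips NULLs
# Existence check: can stop at the first matching row instead of counting them all.
has_missing_note = scalar("SELECT EXISTS (SELECT 1 FROM t WHERE note IS NULL)") == 1

print(count_star, count_id, count_note, has_missing_note)  # 4 4 2 True
```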
44. When the server has enough memory, set the number of worker threads to the maximum number of connections + 5 for maximum efficiency. Otherwise, set the number of threads below the maximum number of connections so that SQL Server's thread pool takes effect; if the number is still set to the maximum number of connections + 5 in that case, server performance suffers seriously.
45. Access your tables in a consistent order. If you lock table A and then table B, lock them in that order in all stored procedures. If one stored procedure (inadvertently) locks table B first and then table A, a deadlock can result. Deadlocks are hard to find if the lock order is not designed in advance.
46. Use SQL Server Performance Monitor to watch the load on the relevant hardware. Memory: Page Faults/sec counter: if this value is occasionally high, threads were competing for memory at that time; if it stays high, memory may be the bottleneck. Processor:
1. % DPC Time: the percentage of time the processor spent receiving and servicing deferred procedure calls (DPCs) during the sample interval. (DPCs run at a lower priority than standard interrupts.) Because DPCs execute in privileged mode, % DPC Time is part of % Privileged Time. These times are counted separately and are not included in the interrupt totals. This total shows the average busy time as a percentage of the instance time.
2. % Processor Time: if this value stays above 95%, the CPU is the bottleneck. Consider adding a processor or replacing it with a faster one.
3. % Privileged Time: the percentage of non-idle processor time spent in privileged mode. (Privileged mode is a processing mode designed for operating system components and hardware drivers; it allows direct access to the hardware and all memory. The other mode, user mode, is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The operating system switches application threads into privileged mode to access operating system services.) % Privileged Time includes time spent servicing interrupts and DPCs. A high privileged-time ratio may be caused by a large number of interrupts generated by a failing device. This counter displays the average busy time as a percentage of the sample time.
4. % User Time: reflects CPU-intensive database operations such as sorting and executing aggregate functions. If the value is high, consider adding indexes, using simpler table joins, and partitioning large tables horizontally to reduce it.
Physical Disk: Current Disk Queue Length counter: this value should not exceed 1.5 to 2 times the number of disks. To improve performance, add disks.
SQLServer: Cache Hit Ratio counter: the higher the better. If it stays below 80%, consider adding memory. Note that this value accumulates from the time SQL Server starts, so after a period of time it no longer reflects the system's current state.
47. Consider SELECT emp_name FROM employee WHERE salary > 3000. If salary is of type float, the optimizer converts the constant to CONVERT(float, 3000), because 3000 is an integer. Write 3000.0 in your program instead of leaving the DBMS to convert it at run time. The same applies to conversions between character and integer data.