[Reprint] Let SQL run faster


First, unreasonable index design
----Example: the table record has 620000 rows. The following SQL statements are run under different indexes:
----1. A non-clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (25 seconds)
select date, sum(amount) from record group by date (55 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (27 seconds)
----Analysis:
----date has a large number of duplicate values. Under a non-clustered index these rows are stored in physically random order on the data pages, so a range lookup must perform a table scan to find all the rows in the range.
----2. A clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (14 seconds)
select date, sum(amount) from record group by date (28 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (14 seconds)
----Analysis:
----Under a clustered index the data is stored in physical order on the data pages and duplicate values are grouped together. A range lookup can therefore find the starting point of the range and scan only the data pages within it, which avoids a large-scale scan and improves query speed.
----3. A composite index on place, date, amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (26 seconds)
select date, sum(amount) from record group by date (27 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
----Analysis:
----This is an unreasonable composite index because its leading column is place. The first and second SQL statements do not reference place, so they cannot use the index. The third SQL statement does reference place, and every column it references is contained in the composite index, so the index covers the query and it runs very fast.
----4. A composite index on date, place, amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (< 1 second)
select date, sum(amount) from record group by date (11 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
----Analysis:
----This is a reasonable composite index. It takes date as the leading column, so every SQL statement can use the index, and the index covers the first and third SQL statements, so performance is optimal.
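The effect of the leading column can be reproduced on a small scale. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the database server measured above, a made-up 3000-row record table, and a hypothetical index name idx_demo. It does not reproduce the timings; it only asks the planner whether the first query can seek into the index (SEARCH) or has to read all of it (SCAN).

```python
import sqlite3

def plan(conn, sql):
    """Return the detail text of the first EXPLAIN QUERY PLAN step."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

def build(index_columns):
    """Create a small record table with one composite index (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record (date TEXT, place TEXT, amount INTEGER, memo TEXT)")
    rows = [("199912%02d" % (d % 28 + 1), ("BJ", "SH", "GZ")[d % 3], d * 10, "x" * 20)
            for d in range(3000)]
    conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_demo ON record (%s)" % index_columns)
    return conn

QUERY = ("SELECT COUNT(*) FROM record "
         "WHERE date > '19991201' AND date < '19991214' AND amount > 2000")

place_first = build("place, date, amount")   # leading column not used by QUERY
date_first = build("date, place, amount")    # leading column used by QUERY

plan_place_first = plan(place_first, QUERY)
plan_date_first = plan(date_first, QUERY)
count_place_first = place_first.execute(QUERY).fetchone()[0]
count_date_first = date_first.execute(QUERY).fetchone()[0]

print("place first:", plan_place_first)
print("date first: ", plan_date_first)
```

Both variants return the same count; only the access path differs, which is the point of the comparison between cases 3 and 4.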
----5. Summary:
----The index created by default is a non-clustered index, but it is sometimes not optimal. A reasonable index design must be based on analysis and prediction of the various queries. Generally speaking:
----①. For columns that have a large number of duplicate values and often appear in range queries (between, >, <, >=, <=), order by, or group by, consider a clustered index;
----②. When multiple columns are frequently accessed together and each column contains duplicate values, consider a composite index;
----③. A composite index should cover the key queries as far as possible, and its leading column must be the most frequently used column.
Second, insufficient join conditions
----Example: the table card has 7896 rows and a non-clustered index on card_no; the table account has 191122 rows and a non-clustered index on account_no. Here is how two SQL statements perform under different join conditions:
select sum(a.amount) from account a, card b where a.card_no = b.card_no (20 seconds)
----Change the SQL to:
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no (< 1 second)
----Analysis:
----Under the first join condition, the best query plan uses account as the outer table and card as the inner table, using the index on card. Its I/O count can be estimated by the following formula:
----22541 pages of the outer table account + (191122 rows of the outer table account * 3 pages to look up in the inner table card for each row of the outer table) = 595,907 I/Os
----Under the second join condition, the best query plan uses card as the outer table and account as the inner table, using the index on account. Its I/O count can be estimated by the following formula:
----1944 pages of the outer table card + (7896 rows of the outer table card * 4 pages to look up in the inner table account for each row of the outer table) = 33,528 I/Os
----As can be seen, the truly best plan is executed only when the join conditions are sufficient.
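The two estimates above follow the same pattern and can be checked with a few lines of arithmetic. This is a minimal sketch: the function name nested_loop_io and its parameter names are made up for illustration, and the page and row counts are the ones quoted in the analysis.

```python
def nested_loop_io(outer_pages, outer_rows, inner_pages_per_lookup):
    """Estimated I/O = pages read from the outer table
    + one index lookup in the inner table for every outer row."""
    return outer_pages + outer_rows * inner_pages_per_lookup

# First join condition: account is the outer table, card the inner table.
io_first = nested_loop_io(outer_pages=22541, outer_rows=191122, inner_pages_per_lookup=3)

# Second join condition: card is the outer table, account the inner table.
io_second = nested_loop_io(outer_pages=1944, outer_rows=7896, inner_pages_per_lookup=4)

print(io_first, io_second)
```

The smaller outer table dominates the result: even though each lookup costs one more page, the second plan needs only a small fraction of the I/O of the first.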
----Summary:
----1. Before a query is actually executed, the query optimizer lists several possible join plans based on the join conditions and picks the one with the lowest system cost. Join conditions should take full account of which tables have indexes and which tables have many rows. The choice of inner and outer table can be determined by the formula: number of matching rows in the outer table * number of lookups per row in the inner table; the plan with the smallest product is the best.
----2. To view the execution plan, use set showplan on to turn on the showplan option; you can then see the join order and which indexes are used. For more detailed information, run dbcc traceon(3604, 310, 302) under the sa role.
Third, where clauses that cannot be optimized
----1. Example: the columns in the where clauses of the following SQL statements all have appropriate indexes, but execution is very slow:
select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)
select * from record where amount/30 < 1000 (11 seconds)
select * from record where convert(char, date, 112) = '19991201' (10 seconds)
----Analysis:
----The result of any operation on a column in the where clause is computed row by row at SQL run time, so the query has to do a table scan and cannot use the index on that column. If the result can be obtained when the query is compiled, the SQL optimizer can optimize it and use the index, avoiding the table scan. So rewrite the SQL as follows:
select * from record where card_no like '5378%' (< 1 second)
select * from record where amount < 1000*30 (< 1 second)
select * from record where date = '1999/12/01' (< 1 second)
----You will find that the SQL is noticeably faster!
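The amount rewrite can be tried out on a toy table. The sketch below assumes SQLite (via Python's sqlite3 module) instead of the server used in the article, with made-up data and a hypothetical index name idx_amount. It checks that both forms return the same count and asks the planner which access path each form gets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO record (amount) VALUES (?)",
                 [(i * 7,) for i in range(10000)])
conn.execute("CREATE INDEX idx_amount ON record (amount)")

def first_plan_step(sql):
    """Return the detail text of the first EXPLAIN QUERY PLAN step."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Arithmetic on the column: the index cannot be used to seek.
SLOW = "SELECT COUNT(*) FROM record WHERE amount / 30 < 1000"
# Arithmetic moved to the constant side: the column is compared directly.
FAST = "SELECT COUNT(*) FROM record WHERE amount < 1000 * 30"

slow_count = conn.execute(SLOW).fetchone()[0]
fast_count = conn.execute(FAST).fetchone()[0]
slow_plan = first_plan_step(SLOW)
fast_plan = first_plan_step(FAST)

print(slow_count, slow_plan)
print(fast_count, fast_plan)
```

The two counts agree here because the amounts are non-negative integers; with other data types the rewrite should be checked for rounding before it is relied on.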
----2. Example: the table stuff has 200000 rows and a non-clustered index on id_no. Consider the following SQL:
select count(*) from stuff where id_no in ('0', '1') (23 seconds)
----Analysis:
----The in in the where condition is logically equivalent to or, so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' and executes that. We would expect it to look up each or clause separately and add the results, which would use the index on id_no. In fact (according to showplan) it adopts the "or strategy": the rows that satisfy each or clause are first fetched into a work table in the temporary database, a unique index is built to remove duplicate rows, and the result is finally computed from this temporary table. As a result, the actual process does not use the index on id_no, and the completion time is also affected by the performance of the tempdb database.
----Practice shows that the more rows the table has, the worse the work table performs. When stuff has 620000 rows, the execution time reaches 220 seconds! It is better to split the or clause:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
----Get the two results and add them together. Because each statement uses the index, the execution time is only 3 seconds, and with 620000 rows only 4 seconds. Or, better still, write a simple stored procedure:
create proc count_stuff as
declare @a int
declare @b int
declare @c int
declare @d char(10)
begin
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
end
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
----This computes the result directly, and the execution time is as fast as above!
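The split is safe because the two conditions can never match the same row, so adding the two counts gives the same answer as the single in query. A minimal check, assuming SQLite through Python's sqlite3 module and a made-up 1000-row stuff table (the helper count_where is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id_no TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?)",
                 [(str(i % 5),) for i in range(1000)])
conn.execute("CREATE INDEX idx_id_no ON stuff (id_no)")

def count_where(condition):
    """Count the rows of stuff that satisfy the given where condition."""
    return conn.execute("SELECT COUNT(*) FROM stuff WHERE " + condition).fetchone()[0]

# Single statement with in (...)
count_in = count_where("id_no IN ('0', '1')")
# Two indexed equality lookups, added by the caller
count_split = count_where("id_no = '0'") + count_where("id_no = '1'")

print(count_in, count_split)
```

If the split conditions could overlap (for example two range conditions), the counts could not simply be added, so the rewrite applies only to disjoint values like these.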
----Summary:
----As can be seen, an optimized where clause is one that takes advantage of the index; a where clause that causes a table scan or extra overhead is not optimized.
----1. Any operation on a column causes a table scan, including database functions and calculation expressions. When querying, move the operation to the right-hand side of the equals sign whenever possible.
----2. in and or clauses often use work tables, which defeats the index. If they do not produce a large number of duplicate values, consider splitting the clause apart; the split clauses should be covered by an index.
----3. Become adept at using stored procedures; they make SQL more flexible and efficient.
----From the above examples we can see that the essence of SQL optimization is, on the premise of correct results, to use statements the optimizer can recognize, make full use of indexes, reduce the I/O of table scans, and avoid table scans as far as possible. In fact, SQL performance optimization is a complex process, and the above is only one manifestation at the application level. Deeper study also involves resource configuration at the database layer, traffic control at the network layer, and the overall design of the operating system layer.
1. Rational use of indexes
The index is an important data structure in a database, and its basic purpose is to improve query efficiency. Most database products now use the ISAM index structure first proposed by IBM. Indexes should be used appropriately, following these principles:
Create indexes on columns that are frequently joined but not declared as foreign keys; for fields that are joined infrequently, let the optimizer generate the index automatically.
Create indexes on columns that are frequently sorted or grouped (that is, used in group by or order by operations).
Create indexes on columns with many distinct values that are often used in conditional expressions, and do not index columns with few distinct values. For example, the "Sex" column of an employee table has only two distinct values, "male" and "female", so there is no need to index it. An index there does not improve query efficiency and can significantly slow down updates.
If several columns are to be sorted, you can create a composite index (compound index) on those columns.
Use system tools. For example, the Informix database has a tbcheck tool that can check suspicious indexes. On some database servers an index may become invalid or slow to read because of frequent operations. If a query that uses an index slows down for no obvious reason, try tbcheck to check the integrity of the index and repair it if necessary. In addition, after a large amount of data in a table has been updated, dropping and rebuilding the index can increase query speed.
2. Avoid or simplify sorting
You should simplify or avoid repeated sorting of large tables. The optimizer skips the sort step when it can use an index to produce output in the required order automatically. The following factors prevent this:
The index does not include one or more of the columns to be sorted;
The order of the columns in the group by or order by clause differs from the order in the index;
The sorted columns come from different tables.
To avoid unnecessary sorting, build indexes correctly and merge database tables where reasonable (this may sometimes affect the normalization of the tables, but it is worthwhile for the gain in efficiency). If sorting is unavoidable, try to simplify it, for example by narrowing the range of the sorted columns.
3. Eliminate sequential access to large table row data
In nested queries, sequential access to tables can have a fatal effect on query efficiency. For example, with a sequential access strategy, a query nested three levels deep that reads 1000 rows at each level will read 1 billion rows in total. The primary way to avoid this is to index the join columns. For example, take two tables: a student table (student ID, name, age, ...) and a course-selection table (student ID, course number, grade). If the two tables are to be joined, create an index on the join field "student ID".
You can also use a union to avoid sequential access. Even when all the columns being checked are indexed, some forms of the where clause force the optimizer to use sequential access. The following query forces a sequential operation on the orders table:
Select * FROM Orders Where (customer_num=104 and order_num>1001) or order_num=1008
Although indexes exist on customer_num and order_num, the optimizer still uses a sequential access path to scan the entire table for the statement above. Because this statement retrieves separate sets of rows, it should be changed to the following statement:
Select * FROM Orders Where customer_num=104 and order_num>1001
UNION
Select * FROM Orders Where order_num=1008
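Before relying on a rewrite like this, it is worth confirming that both forms return the same rows. The sketch below does that under stated assumptions: SQLite through Python's sqlite3 module, a made-up 30-row orders table, and a hypothetical index name idx_customer. In this sample data order 1008 belongs to customer 104, so it satisfies both branches, and UNION (unlike UNION ALL) removes the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(n, 101 + n % 5) for n in range(990, 1020)])
conn.execute("CREATE INDEX idx_customer ON orders (customer_num)")

OR_FORM = ("SELECT * FROM orders "
           "WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008")
UNION_FORM = ("SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001 "
              "UNION "
              "SELECT * FROM orders WHERE order_num = 1008")

# Sort both results so the comparison does not depend on output order.
rows_or = sorted(conn.execute(OR_FORM).fetchall())
rows_union = sorted(conn.execute(UNION_FORM).fetchall())

print(rows_or)
print(rows_union)
```

Each branch of the union has a simple condition that an index can serve on its own.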
This allows the query to be processed using the index path.
4. Avoid correlated subqueries
If a column appears both in the main query and in the where clause of the subquery, the subquery probably has to be re-executed every time the column value in the main query changes. The deeper the nesting, the lower the efficiency, so avoid subqueries as far as possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
5. Avoid difficult regular expressions
The MATCHES and LIKE keywords support wildcard matching, technically called regular expressions, but this kind of matching is especially time-consuming. For example: select * from customer where zipcode like "98___"
Even if an index exists on the zipcode field, a sequential scan is used in this case. If you change the statement to select * from customer where zipcode > "98000", the index is used when the query executes, which obviously speeds it up considerably. Note that this range is not identical to the pattern: to return exactly the rows matched by "98___" on five-character zip codes, use zipcode >= "98000" and zipcode < "99000".
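One caveat about this rewrite: zipcode > "98000" is not an exact replacement for the pattern "98___", because it excludes 98000 itself and includes every zip code from 99000 upward. The sketch below illustrates the difference with SQLite through Python's sqlite3 module, using made-up five-digit zip codes and a hypothetical helper zipcodes; the bounded range is the form that matches the pattern exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (zipcode TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)",
                 [("%05d" % z,) for z in range(97990, 99010)])

def zipcodes(condition):
    """Return the zip codes that satisfy the given where condition, in order."""
    sql = "SELECT zipcode FROM customer WHERE " + condition + " ORDER BY zipcode"
    return [row[0] for row in conn.execute(sql)]

by_like = zipcodes("zipcode LIKE '98___'")
by_open_range = zipcodes("zipcode > '98000'")
by_closed_range = zipcodes("zipcode >= '98000' AND zipcode < '99000'")

print(len(by_like), len(by_open_range), len(by_closed_range))
```

Only the range with both bounds returns the same rows as the pattern.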
Also, avoid non-leading substrings. For example, the statement select * from customer where zipcode[2,3] > "80" uses a non-leading substring in the where clause, so it does not use the index.
6. Use temporary tables to speed up queries
Sorting a subset of a table and storing it in a temporary table can sometimes speed up queries. It helps to avoid repeated sort operations and also simplifies the optimizer's work in other ways. For example:
Select cust.name,rcvbles.balance,......other Columns
From cust,rcvbles
Where cust.customer_id = rcvbles.customer_id
and rcvbles.balance>0
and cust.postcode> "98000"
ORDER BY cust.name
If this query is to be executed more than once, all customers with unpaid balances can be collected into a temporary table sorted by customer name:
Select cust.name,rcvbles.balance,......other Columns
From cust,rcvbles
Where cust.customer_id = rcvbles.customer_id
and rcvbles.balance>0
ORDER BY cust.name
Into TEMP cust_with_balance
Then query the temporary table as follows:
Select * from cust_with_balance
Where postcode> "98000"
The temporary table has fewer rows than the primary table and its physical order is the required order, which reduces disk I/O, so the query workload can be cut drastically.
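The same pattern can be tried end to end on toy data. This is a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, where the temporary table is created with CREATE TEMP TABLE ... AS SELECT instead of the INTO TEMP clause shown above, and the four customers are made up. It also shows that the temporary table is a snapshot: a later change to rcvbles appears in the direct query but not in cust_with_balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust (customer_id INTEGER PRIMARY KEY, name TEXT, postcode TEXT);
CREATE TABLE rcvbles (customer_id INTEGER, balance INTEGER);
INSERT INTO cust VALUES (1, 'Chen', '98100'), (2, 'Adams', '97000'),
                        (3, 'Baker', '98500'), (4, 'Diaz', '99000');
INSERT INTO rcvbles VALUES (1, 50), (2, 10), (3, 0), (4, 75);
""")

DIRECT = """
SELECT cust.name, rcvbles.balance FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
  AND rcvbles.balance > 0 AND cust.postcode > '98000'
ORDER BY cust.name
"""
FROM_TEMP = ("SELECT name, balance FROM cust_with_balance "
             "WHERE postcode > '98000' ORDER BY name")

# Materialize the customers with an unpaid balance once, sorted by name.
conn.execute("""
CREATE TEMP TABLE cust_with_balance AS
SELECT cust.name, cust.postcode, rcvbles.balance FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id AND rcvbles.balance > 0
ORDER BY cust.name
""")

direct_rows = conn.execute(DIRECT).fetchall()
temp_rows = conn.execute(FROM_TEMP).fetchall()

# Change the primary table afterwards: the temporary table does not follow.
conn.execute("UPDATE rcvbles SET balance = 20 WHERE customer_id = 3")
direct_after = conn.execute(DIRECT).fetchall()
temp_after = conn.execute(FROM_TEMP).fetchall()

print(direct_rows, temp_rows)
print(direct_after, temp_after)
```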
Note: once created, the temporary table does not reflect later changes to the primary table. When data in the primary table is modified frequently, be careful not to lose data.
7. Use sorting to replace non-sequential access
Non-sequential disk access is the slowest operation; it shows up as the disk arm moving back and forth. SQL statements hide this, so it is easy to write queries that access a large number of non-sequential pages when writing an application.
In some cases, using the database's sorting capability in place of non-sequential access can improve the query.
3. Optimize tempdb performance
General recommendations for the physical placement and database option settings of the tempdb database include:
Allow the tempdb database to expand automatically on demand. This ensures that queries which generate much larger intermediate result sets in tempdb than expected are not terminated before they complete.
Set the initial size of the tempdb database files to a reasonable value so that the files do not have to expand automatically as soon as more space is needed. If the tempdb database expands too frequently, performance can suffer.
Set the file growth increment percentage to a reasonable value so that the tempdb database files do not grow by too small an amount. If the file growth is too small compared with the amount of data written to tempdb, the database may have to expand constantly, which hurts performance.
Place the tempdb database on a fast I/O subsystem to ensure good performance. Stripe the tempdb database across multiple disks for better performance. Place the tempdb database on disks other than those used by user databases.
For more information, see Expanding your database.
4. Optimizing servers: optimizing server performance with memory configuration options
The Microsoft® SQL Server™ 2000 memory management component eliminates the need to manually manage the memory available to SQL Server. At startup, SQL Server dynamically determines how much memory to allocate based on how much memory the operating system and other applications are currently using. As the load on the computer and on SQL Server changes, the allocated memory changes as well. For more information, see Memory Architecture. The following server configuration options can be used to configure memory usage and affect server performance:
min server memory
max server memory
max worker threads
index create memory
min memory per query
The min server memory server configuration option can be used to ensure that SQL Server does not release memory below this value once it has been reached. This option can be set to a specific value based on the size and activity of the SQL Server. If you choose to set it, you must leave enough memory for the operating system and other programs. If the operating system runs short of memory, it will request memory from SQL Server, which affects SQL Server performance.
The max server memory server configuration option can be used to specify the maximum amount of memory that SQL Server can allocate when it starts and while it runs. You can set this option to a specific value if you know that several applications run at the same time as SQL Server and you want to guarantee that they have enough memory to run. If these other applications, such as Web servers or e-mail servers, request memory only as needed, SQL Server releases memory to them as needed, so do not set the max server memory option. However, applications often use whatever memory is free when they start and do not request more later if they need it. If an application that behaves this way runs on the same computer as SQL Server, set the max server memory option to a specific value to guarantee that the memory the application needs is not allocated by SQL Server.
Do not set the min server memory and max server memory options to the same value, because that fixes the amount of memory allocated to SQL Server. Dynamic memory allocation provides the best overall performance over time. For more information, see Server memory options.
The max worker threads server configuration option can be used to specify the number of threads that support user connections to SQL Server. The default setting of 255 may be slightly too high for some configurations, depending on the number of concurrent users. Because every worker thread is allocated even when it is not in use (because there are fewer concurrent connections than allocated worker threads), memory resources that other operations, such as the buffer cache, could put to better use may sit unused. In general, set this value to the number of concurrent connections, but not more than 32767. Concurrent connections are different from user logon connections. The worker thread pool for an instance of SQL Server only needs to be large enough to serve the user connections that are executing batches in the instance at the same time. Increasing the number of worker threads beyond the default can reduce server performance. For more information, see max worker threads option.
Note: the max worker threads server configuration option has no effect when SQL Server is running on Microsoft Windows® 98.
The index create memory server configuration option controls the amount of memory used by sort operations when an index is created. Creating an index on a production system is usually an infrequent task, typically scheduled as a job that runs at off-peak times. Increasing this value can therefore improve index creation performance when indexes are created infrequently and at off-peak times. However, it is a good idea to keep the min memory per query option at a lower value, so that the index creation job can still start even if not all of the requested memory is available. For more information, see the index create memory option.
The min memory per query server configuration option can be used to specify the minimum amount of memory allocated to the execution of a query. Increasing min memory per query can help improve the performance of queries that consume large amounts of memory, such as large sort and hash operations, when many queries execute concurrently in the system. However, do not set min memory per query too high, especially on a busy system, because the query then has to wait until the requested minimum amount of memory can be secured or until the value specified in the query wait server configuration option is exceeded. If more memory is available than the specified minimum required to execute the query, the query may use the extra memory as long as it can use it efficiently. For more information, see min memory per query option and query wait option.
Optimizing server performance with I/O configuration options
The following server configuration option can be used to configure the use of I/O and affect server performance:
recovery interval
The recovery interval server configuration option controls when Microsoft® SQL Server™ 2000 issues checkpoints in each database. By default, SQL Server determines the best time to perform checkpoint operations. To determine whether this is the appropriate setting, use Windows NT Performance Monitor to monitor disk write activity on the database files. Activity spikes that drive disk utilization to 100% can hinder performance. In that case, changing this parameter so that checkpoints occur less often can generally improve overall performance. However, you must continue to monitor performance to determine whether the new value has a positive effect. For more information, see recovery interval option.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
