MySQL High concurrency optimization


I. Database structure design

1. The length of a data row should not exceed 8020 bytes. If a row exceeds this length, its data occupies two rows in the physical page, producing storage fragmentation and reducing query efficiency.

2. Wherever possible, choose numeric types instead of string types for fields (phone numbers, for example). String types reduce the performance of queries and joins and increase storage overhead, because the engine compares strings one character at a time, while a numeric value needs only a single comparison.

3. Both the fixed-length character type char and the variable-length character type varchar hold up to 8000 bytes. char is faster to query but consumes more storage; varchar is slightly slower to query but saves storage. Choose flexibly when designing fields: for fields with little variation in length, such as user names and passwords, char is suitable; for fields whose length varies widely, such as comments, varchar is the better choice.

4. Make each field as short as possible while still meeting the requirements. Shorter fields improve query efficiency and reduce the resources consumed when building indexes.

II. Query optimization

On the basis of implementing the required functionality, minimize the number of database accesses (query results can be cached to reduce the number of queries); use search parameters to minimize the number of rows accessed; and minimize the result set to reduce the network burden. Operations that can be handled separately should be processed separately, to improve each response time. When using SQL in a data window, try to put the indexed column first in the select list, and keep the structure of the algorithm as simple as possible. When querying, do not use wildcards such as SELECT * FROM T1; name only the columns you need, for example SELECT col1, col2 FROM T1. Also limit the size of the result set wherever possible, for example SELECT TOP 100 col1, col2, col3 FROM T1, because in some cases the user does not need that much data.

In the absence of an index, when the database looks up a single piece of data it has to do a full table scan, traversing all the data once to find the matching records. With a small amount of data the difference may not be noticeable, but when the amount of data is large, the situation becomes extremely bad.

1. Avoid testing fields for NULL in the WHERE clause; otherwise the engine abandons the index and performs a full table scan. For example:

SELECT id FROM t WHERE num IS NULL

You can set a default value of 0 for num, make sure that the num column in the table never contains NULL, and then query:

SELECT id FROM t WHERE num = 0
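The recommendation can be sketched in SQLite (not SQL Server; the table and index names here are invented for illustration): with num declared NOT NULL DEFAULT 0, the equality query is answered by an index seek rather than a scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical table: num is NOT NULL with a default of 0 instead of NULL.
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER NOT NULL DEFAULT 0)")
cur.execute("CREATE INDEX idx_num ON t(num)")
cur.executemany("INSERT INTO t (num) VALUES (?)", [(i % 5,) for i in range(100)])

# The equality predicate is satisfied by a seek on idx_num.
plan = cur.execute("EXPLAIN QUERY PLAN SELECT id FROM t WHERE num = 0").fetchall()
detail = plan[0][-1]
print(detail)
```

SQL Server exposes the same information through its execution plans; the SQLite plan output is only used here because it is easy to inspect programmatically.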

2. Try to avoid using the != or <> operators in the WHERE clause; otherwise the engine abandons the index and performs a full table scan. The optimizer cannot determine the number of matching rows through the index, so it must search every row of the table.

3. Try to avoid using OR to join conditions in the WHERE clause; otherwise the engine abandons the index and performs a full table scan. For example:

SELECT id FROM t WHERE num = 10 OR num = 20

can be queried like this instead:

SELECT id FROM t WHERE num = 10

UNION ALL

SELECT id FROM t WHERE num = 20
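A quick check of the rewrite (sketched in SQLite with a made-up table t) confirms that the UNION ALL form returns the same rows as the OR form when the two branches cannot overlap:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
cur.executemany("INSERT INTO t (num) VALUES (?)", [(n,) for n in (10, 20, 30, 10, 20)])

with_or = cur.execute(
    "SELECT id FROM t WHERE num = 10 OR num = 20 ORDER BY id").fetchall()
with_union = cur.execute(
    "SELECT id FROM t WHERE num = 10 UNION ALL SELECT id FROM t WHERE num = 20"
).fetchall()

# Same rows; the UNION ALL form may return them in a different order.
assert sorted(with_union) == with_or
```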

4. Use IN and NOT IN sparingly, because IN can prevent the system from using the index and force a direct search of the table data. For example:

SELECT id FROM t WHERE num IN (1, 2, 3)

For consecutive values, use BETWEEN instead of IN:

SELECT id FROM t WHERE num BETWEEN 1 AND 3
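The equivalence is easy to verify in a sketch (SQLite here, table t invented): for consecutive values, the BETWEEN form selects exactly the rows the IN form does.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
cur.executemany("INSERT INTO t (num) VALUES (?)", [(n,) for n in (0, 1, 2, 3, 4)])

in_rows = cur.execute(
    "SELECT id FROM t WHERE num IN (1, 2, 3) ORDER BY id").fetchall()
between_rows = cur.execute(
    "SELECT id FROM t WHERE num BETWEEN 1 AND 3 ORDER BY id").fetchall()

# BETWEEN expresses the same contiguous range as the IN list.
assert in_rows == between_rows
```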

5. Try to avoid searching indexed character data with a pattern that does not begin with the leading characters; this also prevents the engine from using the index.

See the following example:

SELECT * FROM T1 WHERE name LIKE '%l%'

SELECT * FROM T1 WHERE SUBSTRING(name, 2, 1) = 'l'

SELECT * FROM T1 WHERE name LIKE 'l%'

Even though the name field is indexed, the first two queries cannot take advantage of the index to speed up the operation; the engine has to process all the rows in the table one by one. The third query can use the index to speed up the operation.
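This difference can be made visible in a sketch (SQLite, with an invented T1; case-sensitive LIKE is enabled so the prefix pattern qualifies for the index rewrite): the leading-wildcard pattern forces a scan, while the prefix pattern becomes an index range seek.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("PRAGMA case_sensitive_like = ON")
cur.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE INDEX idx_name ON T1(name)")
cur.executemany("INSERT INTO T1 (name) VALUES (?)",
                [("larry",), ("linda",), ("paul",)])

scan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM T1 WHERE name LIKE '%l%'").fetchall()
seek = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM T1 WHERE name LIKE 'l%'").fetchall()

print(scan[0][-1])  # full scan: the pattern starts with a wildcard
print(seek[0][-1])  # range seek: the pattern fixes the leading characters
```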

6. Force the query optimizer to use an index when necessary. Using a local variable in the WHERE clause can also cause a full table scan: SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan to run time; it must choose at compile time, when the value of the variable is still unknown and therefore cannot be used to select an index. The following statement performs a full table scan:

SELECT id FROM t WHERE num = @num

You can force the query to use the index instead:

SELECT id FROM t WITH (INDEX(index_name)) WHERE num = @num

7. Try to avoid expressions on fields in the WHERE clause; otherwise the engine abandons the index and performs a full table scan. For example:

SELECT * FROM T1 WHERE f1/2 = 100

should read:

SELECT * FROM T1 WHERE f1 = 100*2

SELECT * FROM RECORD WHERE SUBSTRING(card_no, 1, 4) = '5378'

should read:

SELECT * FROM RECORD WHERE card_no LIKE '5378%'

SELECT member_number, first_name, last_name FROM members

WHERE DATEDIFF(yy, dateofbirth, GETDATE()) > 21

should read:

SELECT member_number, first_name, last_name FROM members

WHERE dateofbirth < DATEADD(yy, -21, GETDATE())

That is, any operation on a column, including database functions and calculation expressions, causes a table scan; move the operation to the right-hand side of the comparison whenever possible.

8. Try to avoid function calls on fields in the WHERE clause; otherwise the engine abandons the index and performs a full table scan. For example:

SELECT id FROM t WHERE SUBSTRING(name, 1, 3) = 'abc'  -- IDs whose name starts with 'abc'

SELECT id FROM t WHERE DATEDIFF(day, createdate, '2005-11-30') = 0  -- IDs generated on '2005-11-30'

should read:

SELECT id FROM t WHERE name LIKE 'abc%'

SELECT id FROM t WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01'
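The two date forms can be compared in a sketch (SQLite, invented table t): the half-open range on the bare column returns the same rows as the function-wrapped version, but leaves the column free to use an index.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, createdate TEXT)")
cur.executemany("INSERT INTO t (createdate) VALUES (?)",
                [("2005-11-29",), ("2005-11-30",), ("2005-11-30",), ("2005-12-01",)])

# Function applied to the column: not sargable.
wrapped = cur.execute(
    "SELECT id FROM t WHERE date(createdate) = '2005-11-30'").fetchall()
# Bare column with a half-open range: sargable, same result.
ranged = cur.execute(
    "SELECT id FROM t WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01'"
).fetchall()

assert wrapped == ranged
```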

9. Do not perform functions, arithmetic, or other expressions on the left side of the "=" in the WHERE clause; otherwise the system may not be able to use the index correctly.

10. When using an indexed field as a condition, if the index is a composite index you must include its first field in the condition to guarantee that the system uses the index; otherwise the index will not be used. The field order should also match the index order as closely as possible.

11. In many cases, using EXISTS is a good choice. Replace:

SELECT num FROM a WHERE num IN (SELECT num FROM b)

with the following statement:

SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)

Similarly:

SELECT SUM(T1.C1) FROM T1
WHERE (SELECT COUNT(*) FROM T2 WHERE T2.C2 = T1.C2) > 0

SELECT SUM(T1.C1) FROM T1
WHERE EXISTS (SELECT * FROM T2 WHERE T2.C2 = T1.C2)

Both produce the same result, but the latter is clearly more efficient, because it does not produce large numbers of locked table scans or index scans.
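The IN/EXISTS equivalence can be sketched as follows (SQLite, invented tables a and b); on many engines the EXISTS form lets the optimizer stop probing b as soon as one match is found.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE a (num INTEGER)")
cur.execute("CREATE TABLE b (num INTEGER)")
cur.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO b VALUES (?)", [(2,), (3,), (3,), (4,)])

in_rows = cur.execute(
    "SELECT num FROM a WHERE num IN (SELECT num FROM b)").fetchall()
exists_rows = cur.execute(
    "SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)").fetchall()

# Same rows from a, even though b contains duplicates.
assert in_rows == exists_rows
```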

If you only want to verify whether a record exists in a table, do not use COUNT(*); it is inefficient and wastes server resources. Replace it with EXISTS. For example:

IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx')

can be written as:

IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')
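Both forms can be sketched side by side (SQLite, invented table and values); the point is that EXISTS answers the yes/no question without having to count every matching row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE table_name (column_name TEXT)")
cur.executemany("INSERT INTO table_name VALUES (?)", [("xxx",)] * 1000)

# COUNT(*) must touch all 1000 matching rows just to answer "is there one?".
count = cur.execute(
    "SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx'").fetchone()[0]
# EXISTS can stop at the first matching row.
found = cur.execute(
    "SELECT EXISTS(SELECT 1 FROM table_name WHERE column_name = 'xxx')").fetchone()[0]

assert (count > 0) == bool(found)
```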

It is often necessary to write a T-SQL statement that compares a parent result set with a child result set, to find records that exist in the parent result set but not in the child, for example:

SELECT a.hdr_key FROM hdr_tbl a  -- "hdr_tbl a" means hdr_tbl uses the alias a

WHERE NOT EXISTS (SELECT * FROM dtl_tbl b WHERE a.hdr_key = b.hdr_key)

SELECT a.hdr_key FROM hdr_tbl a

LEFT JOIN dtl_tbl b ON a.hdr_key = b.hdr_key WHERE b.hdr_key IS NULL

SELECT hdr_key FROM hdr_tbl

WHERE hdr_key NOT IN (SELECT hdr_key FROM dtl_tbl)

All three forms return the same correct result, but their efficiency decreases in that order.
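The three anti-join forms can be checked in a sketch (SQLite, invented hdr_tbl/dtl_tbl). One caveat worth remembering: NOT IN silently returns nothing if the subquery can produce NULLs, so NOT EXISTS is the safest of the three.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE hdr_tbl (hdr_key INTEGER)")
cur.execute("CREATE TABLE dtl_tbl (hdr_key INTEGER)")
cur.executemany("INSERT INTO hdr_tbl VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO dtl_tbl VALUES (?)", [(1,), (3,)])

q1 = cur.execute("""SELECT a.hdr_key FROM hdr_tbl a
                    WHERE NOT EXISTS (SELECT * FROM dtl_tbl b
                                      WHERE a.hdr_key = b.hdr_key)""").fetchall()
q2 = cur.execute("""SELECT a.hdr_key FROM hdr_tbl a
                    LEFT JOIN dtl_tbl b ON a.hdr_key = b.hdr_key
                    WHERE b.hdr_key IS NULL""").fetchall()
q3 = cur.execute("""SELECT hdr_key FROM hdr_tbl
                    WHERE hdr_key NOT IN (SELECT hdr_key FROM dtl_tbl)""").fetchall()

# Header key 2 has no detail rows; all three forms find it.
assert q1 == q2 == q3
```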

12. Try to use table variables instead of temporary tables. Note, however, that if a table variable contains a large amount of data, its indexing options are very limited (only a primary key index).

13. Avoid frequently creating and deleting temporary tables, to reduce the consumption of system table resources.

14. Temporary tables are not unusable; used appropriately, they can make certain routines more efficient, for example when you need to repeatedly reference a dataset from a large table or a commonly used table. For one-off operations, however, an export table is preferable.

15. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE, to avoid generating a large amount of log and to increase speed; if the amount of data is small, use CREATE TABLE followed by INSERT, to ease the load on the system tables.

16. If temporary tables are used, explicitly delete all of them at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids holding locks on the system tables for long periods.

17. SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.

18. Try to avoid large transactions, to improve the system's concurrency.

19. Try to avoid returning large amounts of data to the client; if the volume is too large, consider whether the underlying requirement is reasonable.

20. Avoid using incompatible data types. For example, float and int, char and varchar, binary and varbinary are incompatible when compared in a condition. Incompatible data types may prevent the optimizer from performing optimizations it could otherwise apply. For example:

SELECT name FROM employee WHERE salary > 60000

In this statement, if the salary field is of type money, it is difficult for the optimizer to optimize, because 60000 is an integer literal. We should convert the integer to the money type when programming, rather than waiting for a run-time conversion.

21. Make full use of join conditions (the more conditions, the faster). In some cases there is more than one join condition between two tables; writing all of them out in the WHERE clause can greatly improve query speed.

Example:

SELECT SUM(a.amount) FROM account a, card b WHERE a.card_no = b.card_no

SELECT SUM(a.amount) FROM account a, card b WHERE a.card_no = b.card_no AND a.account_no = b.account_no

The second statement runs much faster than the first.

22. Use views to speed up queries

Sorting a subset of a table and creating a view can sometimes speed up queries. It helps avoid multiple sort operations and, in other ways, simplifies the optimizer's work. For example:

SELECT cust.name, rcvbles.balance, ...other columns

FROM cust, rcvbles

WHERE cust.customer_id = rcvbles.customer_id

AND rcvbles.balance > 0

AND cust.postcode > '98000'

ORDER BY cust.name

If this query is to be executed more than once, you can create a view that finds all customers with unpaid balances, sorted by customer name:

CREATE VIEW dbo.v_cust_rcvbles

AS

SELECT cust.name, rcvbles.balance, ...other columns

FROM cust, rcvbles

WHERE cust.customer_id = rcvbles.customer_id

AND rcvbles.balance > 0

ORDER BY cust.name

Then query the view as follows:

SELECT * FROM v_cust_rcvbles

WHERE postcode > '98000'

The view contains fewer rows than the base table, and its physical order matches the required order, reducing disk I/O, so the query workload can be significantly reduced.

23. Use DISTINCT where possible instead of GROUP BY (the GROUP BY operation can be particularly slow):

SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID

can be changed to:

SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10
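A quick sketch (SQLite, invented Details rows) showing that, when no aggregate is computed, the DISTINCT form returns the same set of keys as the GROUP BY form:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Details (OrderID INTEGER, UnitPrice REAL)")
cur.executemany("INSERT INTO Details VALUES (?, ?)",
                [(1, 12.0), (1, 15.0), (2, 8.0), (3, 11.0)])

grouped = cur.execute(
    "SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID").fetchall()
distinct = cur.execute(
    "SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10").fetchall()

# No aggregate is computed, so DISTINCT expresses the intent directly.
assert sorted(grouped) == sorted(distinct)
```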

24. Use UNION ALL instead of UNION

UNION ALL does not perform the SELECT DISTINCT step, which saves a great deal of unnecessary work.

25. Try not to use the SELECT INTO statement.

The SELECT INTO statement locks the table and prevents other users from accessing it.

The points above are some basic considerations for improving query speed; in more cases, you need to experiment with different statements to find the best solution. The best approach is, of course, to test: run the SQL statements that implement the same function and see which executes in the least time. If the volume of data in your test database is not comparable to production, compare execution plans instead: paste the alternative statements into Query Analyzer, press Ctrl+L, and examine the indexes used, the number of table scans (these two have the greatest impact on performance), and the overall cost percentage.

III. Algorithm optimization

Avoid cursors wherever possible: they are inefficient, and a cursor operating on more than 10,000 rows of data should be reconsidered. Before using a cursor-based or temporary-table method, first look for a set-based solution to the problem; set-based approaches are generally more efficient. As with temporary tables, cursors are not unusable. Using FAST_FORWARD cursors on small datasets is often preferable to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in the result set are typically faster than ones using cursors. If development time permits, try both the cursor-based approach and the set-based approach and see which works better.

Cursors provide a means of scanning a particular set row by row, typically traversing the data line by line and performing different operations depending on each row fetched. For cursors defined over multiple tables or large tables (large datasets), loops can easily leave the program in a long wait or even hang it.

In some cases cursors are genuinely necessary. Then consider first extracting the qualifying rows into a temporary table and defining the cursor over the temporary table; performance can improve noticeably.
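The contrast between row-by-row and set-based processing can be sketched as follows (SQLite, invented orders table): the Python loop mimics a cursor fetching one row at a time, while the single SUM statement is the set-based equivalent that the engine can optimize.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE orders (amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?)", [(x,) for x in (10.0, 20.5, 5.25)])

# Cursor-style: fetch each row and accumulate on the client side.
total_cursor = 0.0
for (amount,) in cur.execute("SELECT amount FROM orders"):
    total_cursor += amount

# Set-based: one statement, one result, computed inside the engine.
total_set = cur.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

assert total_cursor == total_set
```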

(Example: internal statistics, first edition)

Encapsulate the logic in stored procedures.

IV. Building efficient indexes

Creating an index typically serves two purposes: maintaining the uniqueness of the indexed columns, and providing a strategy for fast access to the data in the table. Large databases have two kinds of index, clustered and nonclustered. A table without a clustered index stores its data as a heap, with all new rows appended to the end of the table; in a table with a clustered index, the data is physically stored in the order of the clustered index key, and a table is allowed only one clustered index. Given the B-tree structure, you can see that adding any index increases the speed of queries on the indexed columns, but reduces the performance of INSERT, UPDATE, and DELETE operations, especially when the fill factor is large. For tables with frequent inserts, updates, and deletes, or with many indexes, consider a smaller fill factor, to leave more free space in each data page and reduce page splits and reorganization work.

An index is one of the most efficient ways to retrieve data from a database; 95% of database performance problems can be solved with indexing techniques. As a rule, I usually place a unique clustered index on the logical primary key, a unique nonclustered index on the system key (used by stored procedures), and a nonclustered index on every foreign key column. But indexes are like salt: too much makes the dish inedible. You must consider how large the database is, how the table is accessed, and whether access is primarily reads or writes.

In fact, you can think of an index as a special kind of table of contents. Microsoft SQL Server provides two types of index: the clustered index and the nonclustered index. Let us look at the difference between them with an example.

The body of a Chinese dictionary is itself a clustered index. For example, to look up the character "安" (an), you quite naturally open the first few pages of the dictionary, because its pinyin is "an", and characters are ordered by pinyin from "a" through "z", so "安" falls near the front. If you page through everything beginning with "a" and still cannot find the character, it is not in the dictionary; likewise, to look up "张" (zhang) you turn toward the back, because its pinyin is "zhang". In other words, the body of the dictionary is itself a directory, and you do not need to consult any other index to find what you are looking for.
We call this kind of body, which is itself a directory arranged according to definite rules, a "clustered index".

If you know a character's pronunciation, you can find it quickly this way. But you may also encounter a character whose pronunciation you do not know; then you cannot use the method above, and must go to the radical index, find the character there, and turn directly to the page number given after it. But the order of characters in the radical index is not the order of the body. For example, looking up "张", the index may show its page number as 672; the entry above it in the index, "驰" (chi), is on page 63, and the entry below it, "弩" (nu, "crossbow"), is on page 390. Clearly, these characters are not actually adjacent to "张" in the body; the consecutive "驰, 张, 弩" you see is their order in the nonclustered index, a mapping of the body's characters into that index. You can find the character you need this way, but it takes two steps: first find the entry in the index, then turn to the page number it gives.
We call this kind of arrangement, where the directory is purely a directory and the body is purely the body, a "nonclustered index".

Going further, it is easy to see why each table can have only one clustered index: the body can be physically sorted in only one way.

(1) When to use a clustered or a nonclustered index

The following table summarizes when to use a clustered or a nonclustered index (this is important):

Action description                        Use clustered index    Use nonclustered index
Columns frequently grouped or sorted      Yes                    Yes
Returning data in a range                 Yes                    No
One or very few distinct values           No                     No
Small number of distinct values           Yes                    No
Large number of distinct values           No                     Yes
Frequently updated columns                No                     Yes
Foreign key columns                       Yes                    Yes
Primary key columns                       Yes                    Yes
Frequently modified index columns         No                     Yes

We can understand this table through the earlier definitions of the clustered and nonclustered index. Take "returning data in a range": suppose your table has a time column and the clustered index is on that column. A query for all data between January 1, 2004 and October 1, 2004 will be very fast, because the body of your "dictionary" is sorted by date; the clustered index only needs to find the first and last rows of the range. With a nonclustered index, you would first have to look up each item's page number in the table of contents and then fetch the content page by page.

(2) Practice: common misunderstandings in the use of indexes

The purpose of theory is application. Although we have just listed when to use a clustered or a nonclustered index, in practice these rules are easily ignored, or cannot be applied without analyzing the actual situation. Below we discuss misunderstandings about index use, drawn from problems encountered in practice, so that you can master the method of building indexes.

1. The primary key must be the clustered index

This idea, I believe, is an extreme mistake and a waste of the clustered index, even though SQL Server builds a clustered index on the primary key by default.

In general, we create an ID column in each table to distinguish rows; the ID is auto-incremented, typically with a step of 1. This is true of the column GID in our office automation example. If we make this column the primary key, SQL Server makes it the clustered index by default. The benefit is that the data is physically sorted by ID in the database, but I do not think that is worth much.

Obviously, the advantages of a clustered index are significant, and the rule that each table can have only one makes the clustered index all the more valuable.

From the definition of the clustered index discussed earlier, its biggest benefit is the ability to quickly narrow the query range according to the query requirements and avoid a full table scan. In practice, because ID numbers are generated automatically, we do not know each record's ID number, so it is difficult to query by ID number in practice. Making the ID primary key the clustered index is therefore a waste of the resource. Furthermore, a field in which every ID is different does not conform to the rule that a clustered index should not be built on "a large number of distinct values". Of course, this matters mainly when users frequently modify record contents, especially the indexed column; it has no effect on query speed.

In an office automation system, whatever data query is being made, whether the home page is displaying documents awaiting the user's signature or meetings, or the user is searching for files, the fields that are never left out are the "date" and the user's own "user name".

Typically, the office automation home page displays the files or meetings that each user has not yet signed. Although the WHERE clause restricts the query to the current user's unsigned items, if your system has been running for a long time and holds a large amount of data, then a full table scan every time each user opens the home page is pointless: the vast majority of users finished browsing last month's files long ago, and scanning them only increases the cost to the database. In fact, when the user opens the home page, we can have the database query only the files the user has not read within the last 3 months, using the "date" field to limit the table scan and improve query speed. If your office automation system has been running for 2 years, your home page will in theory display 8 times faster than before.

2. As long as an index is built, query speed improves significantly

In fact, we can see that in the example above, the 2nd and 3rd statements are identical and the indexed field is the same; the only difference is that the former has a nonclustered index on the fariqi field and the latter a clustered index on it, yet their query speeds differ vastly. So indexing just any field will not automatically improve query speed.

From the table definition we can see that the fariqi field has 5,003 distinct values in a table of 10 million rows, which makes it well suited to a clustered index. In reality we issue several documents every day, and those documents share the same issue date, which fits the clustered index requirement exactly: "neither mostly identical nor entirely unique". Building the "appropriate" clustered index is essential for improving query speed.

3. Adding all the fields that need faster queries to the clustered index improves query speed

As mentioned above, the fields no data query can do without are the "date" and the user's own "user name". Since both fields are so important, we can merge them into a single composite index (compound index).

Many people think that adding any field to the clustered index improves query speed, while others wonder: if the fields of a composite clustered index are queried separately, will query speed slow down? With this question in mind, consider the following query speeds (the result set is 250,000 rows), with the date column fariqi as the leading column of the composite clustered index and the user name neibuyonghu as the trailing column:

We can see that using only the leading column of the clustered index as the query condition is as fast as using all the columns of the composite clustered index at once, and even slightly faster than using all of the composite index columns (with the same number of rows in the result set); but using only the non-leading column of the composite clustered index as the query condition makes the index useless. Of course, statements 1 and 2 show the same query speed because the number of rows queried is the same. If all the columns of the composite index are used and the result set is small, an "index covering" is formed, and performance is optimal. Also remember: whether or not the other columns of the clustered index are used frequently, the leading column must be the most frequently used one.
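The leading-column rule can be sketched as follows (SQLite; the table name tgongwen and the values are invented, while the column names fariqi/neibuyonghu come from the text): a composite index is usable when the predicate includes the leading column, but filtering on the trailing column alone falls back to a scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tgongwen (fariqi TEXT, neibuyonghu TEXT)")
cur.execute("CREATE INDEX idx_comp ON tgongwen(fariqi, neibuyonghu)")
cur.executemany("INSERT INTO tgongwen VALUES (?, ?)",
                [("2004-05-05", "alice"), ("2004-06-01", "bob")])

lead = cur.execute("EXPLAIN QUERY PLAN SELECT * FROM tgongwen "
                   "WHERE fariqi = '2004-05-05'").fetchall()
trail = cur.execute("EXPLAIN QUERY PLAN SELECT * FROM tgongwen "
                    "WHERE neibuyonghu = 'alice'").fetchall()

print(lead[0][-1])   # leading column in the predicate: index seek
print(trail[0][-1])  # trailing column alone: scan
```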

(3) Other considerations

"Water can carry a boat, but it can also capsize it", and the same is true of indexes. Indexes help improve retrieval performance, but too many indexes, or inappropriate ones, cause system inefficiency, because every index the user adds to a table means more work for the database. Too many indexes can even lead to index fragmentation.

So, build an "appropriate" index system, taking particular care with the clustered indexes you create, so that your database can deliver high performance.

