In some cases, MySQL can use an index to satisfy an ORDER BY or GROUP BY clause without doing any extra sorting.

The index can also be used even if the ORDER BY does not match the index exactly, as long as all of the unused index parts and all of the extra ORDER BY columns are constants in the WHERE clause. The following queries all use the index to resolve the ORDER BY or GROUP BY part:

    SELECT * FROM t1 ORDER BY key_part1, key_part2, ...;
    SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2;
    SELECT * FROM t1 WHERE key_part1=constant GROUP BY key_part2;
    SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 DESC;
    SELECT * FROM t1 WHERE key_part1=1 ORDER BY key_part1 DESC, key_part2 DESC;
In other cases, MySQL cannot use an index to satisfy the ORDER BY, even though it may use an index to find the rows that match the WHERE clause. These cases include the following:

- ORDER BY on columns from different index keys:

    SELECT * FROM t1 ORDER BY key1, key2;

- ORDER BY on non-consecutive parts of an index key:

    SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2;

- Mixing ASC and DESC:

    SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;

- The index key used to fetch the rows is not the same as the one used in the ORDER BY:

    SELECT * FROM t1 WHERE key2=constant ORDER BY key1;

- Many tables are joined together, and the columns in the ORDER BY are not all from the first non-constant table used to retrieve rows (that is, the first table in the EXPLAIN output that does not have a const join type).
- The query uses different ORDER BY and GROUP BY expressions.
- The table's index does not store rows in order. For example, this is true for HASH indexes in HEAP tables.
Run EXPLAIN SELECT ... ORDER BY to check whether MySQL can use an index for the query. If the Extra column contains Using filesort, MySQL cannot use the index and must perform an extra sort. See "7.2.1 EXPLAIN Syntax (Get Information About a SELECT)".
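The same check can be illustrated with any engine that exposes its query plan. The sketch below uses SQLite via Python's standard sqlite3 module purely as a stand-in (MySQL's EXPLAIN output looks different; SQLite reports its extra sort step as "USE TEMP B-TREE FOR ORDER BY"); the table and column names mirror the examples above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (key_part1 INT, key_part2 INT, other TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # join the detail strings into one searchable string.
    return " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Without an index, the plan shows an extra sort step, SQLite's
# analogue of MySQL's "Using filesort".
assert "TEMP B-TREE" in plan(
    "SELECT key_part1, key_part2 FROM t1 ORDER BY key_part1, key_part2")

con.execute("CREATE INDEX k1 ON t1 (key_part1, key_part2)")

# With a matching composite index, the rows come back in index
# order and the sort step disappears from the plan.
p = plan("SELECT key_part1, key_part2 FROM t1 ORDER BY key_part1, key_part2")
assert "INDEX k1" in p and "TEMP B-TREE" not in p
```

Only the indexed columns are selected here so that SQLite can treat the index as covering; with SELECT * the planner weighs the extra row lookups against the cost of sorting, much as MySQL does.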
When the result must be sorted, MySQL before 4.1 used the following filesort algorithm:

- Read the rows according to the index key, or by scanning the table. Skip all rows that do not match the WHERE clause.
- For each row, store a pair of values in a buffer: the sort key and the row pointer. The size of the buffer is given by the sort_buffer_size system variable.
- When the buffer is full, run qsort on it and write the result to a temporary file. Save a pointer to the sorted block. (If all pairs fit in the buffer, no temporary file is created.)
- Repeat the preceding steps until all rows have been read.
- Do a multi-merge of up to MERGEBUFF (7) blocks into one block in another temporary file. Repeat until all blocks from the first file are in the second file.
- Repeat the merging until fewer than MERGEBUFF2 (15) blocks remain.
- On the last multi-merge, write only the row pointers (the last part of each sort key) to a result file.
- Read the rows in sorted order by using the row pointers in the result file. To optimize this step, MySQL reads a big block of row pointers, sorts them, and uses them to read the rows in order into a row buffer whose size is given by the read_rnd_buffer_size system variable. The code for this step is in the source file 'sql/records.cc'.
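The steps above amount to a classic external merge sort over (key, pointer) pairs. A minimal sketch of the idea in Python (not the server's actual C++ code; in-memory lists stand in for the temporary files, and the buffer is made deliberately tiny so that the merge phase actually runs):

```python
import heapq

MERGEBUFF = 7          # maximum blocks merged per pass, as in the text

def filesort(rows, key, buffer_size=3):
    """Return row pointers ordered by key(row), the pre-4.1 way."""
    # Phase 1: fill the sort buffer with (key, pointer) pairs,
    # qsort it when full, and flush it as one sorted block.
    blocks, buf = [], []
    for ptr, row in enumerate(rows):
        buf.append((key(row), ptr))
        if len(buf) == buffer_size:
            blocks.append(sorted(buf))
            buf = []
    if buf:
        blocks.append(sorted(buf))
    # Phase 2: repeatedly merge up to MERGEBUFF blocks at a time
    # until a single sorted run remains.
    while len(blocks) > 1:
        blocks = [list(heapq.merge(*blocks[i:i + MERGEBUFF]))
                  for i in range(0, len(blocks), MERGEBUFF)]
    # Phase 3: keep only the row pointers, in sorted-key order.
    return [ptr for _, ptr in blocks[0]]

rows = [{"k": 5}, {"k": 1}, {"k": 4}, {"k": 2}, {"k": 3}]
order = filesort(rows, key=lambda r: r["k"])
assert order == [1, 3, 4, 2, 0]
```

The returned pointers must then be dereferenced against the table to fetch the rows, which is the second, random read that the text goes on to discuss.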
One problem with this approach is that it reads each row twice: once during WHERE clause evaluation, and again when sorting the result. And even though the rows are read successively the first time (for example, during a table scan), the second read is random: the sort keys are ordered, but the row positions are not.
In MySQL 4.1 and later, a filesort optimization is used that records not only the sort key value and row position, but also the columns required for the query. This avoids reading the rows twice. The improved filesort algorithm works roughly like this:

- Read the rows that match the WHERE clause, as before.
- For each row, record a 'tuple' consisting of the sort key value, the row position, and all columns required for the query.
- Sort the tuples by sort key value.
- Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than from the table.

Compared with the original algorithm, the tuples are longer than the (key, pointer) pairs, so fewer of them fit in the sort buffer (whose size is given by sort_buffer_size). As a result, more I/O may be needed, which can make the improved algorithm slower rather than faster. To avoid a slowdown, the optimization is used only if the total size of the extra columns in the tuple does not exceed the value of the max_length_for_sort_data system variable. (Setting this variable too high leads to high disk activity and low CPU activity.)
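The improved algorithm can be sketched the same way. Here the sorted tuples carry the queried columns themselves, so there is no second pass over the table; the check against max_length_for_sort_data is shown as a simple length test (illustrative Python, not the server code):

```python
MAX_LENGTH_FOR_SORT_DATA = 1024   # stand-in for the system variable

def filesort_41(rows, key, needed_cols):
    """Sketch of the 4.1 filesort: sort (key, position, columns) tuples."""
    tuples = []
    for pos, row in enumerate(rows):
        cols = {c: row[c] for c in needed_cols}
        # Fall back to the two-pass algorithm if the extra columns
        # would make the tuples too fat for the sort buffer.
        if sum(len(str(v)) for v in cols.values()) > MAX_LENGTH_FOR_SORT_DATA:
            raise ValueError("use the original two-pass filesort")
        tuples.append((key(row), pos, cols))
    tuples.sort(key=lambda t: t[:2])
    # The result is read from the sorted tuples themselves;
    # the base table is never touched a second time.
    return [cols for _, _, cols in tuples]

rows = [{"a": 3, "b": "z"}, {"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
result = filesort_41(rows, key=lambda r: r["a"], needed_cols=["b"])
assert result == [{"b": "x"}, {"b": "y"}, {"b": "z"}]
```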
To increase ORDER BY speed, first see whether you can get MySQL to use an index rather than an extra sorting phase. If that is not possible, try the following strategies:

- Increase the sort_buffer_size variable.
- Increase the read_rnd_buffer_size variable.
- Change tmpdir to point to a dedicated file system with large amounts of free space. As of MySQL 4.1, this option accepts several paths that are used in round-robin fashion; paths are separated by colons (':') on Unix and by semicolons (';') on Windows, NetWare, and OS/2. You can use this feature to spread the load across several directories. Note: the paths should name directories located on different physical disks, not different directories on the same disk.
By default, MySQL sorts GROUP BY col1, col2, ... queries as if you had also specified ORDER BY col1, col2, ... in the query. If you include an explicit ORDER BY clause with the same column list, MySQL optimizes it away without any speed penalty, because the sort occurs anyway. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL. For example:

    INSERT INTO foo SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
7.2.10 How MySQL Optimizes LIMIT

In some cases, MySQL handles a query differently when you use LIMIT row_count and do not use a HAVING clause:

- If you select only a few rows with LIMIT, MySQL sometimes uses an index in cases where it would otherwise prefer a full table scan.
- If you combine LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire table.
- When you combine LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.
- In some cases, a GROUP BY can be resolved by reading the key in order (or doing a sort on the key) and then calculating summaries until the key value changes. In this case, LIMIT row_count does not calculate any unnecessary GROUP BY values.
- As soon as MySQL has sent the required number of rows to the client, it aborts the query, unless you are using SQL_CALC_FOUND_ROWS.
- LIMIT 0 always returns an empty result set quickly. This is useful for checking the validity of a query or for obtaining the types of the result columns.
- When the server uses temporary tables to resolve a query, the LIMIT row_count clause is used to calculate how much space is required.
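The ORDER BY plus LIMIT case deserves emphasis: producing the first row_count rows of a sorted result is a top-N selection, which is far cheaper than a full sort. The idea can be shown in a few lines of Python (heapq here is only an analogy for MySQL's early termination, not its implementation):

```python
import heapq
import random

rows = [random.randrange(1_000_000) for _ in range(100_000)]

# Equivalent in spirit to: SELECT ... ORDER BY col LIMIT 10
# A bounded heap keeps only the 10 best candidates seen so far,
# so the table is scanned once but never fully sorted.
top10 = heapq.nsmallest(10, rows)

assert top10 == sorted(rows)[:10]
```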
7.2.11 How to Avoid Full Table Scans

If MySQL must do a full table scan to process a query, the type column of the EXPLAIN output shows ALL. MySQL does a full table scan under the following conditions:

- The table is so small that a table scan is faster than an index lookup. This is common for tables with fewer than 10 rows and a short row length.
- There are no usable index columns for the ON or WHERE clause.
- You are comparing indexed columns with constant values, and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table, so a table scan would be faster. See "7.2.4 How MySQL Optimizes WHERE Clauses".
- You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that using the key will require many seeks and that a table scan would be faster.

For small tables, a table scan is often appropriate. For large tables, try the following techniques to keep the optimizer from incorrectly choosing a table scan: update the key distributions with ANALYZE TABLE tbl_name, use FORCE INDEX for the scanned table to tell MySQL that table scans are very expensive compared with using the given index, or start mysqld with --max-seeks-for-key=1000 (or use SET max_seeks_for_key=1000) to tell the optimizer to assume that no key scan causes more than 1,000 key seeks.
7.2.12 Speeding Up INSERT Statements

The time required to insert a row is determined by the following factors, where the numbers indicate approximate proportions:

- Connecting: (3)
- Sending the query to the server: (2)
- Parsing the query: (2)
- Inserting the row: (1 x size of row)
- Inserting the indexes: (1 x number of indexes)
- Closing: (1)

This does not take into account the initial overhead of opening tables, which is done once for each concurrently running query.

The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

You can use the following methods to speed up inserts:
- Use INSERT statements with multiple VALUES lists to insert several rows at a time. This is much faster (many times faster, in some cases) than using separate single-row INSERT statements. If you are adding rows to a non-empty table, you can tune the bulk_insert_buffer_size variable to make it even faster. See "5.2.3 Server System Variables".
- If you are inserting many rows from different clients, you can get higher speed by using INSERT DELAYED statements. See "14.1.4 INSERT Syntax".
- With MyISAM tables, you can insert rows while SELECT statements are running, as long as no rows are being deleted from the table at the same time.
- When loading a table from a text file, use LOAD DATA INFILE. This is usually about 20 times faster than using INSERT statements. See "14.1.5 LOAD DATA INFILE Syntax".
- With some extra work, it is possible to make LOAD DATA INFILE run even faster for a table that has many indexes. Use the following procedure:
  1. Create the table with CREATE TABLE.
  2. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
  3. Run myisamchk --keys-used=0 -rq /path/to/db/tbl_name to remove all use of indexes for the table.
  4. Insert the data into the table with LOAD DATA INFILE. Because no indexes need to be updated, this is very fast.
  5. Run myisampack to make the table smaller. See "15.1.3.3 Compressed Table Characteristics".
  6. Re-create the indexes with myisamchk -r -q /path/to/db/tbl_name. The index tree is built in memory before being written to disk, which avoids many disk seeks; after the rebuild, the index tree is perfectly balanced.
  7. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.

  Note that LOAD DATA INFILE also performs this optimization automatically when you insert into an empty table; the main difference is that you can let myisamchk allocate much more temporary memory for index creation than you might want the server to allocate for re-creating the indexes when it executes LOAD DATA INFILE. As of MySQL 4.0, you can run ALTER TABLE tbl_name DISABLE KEYS instead of myisamchk --keys-used=0 -rq /path/to/db/tbl_name, and ALTER TABLE tbl_name ENABLE KEYS instead of myisamchk -r -q /path/to/db/tbl_name. This way you can also skip the FLUSH TABLES steps.
- You can speed up INSERT operations that consist of multiple statements by locking the table first:

    LOCK TABLES a WRITE;
    INSERT INTO a VALUES (1,23),(2,34),(4,33);
    INSERT INTO a VALUES (8,26),(6,29);
    UNLOCK TABLES;

  This improves performance because the index buffer is flushed to disk only once, after all the INSERT statements have completed. Normally there would be as many index-buffer flushes as there are INSERT statements. Explicit locking is not needed if you can insert all the rows with a single statement. For transactional tables, use BEGIN/COMMIT instead of LOCK TABLES to get a speedup. Locking also lowers the total time for multiple-connection tests, although the maximum wait time for an individual connection goes up because it waits for the lock. For example:

    Connection 1 does 1000 inserts
    Connections 2, 3, and 4 do 1 insert
    Connection 5 does 1000 inserts

  Without locking, connections 2, 3, and 4 finish before 1 and 5. With locking, connections 2, 3, and 4 probably do not finish before 1 or 5, but the total time might be only about 40% as long. INSERT, UPDATE, and DELETE operations are very fast in MySQL, but you can obtain better overall performance by adding locks around everything that does more than about five inserts or updates in a row. If you do very many inserts in a row, you could add a LOCK TABLES before and an UNLOCK TABLES after each batch (of about 1,000 rows) so that other threads can access the table; performance would still be good. INSERT is still much slower for loading data than LOAD DATA INFILE, because the two use different implementation strategies.
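The payoff of batching, one flush for many rows instead of one per statement, is easy to demonstrate with any engine. The sketch below uses Python's bundled sqlite3, where a single transaction plays the role that LOCK TABLES or BEGIN/COMMIT plays above (an illustration of the pattern, not a MySQL benchmark):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (x INT, y INT)")

rows = [(i, i * 2) for i in range(1000)]

# Slow pattern: a separate statement (and, on a real on-disk
# database, a separate flush) per row.
for r in rows[:500]:
    con.execute("INSERT INTO a VALUES (?, ?)", r)

# Fast pattern: the remaining rows go in as one batch inside a
# single transaction; everything is committed and flushed once.
with con:
    con.executemany("INSERT INTO a VALUES (?, ?)", rows[500:])

assert con.execute("SELECT COUNT(*) FROM a").fetchone()[0] == 1000
```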
- To get faster LOAD DATA INFILE and INSERT speed for MyISAM tables, you can increase the key_buffer_size system variable. See "7.5.2 Tuning Server Parameters".
7.2.13 Speeding Up UPDATE Statements

An UPDATE statement is optimized like a SELECT query, but with the additional overhead of a write. The write overhead depends on the number of rows being updated and the number of indexes that change; indexes whose values do not change are not updated.

Another way to get fast updates is to delay them and then do many updates in a row later. Doing many updates in a row is much quicker than doing them one at a time if you lock the table.

Note that for a MyISAM table that uses the dynamic record format, updating a row to a longer total length may split the row. If you do this often, it is very important to run OPTIMIZE TABLE occasionally. See "14.5.2.5 OPTIMIZE TABLE Syntax".
7.2.14 Speeding Up DELETE Statements

The time required to delete individual rows is roughly proportional to the number of indexes. To delete rows more quickly, you can increase the size of the index key cache. See "7.5.2 Tuning Server Parameters".

To delete all rows from a table, use TRUNCATE TABLE tbl_name rather than DELETE FROM tbl_name. See "14.1.9 TRUNCATE Syntax".
7.2.15 Other Optimization Tips

This section lists a number of miscellaneous tips for improving query processing speed:

- Use persistent connections to the database to avoid connection overhead. If you cannot use persistent connections and you initiate many new connections to the database, you may want to change the value of the thread_cache_size variable. See "7.5.2 Tuning Server Parameters".
- Always check that your queries really use the indexes that you have created in your tables. In MySQL, you can do this with the EXPLAIN statement. See "7.2.1 EXPLAIN Syntax (Get Information About a SELECT)".
- Avoid overly complex SELECT statements on MyISAM tables that are updated frequently, to avoid contention for locks between readers and writers.
- With MyISAM tables that have no deleted rows, you can insert rows at the end of the table at the same time that other queries are reading from it. If being able to do this matters to you, try to use the table in ways that avoid deleting rows. Another possibility is to run OPTIMIZE TABLE after you have deleted a large number of rows.
- Use ALTER TABLE ... ORDER BY expr1, expr2, ... if you usually retrieve rows in expr1, expr2, ... order. By using this option after making extensive changes to the table, you may be able to get higher performance.
- In some cases, it may make sense to introduce a column that is 'hashed' based on information from other columns. If this column is short and has reasonably unique values, it may be much faster than a big index on many columns. Such an extra column is very easy to use:

    SELECT * FROM tbl_name
    WHERE hash_col=MD5(CONCAT(col1,col2))
    AND col1='constant' AND col2='constant';
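The hashed-column trick can be demonstrated end to end. Because SQLite (used here via Python's sqlite3 for portability) has no built-in MD5() or CONCAT(), the hash is computed on the application side with hashlib; the table and column names mirror the query above:

```python
import hashlib
import sqlite3

def md5_concat(*parts):
    # Application-side equivalent of MD5(CONCAT(col1, col2)).
    return hashlib.md5("".join(parts).encode()).hexdigest()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_name (col1 TEXT, col2 TEXT, hash_col TEXT)")
con.execute("CREATE INDEX h ON tbl_name (hash_col)")   # one short index

for c1, c2 in [("a", "b"), ("a", "c"), ("x", "y")]:
    con.execute("INSERT INTO tbl_name VALUES (?, ?, ?)",
                (c1, c2, md5_concat(c1, c2)))

# Probe via the short hash column, then re-check col1/col2 to rule
# out the rare hash collision, exactly as the SQL above does.
row = con.execute(
    "SELECT col1, col2 FROM tbl_name "
    "WHERE hash_col=? AND col1=? AND col2=?",
    (md5_concat("a", "c"), "a", "c")).fetchone()
assert row == ("a", "c")
```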
- For MyISAM tables that change a lot, you should try to avoid all variable-length columns (VARCHAR, BLOB, and TEXT). The table uses the dynamic record format if it includes even a single variable-length column. See "15 MySQL Storage Engines and Table Types".
- It is normally not useful to split a table into different tables just because the rows get 'big'. To access a row, the biggest performance hit is the disk seek needed to find the first byte of the row. After finding the data, most modern disks can read the whole row fast enough for most applications. The only cases where splitting a MyISAM table makes a real difference are when the dynamic-format columns of the table could thereby be changed to a fixed row size, or when you very often need to scan the table but do not need most of the columns. See "15 MySQL Storage Engines and Table Types".
- If you often need to calculate statistics based on information from many rows, it is probably much better to introduce a new table and update the counters in real time. An update of the following form is very fast:

    UPDATE tbl_name SET count_col=count_col+1 WHERE key_col=constant;

  This is really important when you use a storage engine that has only table-level locking (multiple readers / single writer), such as MyISAM and ISAM. It also gives better performance with most database systems, because the row-locking manager has less to do in this case.
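A minimal sketch of such a real-time counter table, using Python's sqlite3 for portability (the UPDATE statement is exactly the single-row form shown above; the table and column names are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page_hits (key_col TEXT PRIMARY KEY, count_col INT)")
con.execute("INSERT INTO page_hits VALUES ('home', 0)")

def record_hit(key):
    # The cheap single-row update from the text: no scanning,
    # no aggregation, just one indexed write.
    con.execute("UPDATE page_hits SET count_col=count_col+1 "
                "WHERE key_col=?", (key,))

for _ in range(3):
    record_hit("home")

hits = con.execute(
    "SELECT count_col FROM page_hits WHERE key_col='home'").fetchone()[0]
assert hits == 3
```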
- If you need to collect statistics from large log tables, use summary tables instead of scanning the entire log table. Maintaining the summaries should be much faster than trying to calculate statistics 'live'. Regenerating new summary tables from the logs when things change (depending on business decisions) is much faster than changing the running application.
- If possible, classify reports as 'live' or 'statistical', where data needed for statistical reports is created only from summary tables that are generated periodically from the live data.
- Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
- In some cases, it is convenient to pack and store data into a BLOB column. In that case, you must provide extra code in your application to pack and unpack the BLOB values, but this may save a lot of storage overhead in some situations. It is practical when you have data that does not conform to a rows-and-columns table structure.
- Normally, you should try to keep all data non-redundant (observing what is called 'third normal form' in database theory). However, duplicating information or creating summary tables in order to gain more speed can be worthwhile.
- Stored procedures or UDFs (user-defined functions) may be a good way to achieve better performance for some tasks. However, if you use a database system that does not support these capabilities, there are other ways to reach the same goals, even if they are a bit slower.
- You can get speed benefits by caching queries or answers in your application and then performing many inserts and updates together. If your database supports table locks (as MySQL and Oracle do), this helps ensure that the index cache is flushed only once, after all the updates.
- Use INSERT DELAYED when you do not need to know exactly when your data is written. This speeds things up because many rows can be written with a single disk write.
- Use INSERT LOW_PRIORITY when you want to give SELECT statements higher priority than your inserts.
- Use SELECT HIGH_PRIORITY to get retrievals that jump the queue; that is, the SELECT is executed even if other clients are waiting to write.
- Use multiple-row INSERT statements to store many rows with one statement (many database systems support this).
- Use LOAD DATA INFILE to load large amounts of data. This is much faster than using INSERT statements.
- Use AUTO_INCREMENT columns to generate unique values.
- Run OPTIMIZE TABLE periodically to avoid fragmentation of dynamic-format MyISAM tables. See "15.1.3 MyISAM Table Storage Formats".
- Use HEAP (in-memory) tables where possible to get more speed. See "15.1.3 MyISAM Table Storage Formats".
- With typical Web server setups, store images as files and keep only a file reference in the database. The reason is that a normal Web server is much better at caching files than database contents, so using files is generally easier and faster.
- Use in-memory tables for non-critical data that is accessed often, such as information about the last displayed banner for users who have no cookies.
- Columns with identical information in different tables should be declared with identical data types; before MySQL 3.23, joins were slow otherwise. Try to keep column names simple: for example, in a table named customer, use name instead of customer_name as the column name. To make your names portable to other SQL servers, keep them shorter than 18 characters.
- If you need really high speed, look at the low-level data storage interfaces supported by the different database servers. For example, by accessing the MyISAM storage engine directly, you could get a speed increase of two to five times compared with using the SQL interface. To do this, the data must be on the same server as the application, and usually it should be accessed by only one process (because external file locking is really slow). These problems could be eliminated by introducing low-level MyISAM commands in the MySQL server (this could be one easy way to get more performance, if needed). With a carefully designed database interface, it should be quite easy to support this type of optimization.
- If you are using numeric data, it is in many cases faster to access information from a database (using a live connection) than to access a text file. Information in a database is likely to be stored more compactly than in a text file, so accessing it involves fewer disk seeks. You also save code in your application, because you need not parse text files to find line and column boundaries.
- Replication can provide a performance benefit for some operations. Clients can retrieve data from multiple replication servers to split up the load. To avoid slowing down the master while making backups, you can make the backups on a slave server. See "6 Replication in MySQL".
- Declaring a MyISAM table with the DELAY_KEY_WRITE=1 table option makes index updates faster, because they are not flushed to disk until the table is closed. The downside is that if the server is killed while the table is open, you should ensure that the table is okay by running the server with the --myisam-recover option, or by running myisamchk before restarting the server. (Even in this case, you should not lose anything by using DELAY_KEY_WRITE, because the keys can always be regenerated from the data.)