Document directory
- 7.2.11 how to avoid full table Scan
- 7.2.15 other optimization ideas
7.2.9 how to optimize MySQL
ORDER BY
In some cases, MySQL can directly use indexes to meetORDER BYOrGROUP BYClause without additional sorting.
AlthoughORDER BYThe index is not exactly matched with the order of the index, but can still be used, as long as the unused index part and all the additionalORDER BYField inWhereClause is included. The following queries are all indexed.ORDER BYOrGROUP BYPart:
Select * FROM t1 orDER BY key_part1,key_part2,... ;Select * FROM t1 Where key_part1=constant orDER BY key_part2;Select * FROM t1 Where key_part1=constant GROUP BY key_part2;Select * FROM t1 orDER BY key_part1 DESC, key_part2 DESC;Select * FROM t1Where key_part1=1 orDER BY key_part1 DESC, key_part2 DESC;
In other cases, MySQL cannot use indexes to meet the requirements.ORDER BYAlthough it uses indexes to find records for matchingWhereClause. These situations are as follows:
- Perform different index keys
ORDER BY:
Select * FROM t1 orDER BY key1, key2;
- On the non-consecutive index key
ORDER BY:
Select * FROM t1 Where key2=constant orDER BY key_part2;
- Used at the same time
ASCAndDESC:
Select * FROM t1 orDER BY key_part1 DESC, key_part2 ASC;
- Index key used for searching records
ORDER BYIs not the same:
Select * FROM t1 Where key2=constant orDER BY key1;
- Many tables are connected together, And the read records are
ORDER BYAll fields in are from the first very number table (that is, inEXPLAINThe connection type of the first table in the analysis result is notconst).
- Different
ORDER BYAndGROUP BYExpression.
- Records in Table indexes are not stored in order. For example,
HASHAndHEAPThe table is like this.
RunEXPLAIN Select ... orDER BYTo see if MySQL uses an index in the query. IfExtraThe field value isUsing filesortMySQL cannot use the index. For details, see "7.2.1EXPLAINSyntax (Get Information AboutSelect)".
When the results must be sorted, MySQL 4.1 used the followingfilesortAlgorithm:
- Reads records based on the index key or scans data tables. Those that cannot match
WhereAll clause records are skipped.
- Each record in the buffer uses a pair to store two values (index key and record pointer ). The buffer size depends on system variables.
sort_buffer_sizeDepends on the value.
- When the buffer is slow, run qsort and store the results in temporary files. Save the stored block pointer (if all the 'to' values can be saved in the buffer, you do not need to create a temporary file ).
- Execute the above operation until all the records are read.
- Perform multiple merge operations to merge up
MERGEBUFF(7) blocks in one region are stored in another temporary file. Repeat this operation until all the blocks in the first file are placed in the second file.
- Repeat the preceding operations until the remaining number of blocks is smaller
MERGEBUFF2(15 ).
- In the last multi-merge, only the record pointer (the last part of the sort index key) is written to the result file.
- Read records in sequence by reading the records pointer in the result file. To optimize this operation, MySQL puts the record pointer read into a large block and uses it to read records in sequence and store the records in the buffer. The buffer size is determined by system variables.
read_rnd_buffer_sizeDepends on the value. The code for this step is in the source file'SQL/records. CC'.
One problem with this approximation algorithm is that the database reads two records: one is estimation.WhereThe second clause is the sorting time. Although the record is successfully read for the first time (for example, a full table scan), the second is random reading (the index key has been sorted, but the record does not exist ).
In MySQL 4.1 and later versions,filesortThe optimization algorithm is used to record not only the index key value and record location, but also the fields required in the query. This avoids the need to read records twice. ImprovedfilesortThe algorithm is roughly as follows:
- As before, read matching
WhereClause records.
- Each record records a corresponding; 'tuples 'information, including the index key value, record location, and all fields required in the query.
- Sort the 'tuples 'based on the index key.
- Read records in sequence, but read records from the sorted 'tuples 'list instead of from the data table.
Use the improvedfilesortCompared with the original algorithm, the 'tuples 'take longer space than the 'right'. They are rarely suitable for sorting buffering (the buffer size is determinedsort_buffer_size). Therefore, more I/O operations may be required, resulting in slower improved algorithms. To avoid slowing down, this optimization method is only used to sort the total size of additional fields in the 'tuples 'beyond the system variables.max_length_for_sort_data(The value of this variable is set too high, which means High Disk Load and low CPU load ).
To improveORDER BYThe speed depends on whether MySQL can use indexes instead of additional sorting processes. If you cannot use an index, follow these policies:
- Add
sort_buffer_size.
- Add
read_rnd_buffer_size.
- Modify
tmpdirTo point to a dedicated File System with a lot of free space. If you use MySQL 4.1 or update, this option allows multiple paths to use the loop format. Each path is separated by a colon (':') on UNIX, and a semicolon (';') on Windows, Netware, and OS/2 (';'). This feature can be used to evenly distribute loads to several directories. Note: These paths must be directories distributed on different physical disks, rather than different directories on the same physical disk.
By default, MySQLGROUP BY col1, col2, ...Sort the query andORDER BY col1, col2, ...The query is the same. If it explicitly contains a list with the same fieldORDER BYMySQL will not lose the speed when optimizing it, Because sorting will always happen. If a query containsGROUP BYTo avoid overhead of sorting results, you can useORDER BY NULLTo cancel sorting. For example:
Insert INTO fooSelect a, COUNT(*) FROM bar GROUP BY a orDER BY NULL;
7.2.10 how to optimize MySQL
LIMIT
In some casesLIMIT row_countBut not usedHAVING
The query will be processed differently:
- If you only use
LIMITMySQL sometimes uses indexes, but more often it is a full table scan.
- If
LIMIT row_countAndORDER BYMySQL isRow_countThe sorting will stop, rather than sorting the entire table.
- When
LIMIT row_countAndDISTINCTMySQL findsRow_countA unique record is no longer searched.
- In some cases,
GROUP BYYou can read the index key in sequence (or sort the index key) and calculate the accumulated information until the index key changes. In this case,LIMIT row_countIt does not calculate any non-essentialGROUP BYValue.
- Once MySQL sends a full number of request records to the client, it suspends the query unless
SQL_CALC_FOUND_ROWS.
LIMIT 0 Always Returns an empty result set. This is useful for checking the type of the query or result field.
- When the server uses a temporary table to process queries
LIMIT row_countIt can be used to calculate the amount of space required.
7.2.11 how to avoid full table Scan
If MySQL needs to perform a full table scan to process the queryEXPLAINIn the resulttypeThe field value isALL. MySQL performs a full table scan under the following conditions:
- The data table is too small. A full table scan is much faster than a search for the index key. This is usually done when the total number of records in a table is less than 10 and the record length is relatively short.
- No applicable
ONOrWhereThe index field of the clause.
- By comparing index fields with constant values, MySQL has computed (based on the Index Tree) to a constant that covers a large part of the data table. Therefore, full table scanning should be faster. For details, see "7.2.4 how MySQL Optimizes
Where".
- Other fields use an index key with a small base (many records match the index key value. In this case, MySQL considers that using index keys requires a large number of searches, which is not as fast as full table scan.
For small tables, full table scan is usually more appropriate. However, for large tables, try the following techniques to avoid the full table scan incorrectly selected by the optimizer:
7.2.12 AccelerationInsert
The time taken to insert a record is determined by the following factors, and the following figures roughly indicate the proportion of the impact:
- Connection: (3)
- Send query to server: (2)
- Resolution query: (2)
- Insert record: (1 x record size)
- Insert index: (1 x number of indexes)
- Close: (1)
The overhead of opening the data table during initialization is not considered here, because each query is performed only once.
If it is a B-tree index, as the number of indexes increases, the speed of inserting records decreases in the proportion of log N.
You can use the following methods to increase the insert speed:
- You can use
InsertMultiple statements are attached.VALUESValue. This approach is better than using a single valueInsertThe statement is much faster (in some cases, it is faster ). If you want to add a record to a non-empty data table, you can adjust the variablebulk_insert_buffer_sizeTo make it faster. For more information, see "5.2.3 Server System Variables ".
- If you want to insert a large number of records from different clients, use
Insert DELAYEDStatement can also increase the speed. For details, see "14.1.4InsertSyntax ".
- Pair
MyISAMYou canSelectInsert a record when the statement is running, as long as the record is not deleted at this time.
- To load a text file to a data table, use
LOAD DATA INFILE. This is usually a large numberInsert20 times of the statement. For details, see "14.1.5LOAD DATA INFILESyntax ".
- Through some additional work, it is possible
LOAD DATA INFILEIt runs faster when a data table has a large number of indexes. The procedure is as follows:
- Use
Create TABLECreate a table.
- Run
FLUSH TABLESStatement ormysqladmin flush-tablesCommand.
- Run
myisamchk --keys-used=0 -rq /path/to/db/tbl_nameCommand to delete all indexes of a data table.
- Run
LOAD DATA INFILE, Data is inserted into the table, because the table index does not need to be updated, so this will be very fast.
- Run
myisampackMake the data table smaller. For more information, see "15.1.3.3 compressed table characteristics ".
- Run
myisamchk -r -q /path/to/db/tbl_nameRe-create the index. The created index tree is saved in memory before being written to the disk, which saves the disk search speed. After reconstruction, the index tree distribution is very balanced.
- Run
FLUSH TABLESStatement ormysqladmin flush-tablesCommand.
Note,LOAD DATA INFILEWhen you insert data into an empty table, it is also optimized. The main difference is that: RunmyisamchkMore temporary memory will be allocated for index creation and executionLOAD DATA INFILEThe command allows the database server to allocate memory for re-indexing. Starting from MySQL 4.0, you can runAlter TABLE tbl_name DISABLE KEYSTo replacemyisamchk --keys-used=0 -rq /path/to/db/tbl_name, RunAlter TABLE tbl_name ENABLE KEYSReplacemyisamchk -r -q /path/to/db/tbl_name. This can be omitted.FLUSH TABLESStep.
- Several statements can be executed together after the table is locked to accelerate
InsertOperation:
LOCK TABLES a WRITE; Insert INTO a VALUES (1,23),(2,34),(4,33); Insert INTO a VALUES (8,26),(6,29); UNLOCK TABLES;
This improves performance by: until allInsertAfter the statement is complete, the index cache is refreshed to the disk at a time. Usually, there are several timesInsertStatement will incur the overhead of refreshing the index cache to the disk most times. If multiple values can be inserted in one statement at a time, the displayed lock table operation is unnecessary. For the transaction table, useBEGIN/COMMITReplaceLOCK TABLESTo speed up. The lock table also reduces the total time for multiple connection tests, even though the maximum waiting time for each independent connection to wait for the lock increases. For example:
Connection 1 does 1000 inserts Connections 2, 3, and 4 do 1 insert Connection 5 does 1000 inserts
If no lock table exists, connection 2, 3, 4 will be completed before 1, 5. If the table is locked, the join 2, 3, 4 may be completed after 1, 5, but the total time may only need 40%. MySQLInsert,Update,DeleteOperations are very fast, but if there are more than five inserts or updates in a statement, it is best to lock them for better performance. If you want to insert multiple inserts at a time, it is best to add the values before and after each loop (about 1000 times ).LOCK TABLESAndUNLOCK TABLESSo that other processes can access the data table. The performance is still good.InsertAlways RatioLOAD DATA INFILEData insertion is slow because the implementation policies of the two are different.
- To
MyISAMThe table is faster.LOAD DATA
INFILEAndInsertSystem variables can be added.key_buffer_sizeFor more information, see "7.5.2 Tuning Server Parameters ".
7.2.13 AccelerationUpdate
UpdateStatement optimization andSelectBut it has additional write overhead. The write overhead depends on the number of records to be updated and the number of indexes. If the index does not change, no update is required.
Another way to increase the update speed is to delay the update and put many updates together. If the table is locked, it is much faster to perform multiple updates at the same time than to update the tables separately.
Note:MyISAMIf the dynamic record format is used in the table, the record may be split after it is updated to a longer value. If you do this frequently, do it once in a whileOPTIMIZE TABLEIt is very important. For details, see "14.5.2.5 optimize table Syntax ".
7.2.14 Acceleration
Delete
The time for deleting a single record is almost proportional to the number of indexes. To delete records more quickly, you can increase the index key cache. For details, see "7.5.2 Tuning Server Parameters ".
To delete all records of a data table, useTRUNCATE TABLE tbl_nameInsteadDelete FROM tbl_name. For details, see "14.1.9 truncate Syntax ".
7.2.15 other optimization ideas
This section lists some other ideas for improving the query processing speed:
- Use permanent connection to the database to avoid connection overhead. If you need to initialize many connections and cannot use permanent connections, you can modify the variables.
thread_cache_sizeFor more information, see "7.5.2 Tuning Server Parameters ".
- Always check whether the query uses the existing indexes in the table. In MySQL, you can use
EXPLAINStatement. For details, see "7.2.1 EXPLAIN Syntax (Get Information About a Select )".
- Do not update frequently.
MyISAMTables are too complexSelectStatement to avoid contention for locks between reading and writing.
- In
MyISAMIf no records are being deleted, you can insert records while reading data in other queries. If this is very important, you should try to use the table when the table does not delete records. Another possible solution is to delete a large number of records and then executeOPTIMIZE TABLEStatement.
- If you always need to follow
expr1, expr2, ...UseAlter TABLE ... orDER BY expr1, expr2, ...Modify a table. After you use this method to expand and modify tables, you may obtain higher performance.
- In some cases, the field type is''Hashed''Based on other fields. If this field is short and basically a unique value, it may be faster than using a large index on several fields. This additional field can be used very easily, as follows:
Select * FROM tbl_name Where hash_col=MD5(CONCAT(col1,col2)) AND col1='constant' AND col2='constant';
- If
MyISAMTables are frequently modified in large quantities, so avoid modifying all variable-length fields as much as possible (VARCHAR, BLOB,TEXT). Although the table has only one variable-length field, it uses the dynamic record format. For details, see "15 MySQL Storage Engines and Table Types ".
- Generally, when the data table record changes''Big''Then, it is not very useful to split the table into several different tables. The maximum performance of accessing a record is that the first byte of the record is found during disk search. After finding the record location, most of the current disks can quickly read records for most applications. Set
MyISAMWhen a table is split into multiple unique columns, the fields in the dynamic format in the data table can be modified to a fixed size record, or the table needs to be scanned frequently, however, you do not need to read most of the fields. For details, see "15 MySQL Storage Engines and Table Types ".
- If you need to frequently perform statistics on a table based on many fields, it is better to create a new table to store the statistics of these real-time updates. Updates like the following will be very fast:
Update tbl_name SET count_col=count_col+1 Where key_col=constant;
If you only need table-level locks (multiple reads/one write), useMyISAMThe storage engine is very important, suchMyISAMAndISAMTable. This also boasts good performance in many databases, because row-Level Lock management programs are basically useless in this case.
- To collect statistics from a large log table, you can use the abstract table instead of scanning the entire log table. Maintain abstract table comparison''Real-time''. When things change (such as business decision-making), re-creating the abstract table is much faster than modifying the applications in the operation.
- If possible, it is best to use a classification report.''Real-time''Or''Statistic''The data required for the report only needs to come from the summary table, and the information of the summary table is generated periodically from the real-time data.
- It should be recognized that the field has a default value. You must specify the values when they are different from the default values. This eliminates the need for MySQL to increase the insert speed.
- In some cases, data is assembled and stored in
BLOBType fields are more convenient. Add additional commands to assemble and disassemble the application.BLOBValue in the field, but this can save a lot of storage overhead in some cases. This does not require data compliance.Record-and-fieldFormat table structure is very practical.
- Generally, all redundant data (in the database principle, it is called the "third paradigm") should be stored "). However, it is cost-effective to copy some information or create a summary table for higher efficiency.
- Stored Procedure or
UDFs(User-Defined Functions) may have higher performance when executing some tasks. However, when the database does not support these features, there are other alternative methods to achieve the goal, even if they are a little slow.
- You can obtain results from the query cache or response, and then put many insert and update operations together. If the database supports table locks (such as MySQL and Oracle), this ensures that the index cache only needs to be refreshed once after all update operations.
- You can use
Insert DELAYED. This will increase the speed because multiple records are simultaneously written to the disk.
- When you want
SelectThe statement has a higher priority than the insert operation.Insert LOW_PRIORITY.
- Use
Select HIGH_PRIORITYTo skip the queue, that is, even if other clients are writing dataSelectThe execution is complete.
- In
InsertMultiple record insertion formats are used in statements (supported by many databases ).
- Use
LOAD DATA INFILETo import a large amount of data.InsertFast.
- Use
AUTO_INCREMENTField to generate a unique value.
- Regular execution
OPTIMIZE TABLEPrevents dynamic record formatMyISAMTable fragments. For details, see "15.1.3MyISAMTable storage formats ".
- Use
HEAPTable, which may increase the speed. For details, see "15.1.3MyISAMTable storage formats ".
- In normal Web server configuration, it is best to store image files as files, and only store the file index information in the database. The reason for this is that, in general, the web server always performs better on the File Cache than the database, so using file storage will make the system easier and faster.
- For infrequently accessed data that is not very important, it can be stored in a memory table. For example, when the Web Client cannot save cookies, it is used to save the title of the last display.
- Fields with the same values in different tables should be declared as of the same type. Before MySQL 3.23, table join operations are slow. Make the field name as simple as possible. For example
customerInnameTo replacecustomer_nameAs the field name. To make the field name portable in other database systems, it should be within 18 characters.
- If you need real high speed, we recommend that you look at the differences between the underlying data storage interfaces supported by various database servers. For example
MyISAMThe storage engine is 2-5 times faster than other SQL interfaces. This requires that the data must be on the same server as the application, and it is usually accessed by only one process (because the external file lock is indeed slow ). Using only one process can eliminate the introduction of the underlyingMyISAMThe problem caused by the Command (this is easy to get higher performance, if needed ). Because of the careful design of database interfaces, it is easy to support this optimization method.
- If you use digital data, in many cases you want to access the database (using online connections) faster than using text files. Because the digital information is more compact than the text files stored in the database, only less disk searches are required for access. In addition, you can save code in the application because you do not need to parse text files to find the corresponding rows and fields.
- Database Replication has performance benefits for some operations. The client can obtain data from multiple replication servers, which can allocate the load. To avoid slowing down the master server when backing up data, you can also store the backup on the slave server. For details, see "6 Replication in MySQL ".
- Definition
MyISAMOption added when tableDELAY_KEY_WRITE=1In this way, the new index will be updated faster, because the disk will be refreshed only when the data table is closed. However, the server may be killed when the data table is opened. You can use the Parameter--myisam-recoverTo ensure data security or run before database restart.myisamchkCommand (however, in this case, useDELAY_KEY_WRITEIn this case, nothing will be lost, because indexes can always be regenerated from data ).