How to improve data loading efficiency in MySQL

Most of the time you are concerned with optimizing SELECT queries, because they are the most common kind of query and it is not always obvious how to optimize them. Loading data into a database is comparatively straightforward. Even so, several strategies can improve the efficiency of data loading operations. They rest on the following principles:

Bulk loading is faster than loading one row at a time, because the index cache does not have to be flushed after each record is loaded; it can be flushed once at the end of the batch.

Loading into a table that has no indexes is faster than loading into an indexed table. If there are indexes, the server must not only add each record to the data file but also modify every index to reflect the new record.

Shorter SQL statements are faster than longer ones because they require less parsing on the server and because they travel from the client to the server over the network more quickly.

Some of these factors may seem trivial (especially the last one), but when a large amount of data is loaded, even small factors can make a substantial difference. From these general principles we can derive several practical conclusions about how to load data as quickly as possible:

LOAD DATA (in all its forms) is more efficient than INSERT because it loads rows in bulk. Indexes are flushed less often, and the server needs to parse and interpret only one statement rather than many.

LOAD DATA is more efficient than LOAD DATA LOCAL. With LOAD DATA, the file must be located on the server host and you must have the FILE privilege, but the server can read the file directly from disk. With LOAD DATA LOCAL, the client reads the file and sends it over the network to the server, which is slower.
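As a rough sketch of the two forms, assuming a table named member already exists and that tab-delimited files sit at the made-up paths shown (the table name and both paths are illustrative, not from the original text):

```sql
-- Server-side load: the file lives on the server host, the server reads it
-- directly from disk, and the account needs the FILE privilege.
LOAD DATA INFILE '/var/lib/mysql-files/member.txt'
INTO TABLE member
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

-- Client-side load: the client program reads the file and ships its
-- contents to the server over the network.
LOAD DATA LOCAL INFILE '/home/user/member.txt'
INTO TABLE member
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
```

On recent MySQL versions the server-side form may also be limited to the directory named by the secure_file_priv setting, and the LOCAL form works only when local_infile is enabled, so check both before relying on either statement.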

If you must use INSERT, use the form that lets you specify multiple rows in a single statement, that is, INSERT INTO tbl_name VALUES (...), (...), ... with one parenthesized value list per row.
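To make the multi-row form concrete, here is a minimal sketch. The member_list table and its rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE member_list (
    id         INT         NOT NULL,
    last_name  VARCHAR(30) NOT NULL,
    first_name VARCHAR(30) NOT NULL
);

-- Row-at-a-time form: three statements to send and parse.
-- INSERT INTO member_list VALUES (1, 'Garcia', 'Ana');
-- INSERT INTO member_list VALUES (2, 'Chen', 'Wei');
-- INSERT INTO member_list VALUES (3, 'Okafor', 'Ada');

-- Multi-row form: the same three rows in a single statement.
INSERT INTO member_list (id, last_name, first_name) VALUES
    (1, 'Garcia', 'Ana'),
    (2, 'Chen',   'Wei'),
    (3, 'Okafor', 'Ada');
```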

The more rows you can specify in the statement, the better. This reduces the number of statements required and the amount of index flushing. If you use mysqldump to generate database backup files, use the --extended-insert option so that the dump file contains multiple-row INSERT statements. You can also use --opt (optimize), which enables --extended-insert. Conversely, avoid the mysqldump --complete-insert option, which adds the column names to every INSERT statement; the resulting statements are longer and require more parsing than statements generated without --complete-insert.

Use the compressed client/server protocol to reduce network traffic. Most MySQL clients enable it with the --compress command-line option. It is generally worthwhile only on slow networks, because compression costs a significant amount of processor time.

Let MySQL insert default values for you; do not specify columns in the INSERT statement that will be assigned their default values anyway. On average, this makes statements shorter and reduces the number of characters sent over the network to the server. In addition, because the statement contains fewer values, the server does less parsing and value conversion.
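A small sketch of the difference, using a hypothetical page_hit table whose columns and defaults are invented for illustration:

```sql
-- Hypothetical table: status and hit_time both have defaults.
CREATE TABLE page_hit (
    url      VARCHAR(255) NOT NULL,
    status   SMALLINT     NOT NULL DEFAULT 200,
    hit_time TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Longer form: spells out values that merely repeat the defaults.
INSERT INTO page_hit (url, status, hit_time)
VALUES ('/index.html', 200, CURRENT_TIMESTAMP);

-- Shorter form: names only the column that needs an explicit value
-- and lets the server fill in status and hit_time.
INSERT INTO page_hit (url) VALUES ('/index.html');
```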

If a table is indexed, you can reduce indexing overhead by using bulk inserts (LOAD DATA or multiple-row INSERT statements). These minimize the impact of index updates, because the indexes need to be flushed only after all rows have been processed, not after each row.

If you need to load a large amount of data into a new table, create the table without indexes, load the data, and only then create the indexes. It is faster to build an index all at once than to modify it once per row.
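The create, load, then index sequence might look like the following sketch. The sales_import table, its columns, the index names, and the file path are all assumptions made up for this example.

```sql
-- Step 1: create the table with no indexes at all.
CREATE TABLE sales_import (
    sale_id     INT           NOT NULL,
    customer_id INT           NOT NULL,
    sale_date   DATE          NOT NULL,
    amount      DECIMAL(10,2) NOT NULL
);

-- Step 2: bulk-load the data while there are no indexes to maintain.
LOAD DATA INFILE '/var/lib/mysql-files/sales.txt'
INTO TABLE sales_import;

-- Step 3: build every index afterward, in a single ALTER TABLE.
ALTER TABLE sales_import
    ADD PRIMARY KEY (sale_id),
    ADD INDEX idx_customer (customer_id),
    ADD INDEX idx_sale_date (sale_date);
```

How much this saves depends on the storage engine and the data volume, so it is worth timing against a load into a pre-indexed table.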

Dropping or disabling the indexes before loading, and then re-creating or re-enabling them afterward, may make the load faster. If you are considering this strategy, experiment first to see whether it is worthwhile: if you are loading a small amount of data into a large table, rebuilding the indexes may take longer than loading the data.

You can use DROP INDEX and CREATE INDEX to drop and rebuild indexes. An alternative is to disable and enable the indexes with myisamchk or isamchk. This requires an account on the MySQL server host and write access to the table files. Use myisamchk for MyISAM tables, whose index files have a .MYI extension, and isamchk for ISAM tables, whose index files have an .ISM extension. To disable a table's indexes, change into the appropriate database directory and run the utility on the table with the --keys-used=0 option.

After the data has been loaded into the table, reactivate the indexes by running the same utility on the table in recovery mode (for myisamchk, the --recover and --quick options), which rebuilds them.
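For the DROP INDEX and CREATE INDEX route, a minimal sketch follows. It assumes a hypothetical orders table that already exists with a nonunique index named idx_customer on customer_id; the table, index, and file path are invented for illustration.

```sql
-- Drop the index so the load does not have to maintain it row by row.
DROP INDEX idx_customer ON orders;

-- Load the new rows into the now index-free column.
LOAD DATA INFILE '/var/lib/mysql-files/orders.txt'
INTO TABLE orders;

-- Rebuild the index once, over the whole table.
CREATE INDEX idx_customer ON orders (customer_id);
```

Queries that depend on idx_customer will be slow between the DROP and the CREATE, so this fits best in a maintenance window.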

If you decide to disable and re-enable indexes this way, use the table-repair locking protocol described in Chapter 13 to keep the server from changing the table at the same time. (You are not repairing the table at this point, but you are modifying it just as a table-repair procedure does, so the same locking protocol applies.)

The preceding data-loading principles also apply to mixed-query environments in which clients perform different kinds of operations. For example, you generally want to avoid long-running SELECT queries on tables that are updated frequently. Such queries cause a lot of contention and reduce the performance of the writers.

One possible workaround, if your writes are mostly INSERT operations, is to add new records to a temporary table and then periodically move those records into the main table. This is not viable if you need to access new records immediately, but it works if you can afford to leave them inaccessible for a short time.

Using a temporary table has two benefits. First, it reduces contention with SELECT queries on the main table, so they execute more quickly. Second, it takes less total time to load a batch of records from the temporary table into the main table than to load the records individually; the index cache needs to be flushed only at the end of each batch load, not after each row.

One application of this strategy is logging Web page accesses from a Web server into a MySQL database. In this case, it is probably not a high priority to make sure that entries reach the main table immediately.
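The buffering strategy can be sketched as follows. All table and column names are hypothetical. The sketch uses an ordinary table as the buffer rather than CREATE TEMPORARY TABLE, because a TEMPORARY table is visible only to the connection that created it and so could not be shared between the writers and the periodic job.

```sql
-- Main table: indexed, and the target of long-running SELECT queries.
CREATE TABLE hit_log (
    hit_time DATETIME     NOT NULL,
    url      VARCHAR(255) NOT NULL,
    INDEX idx_hit_time (hit_time)
);

-- Buffer table: same columns, no indexes, receives all new rows.
CREATE TABLE hit_log_staging (
    hit_time DATETIME     NOT NULL,
    url      VARCHAR(255) NOT NULL
);

-- Writers insert into the buffer table only.
INSERT INTO hit_log_staging (hit_time, url) VALUES (NOW(), '/index.html');

-- Periodic job: move the accumulated rows across in one batch.
-- Both tables are locked so that no row can arrive in the buffer
-- between the copy and the delete.
LOCK TABLES hit_log WRITE, hit_log_staging WRITE;
INSERT INTO hit_log (hit_time, url)
    SELECT hit_time, url FROM hit_log_staging;
DELETE FROM hit_log_staging;
UNLOCK TABLES;
```

How often to run the periodic job is a trade-off between how stale the main table may be and how large each batch becomes.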

Another strategy for reducing index flushing is to use the DELAY_KEY_WRITE table creation option for MyISAM tables. It is appropriate if it is not absolutely essential that every single record be inserted in the event of an abnormal system shutdown (as may be the case if you use MySQL for some kind of logging). This option causes the index cache to be flushed only occasionally, rather than after each insert.
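A minimal sketch of the table option, using a hypothetical access_log table (the name and columns are assumptions for illustration):

```sql
-- Create a MyISAM table whose index writes are delayed.
CREATE TABLE access_log (
    logged_at DATETIME     NOT NULL,
    message   VARCHAR(255) NOT NULL,
    INDEX idx_logged_at (logged_at)
) ENGINE=MyISAM DELAY_KEY_WRITE=1;

-- The same option can be turned on for an existing MyISAM table.
ALTER TABLE access_log DELAY_KEY_WRITE=1;

-- After an unclean shutdown the on-disk indexes may be out of date,
-- so it is prudent to check the table before using it again.
CHECK TABLE access_log;
```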

To use delayed index flushing server-wide, start mysqld with the --delay-key-write=ALL option. In this case, index block writes are deferred until the blocks must be flushed to make room for other index values, until a flush-tables command (mysqladmin flush-tables or the FLUSH TABLES statement) is executed, or until the indexed table is closed.
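From a client session, the server-wide setting can be inspected and a flush forced with two standard statements. This is a short sketch and assumes an account with enough privileges to run FLUSH TABLES.

```sql
-- Show whether delayed key writes are OFF, ON (per-table option honored),
-- or ALL (applied to every MyISAM table).
SHOW VARIABLES LIKE 'delay_key_write';

-- Close all open tables, which writes any deferred index blocks to disk.
FLUSH TABLES;
```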


