"MySQL" MySQL for large data volume common technology _ CREATE INDEX + cache configuration + sub-database sub-table + Sub-query optimization (reprint)

Source: Internet
Author: User
Tags: create index, mysql, index

Original address: http://blog.csdn.net/zwan0518/article/details/11972853

Directory

    1. Query optimization
      1.1 Creating an index
      1.2 Configuring the cache
      1.3 slow_query_log analysis
      1.4 Sub-database and sub-table
      1.5 Subquery optimization
    2. Data transfer
      2.1 Inserting data

With the growth of the Internet, data volumes are growing exponentially, from GB to TB to PB, and operating on that data has become correspondingly harder: traditional relational databases can no longer meet the need for fast queries and inserts. NoSQL arrived and temporarily eased this crisis. It gains performance by relaxing data safety guarantees, reducing support for transactions, and reducing support for complex queries. However, in some cases these NoSQL compromises are not acceptable; for example, some usage scenarios absolutely require transactions and strong safety guarantees. NoSQL cannot satisfy them, so a relational database is still necessary.

Although relational databases are inferior to NoSQL databases at massive scale, their performance will meet your needs if you use them correctly. Different operations call for different optimization directions: data migration, queries, and inserts can each be tuned separately. When optimizing, you also need to consider whether other related operations will be affected. For example, you can improve query performance by creating an index, but this hurts inserts, because every insert must also update the index; you have to decide whether that slowdown is acceptable. Database optimization therefore means weighing several directions and finding the best compromise.

One: Query optimization

1: Creating an index.

The simplest and most common optimization target is the query. Because reads account for most CRUD operations, read performance essentially determines the performance of the application. The most common tool for query performance is the index. In a test with 20 million records of about 200 bytes each, spread over two VARCHAR columns, querying a single record without an index took about a minute, while with an index the query time was negligible. However, adding an index to existing data is very expensive: after inserting the 20 million records, building the index took on the order of ten minutes.
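As a rough illustration of the test above, here is a minimal sketch of querying with and without an index; the table and column names (usertable, ycsb_key, field0) are assumed for illustration and are not the author's exact schema.

    -- Hypothetical table resembling the one described above:
    -- two VARCHAR columns, roughly 200 bytes per row.
    CREATE TABLE usertable (
        ycsb_key VARCHAR(100) NOT NULL,
        field0   VARCHAR(100)
    ) ENGINE=InnoDB;

    -- Without an index, this forces a full scan over all 20 million rows:
    SELECT * FROM usertable WHERE ycsb_key = 'user1000';

    -- Building the index afterwards has to process every existing row,
    -- which is why it took on the order of ten minutes in the test above:
    CREATE INDEX index_ycsb_key ON usertable (ycsb_key);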

Creating an index has drawbacks and is not always appropriate. Although an index can greatly speed up queries, the downsides are obvious: inserting data takes extra time because the index must also be maintained, which reduces insert performance to some extent, and the data files obviously become larger. When you create an index on a column, the length of each index entry is determined by the length you declared when creating the column. For example, for a VARCHAR(100) column, each index entry is 102 bytes, because a declared length above 64 bytes adds an extra 2 bytes to the index record.

Looking at the ycsb_key column (declared length 100), you can see that the index I created, index_ycsb_key, has an entry length of 102 for each record; imagine how large the index becomes when the data grows huge. Note also that the index length differs from how the column type stores data: VARCHAR is a variable-length character type (see the MySQL data type documentation) whose actual storage is the actual character size, but the index uses the length you declared. If you declare 100 bytes when creating the column, the index length is that many bytes plus 2, regardless of how much is actually stored.
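The index definition and total index size can be inspected directly; a minimal sketch, assuming the hypothetical usertable from the previous example:

    -- Lists the indexed columns, key prefix (Sub_part) and cardinality:
    SHOW INDEX FROM usertable;

    -- Total bytes occupied by all indexes on the table:
    SELECT table_name, index_length
    FROM information_schema.tables
    WHERE table_schema = DATABASE()
      AND table_name   = 'usertable';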

Besides the time it takes to create an index and the ever-growing index file, whether to create an index also depends on the characteristics of the stored data. If a large portion of the data consists of duplicate values, creating an index on that column does more harm than good (please review the MySQL index documentation first). When values repeat heavily, the query speedup from the index is negligible, yet you still pay the index-maintenance cost on every insert.

2: Configuring the cache.

MySQL has several kinds of caches; some cache query statements and others cache query results. Clients cannot manipulate these caches directly; they are maintained by the server and updated along with operations such as queries and modifications. The cache-related settings can be seen in the server configuration, for example:
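For instance, the query-cache settings and usage statistics can be inspected as follows (these variables exist in MySQL 5.x; the query cache was removed entirely in MySQL 8.0):

    -- Query-cache related server variables:
    SHOW VARIABLES LIKE '%query_cache%';

    -- Cache hit and usage counters:
    SHOW STATUS LIKE 'Qcache%';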

The main one analyzed here is the query cache, which caches query result data. To use it, you must set query_cache_size to a non-zero value. Once it is non-zero, the server caches the result returned by each query; the next time the same query arrives, the server returns the data directly from the cache instead of executing the query. The amount of data that can be cached depends on the size you set, so if you set it large enough that results fit entirely in memory, cached queries become very fast.
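A minimal configuration sketch for MySQL 5.x follows; the sizes are illustrative assumptions, not recommendations from the original article:

    # my.cnf, [mysqld] section
    query_cache_type  = 1       # 0 = OFF, 1 = ON, 2 = DEMAND (cache only SQL_CACHE queries)
    query_cache_size  = 64M     # 0 disables the query cache entirely
    query_cache_limit = 1M      # results larger than this are not cached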

However, the query cache also has drawbacks. Whenever you perform any update operation (UPDATE/INSERT/DELETE) on a table, the server invalidates all cached results for that table to keep the cache consistent with the database. Therefore, on a table that is updated very frequently, the query cache cannot improve performance, and its maintenance overhead even affects other operations.
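In that situation you can keep the cache for stable tables while excluding volatile queries from it; a sketch for MySQL 5.x, using hypothetical table names:

    -- With query_cache_type = 1, exclude a frequently changing query:
    SELECT SQL_NO_CACHE COUNT(*) FROM orders WHERE status = 'PENDING';

    -- With query_cache_type = 2 (DEMAND), only explicitly marked queries are cached:
    SELECT SQL_CACHE name FROM users WHERE id = 42;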

3: slow_query_log analysis.

In fact, the most important and most fundamental tool for improving query performance is the slow query log.

When you set slow_query_log to ON, the server logs every query that exceeds the slow-query threshold you configured (long_query_time). While optimizing, you can analyze the slow query log and target the queries it records, for example by creating appropriate indexes or by splitting tables. Why split tables? Because once a single table grows past a certain point, it becomes a performance limit that indexing alone cannot fix. A brief introduction follows.
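A minimal sketch of enabling the slow query log at runtime; the threshold and file path are illustrative assumptions:

    -- Also settable in my.cnf under [mysqld].
    SET GLOBAL slow_query_log      = 'ON';
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
    SET GLOBAL long_query_time     = 2;   -- log queries running longer than 2 seconds

    -- Afterwards, summarize the log with the bundled tool, for example:
    --   mysqldumpslow -s t -t 10 /var/log/mysql/slow.log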

4: Sub-database and sub-table

Sub-database and sub-table splitting is the killer technique of query optimization. Once the data volume reaches a certain level, the measures above no longer yield obvious improvements, and the data itself must be split up. There are two kinds of splitting: by database and by table. Table splitting in turn comes in two flavors, vertical splitting and horizontal splitting. Each approach is briefly introduced below.

In MySQL, data is stored as files on disk. When a data file grows too large, operating-system operations on it become more cumbersome and time-consuming, and some operating systems do not even support very large files, so at that point the table must be split. In addition, MySQL's common storage engine is InnoDB, whose underlying data structure is a B+ tree. When the data file is too large, the B+ tree gains more levels and nodes, so looking up a record may traverse several levels, which inevitably means multiple I/O operations to load pages into memory and is therefore time-consuming. On top of that, there is InnoDB's lock mechanism on the B+ tree: nodes are locked, so when you change the table structure the tree is locked, and for a very large table file this is effectively unworkable.

So we have to carry out table and database splitting.
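A minimal sketch of horizontal table splitting, assuming a hypothetical orders table routed by user_id modulo the number of shards; in practice the routing usually lives in the application layer or in sharding middleware:

    -- Original table:
    CREATE TABLE orders (
        id      BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id BIGINT NOT NULL,
        amount  DECIMAL(10, 2)
    ) ENGINE=InnoDB;

    -- Horizontal split: identical structure, rows distributed across N tables.
    CREATE TABLE orders_0 LIKE orders;
    CREATE TABLE orders_1 LIKE orders;
    CREATE TABLE orders_2 LIKE orders;
    CREATE TABLE orders_3 LIKE orders;

    -- The application (or middleware) picks the target table from the shard key,
    -- here user_id MOD 4; 10 MOD 4 = 2, so this row goes to orders_2:
    INSERT INTO orders_2 (user_id, amount) VALUES (10, 99.00);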

5: Subquery optimization.

Queries often use subqueries, which generally rely on the IN or EXISTS keyword. Once the data volume grows large enough, the execution times of IN and EXISTS differ enormously. To avoid such situations, the best approach is to use a join query, because in most cases the server optimizes joins much better than subqueries. In MySQL 5.6 and later, the optimizer automatically rewrites IN subqueries into join-style execution, so the classic slow-subquery problem no longer appears. Sometimes the DISTINCT keyword can be used to limit subquery results, but note that DISTINCT is often converted to GROUP BY, which creates a temporary table and incurs the delay of copying data into it.
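A sketch of the rewrite, using hypothetical users(id, city) and orders(user_id, ...) tables; users.id is assumed unique, so the join does not duplicate rows:

    -- Subquery form: before the semi-join optimizations in MySQL 5.6,
    -- this could be executed very inefficiently on large tables.
    SELECT o.*
    FROM orders o
    WHERE o.user_id IN (SELECT u.id FROM users u WHERE u.city = 'Beijing');

    -- Equivalent join form, which the optimizer generally handles well:
    SELECT o.*
    FROM orders o
    JOIN users u ON u.id = o.user_id
    WHERE u.city = 'Beijing';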

Two: Data transfer

When the data volume reaches a certain level, migrating a database becomes a very delicate and dangerous job. Ensuring data consistency, handling all kinds of emergencies, and dealing with data that changes during the migration are all difficult problems.

2.1: Inserting data

During data migration there will be a bulk re-import of data. You can often load files directly, but sometimes you need to insert through code, and in that case the INSERT statements must be optimized. One option is the INSERT DELAYED statement: when you issue the insert request, the row is not written to the database immediately but placed in a buffer, and the server inserts it when the time is right.
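Note that INSERT DELAYED only works with engines such as MyISAM and is no longer supported as of MySQL 5.7, so for InnoDB the usual alternatives are multi-row INSERTs or LOAD DATA INFILE. A minimal sketch, reusing the hypothetical usertable from earlier; the file path and values are illustrative:

    -- Multi-row INSERT: many rows per statement cuts network round trips
    -- and index-maintenance overhead.
    INSERT INTO usertable (ycsb_key, field0) VALUES
        ('user1', 'aaa'),
        ('user2', 'bbb'),
        ('user3', 'ccc');

    -- For re-importing a dump file, LOAD DATA INFILE is usually the fastest path:
    LOAD DATA INFILE '/tmp/usertable.csv'
    INTO TABLE usertable
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (ycsb_key, field0);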
