Some Practices for MySQL Standalone Database Optimization


There is a lot to say about database optimization. By data volume, it can be divided into two stages: the standalone database and sharding (splitting into multiple databases and tables). A standalone database can generally support up to roughly 5 million rows or 10 GB of data; beyond that, sharding has to be considered. Interviews at large companies also tend to start from the standalone database and work step by step toward sharding, with many database optimization questions interspersed along the way. This article attempts to describe the practice of standalone database optimization, based on MySQL. Corrections are welcome wherever something is unreasonable.

1. Table Structure Optimization

The table structure designed at the beginning of an application often affects its performance later on, especially once the user base grows. Table structure optimization is therefore a very important first step.

1.1 Character Set

In general, choose UTF-8 whenever possible. Although GBK uses less storage space than UTF-8 for Chinese text, UTF-8 is compatible with all languages, and it is not worth sacrificing that extensibility to save a little storage. Converting from GBK to UTF-8 later is very costly, requiring a full data migration, whereas storage space can be solved simply by adding disks.

1.2 Primary Key

When using MySQL's InnoDB engine, the underlying storage model is a B+tree that uses the primary key as the clustered index and stores the inserted rows in the leaf nodes, so a record can be located quickly through its primary key. Every table should therefore be given a primary key, preferably an auto-increment one. With an auto-increment primary key, rows are inserted into the leaf nodes of the B+tree in primary-key order; because the inserts are ordered, they are highly efficient and rarely require moving existing data. If the primary key is not auto-increment, each new key value is roughly random, and large amounts of data may have to be moved to maintain the B+tree's properties, adding unnecessary overhead.
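As a minimal sketch (the table and column names here are illustrative, not from the original article):

    -- id grows monotonically, so new rows are appended at the right-most
    -- leaf of the clustered B+tree instead of being inserted in the middle.
    CREATE TABLE user (
        id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        name VARCHAR(64) NOT NULL DEFAULT '',
        PRIMARY KEY (id)
    ) ENGINE = InnoDB DEFAULT CHARSET = utf8;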

1.3 Fields

1.3.1 Indexed fields must have a NOT NULL constraint and a default value.

1.3.2 Do not use FLOAT or DOUBLE to store decimals, to prevent loss of precision; DECIMAL is recommended instead.

1.3.3 Do not use TEXT/BLOB to store large amounts of data: reading and writing large text causes heavy I/O overhead and consumes MySQL's cache, which greatly reduces database throughput under high concurrency. Store large text in a dedicated file storage system and keep only the file's access address in MySQL. For example, save blog posts as files and store only each file's relative address in MySQL.

1.3.4 VARCHAR lengths should not exceed 8K.

1.3.5 For time columns, DATETIME is recommended over TIMESTAMP. DATETIME occupies 8 bytes while TIMESTAMP occupies only 4, but TIMESTAMP must be non-null and is time-zone sensitive.

1.3.6 Add gmt_create and gmt_modified fields to every table to record when each row was created and last modified. These two fields make troubleshooting much easier. (A sketch pulling these field recommendations together follows the list.)
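A sketch table combining the recommendations above; all table and column names are illustrative assumptions, not from the original article:

    CREATE TABLE order_info (
        id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_name    VARCHAR(64)     NOT NULL DEFAULT '',  -- indexed field: NOT NULL with a default (1.3.1)
        amount       DECIMAL(12, 2)  NOT NULL DEFAULT 0,   -- DECIMAL, not FLOAT/DOUBLE (1.3.2)
        content_url  VARCHAR(255)    NOT NULL DEFAULT '',  -- address of the large text, not the text itself (1.3.3)
        gmt_create   DATETIME        NOT NULL,             -- DATETIME, not TIMESTAMP (1.3.5, 1.3.6)
        gmt_modified DATETIME        NOT NULL,
        PRIMARY KEY (id),
        KEY idx_user_name (user_name)
    ) ENGINE = InnoDB DEFAULT CHARSET = utf8;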

1.4 Index Creation

1.4.1 At this stage the business is not yet well understood, so try not to index blindly; add ordinary indexes only on fields that will definitely be used in queries.

1.4.2 A single-column InnoDB index must not exceed 767 bytes; for longer columns, index only the first 255 characters as a prefix index (see the sketch after this list).

1.4.3 In an InnoDB composite index, each column is likewise limited to 767 bytes, and the combined length must not exceed 3072 bytes.
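A prefix-index sketch, assuming the utf8 character set, where 3 bytes per character makes a 255-character prefix about 765 bytes, just under the 767-byte cap (table and column names are hypothetical):

    -- Index only the first 255 characters of a long VARCHAR column.
    ALTER TABLE article ADD KEY idx_title (title(255));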

2. SQL Optimization

Generally speaking, SQL falls into a few categories: basic CRUD, paging queries, range queries, fuzzy search, and multi-table joins.

2.1 Basic Queries

Queries should generally use an index. If a query cannot, first consider rewriting it so that an indexed field appears in the condition. If the business scenario makes that field unusable, look at the call volume: if it is large, say 100,000+ calls per day, a new index is needed; if it is small, say around 100 calls per day, the query can stay as it is. Also, use SELECT * as little as possible: list in the SQL statement only the fields that are actually used, since fetching unnecessary fields wastes I/O and memory.

2.2 Efficient Paging

LIMIT m, n actually reads the first m + n rows and then discards the first m, so the deeper the page (the larger m is), the worse the performance. For example:

    SELECT * FROM a LIMIT 100000, 10;

performs very poorly. It is better rewritten to locate the starting row through the primary-key index first:

    SELECT id, name, age FROM a
    WHERE id >= (SELECT id FROM a ORDER BY id LIMIT 100000, 1)
    ORDER BY id LIMIT 10;

2.3 Range Queries

Range queries include BETWEEN, greater-than, less-than, and IN. MySQL limits the number of values in an IN condition: a small list can use the index, but a very large one degenerates into a full table scan. BETWEEN, greater-than, and less-than conditions can use an index, but in a composite index they stop any columns to their right from being used, so place range conditions last in the query, as sketched below.
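For illustration, assume a hypothetical composite index on (status, gmt_create): the equality condition on status uses the index, and the range condition on gmt_create comes last, since no index column after it could be used anyway.

    -- Hypothetical index: KEY idx_status_time (status, gmt_create)
    SELECT id
    FROM order_info
    WHERE status = 1
      AND gmt_create >= '2016-01-01 00:00:00';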

2.4 Fuzzy Query (LIKE)

A statement such as LIKE '%name%' cannot use an index and is equivalent to a full table scan. With small data volumes this is not much of a problem, but performance drops sharply once the data grows; at that point, replace this kind of fuzzy search with a search engine, because LIKE simply will not hold up. Also, add an indexable condition in front of the fuzzy one, as illustrated below.
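A quick illustration (table and column names are hypothetical):

    -- Leading wildcard: the index on name cannot be used; full table scan.
    SELECT id, name FROM user WHERE name LIKE '%zhang%';

    -- Prefix match: the index on name can be used for a range scan.
    SELECT id, name FROM user WHERE name LIKE 'zhang%';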

2.5 Multi-table Joins

Both subqueries and joins can fetch data across multiple tables, but subqueries perform poorly (especially in older MySQL versions), so rewrite subqueries as joins where possible. MySQL executes joins with the nested-loop join algorithm: it takes the result set of the driving table and, for each row, probes the next table. If the driving result set has 100 rows and the other table has 100,000, then without an index on the join column up to 100 × 100,000 row combinations must be filtered to produce the final result. So join from the table with the small result set to the large table, put an index on the join column, and if an index is impossible, set a large enough join buffer size. If none of these tricks cures a slow join, stop using join and split it into two simple queries. In addition, try not to join more than three tables; beyond three, performance is generally poor and the SQL should be split.
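A sketch of the subquery-to-join rewrite (table names are hypothetical assumptions, and an index on order_info.user_id is assumed so each probe of the inner table is cheap):

    -- Subquery form, which older MySQL versions execute poorly:
    SELECT *
    FROM order_info
    WHERE user_id IN (SELECT id FROM user WHERE status = 1);

    -- Equivalent join: drive from the smaller result set (the filtered users)
    -- and probe order_info through the index on user_id.
    SELECT o.*
    FROM user u
    JOIN order_info o ON o.user_id = u.id
    WHERE u.status = 1;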

3. Database Connection Pool Optimization

The database connection pool is essentially a cache, which is a means of resisting high concurrency. Database connection pool optimization is mainly to optimize the parameters, generally we use the DBCP connection pool, its specific parameters are as follows:

3.1 initialSize

The initial number of connections. "Initial" here means the first getConnection call, not application startup. A reasonable initial value is the historical average concurrency.

3.2 minIdle

The minimum number of idle connections to keep. DBCP runs a background thread that reclaims idle connections, and it leaves at least minIdle of them in the pool. 5 is a typical setting; 1 is fine if concurrency is genuinely low.

3.3 maxIdle

The maximum number of idle connections to keep, set according to the business's concurrency peak. For example, if the peak is 20, those connections are not recycled immediately once the peak passes; if another peak arrives shortly after, the pool can reuse the idle connections instead of repeatedly creating and closing them.

3.4 maxActive

The maximum number of active connections, set according to the acceptable concurrency extreme. For example, if a single machine can accept at most 100 concurrent requests, set maxActive to 100; only 100 requests can then hold connections at once, and excess requests fail after the maximum wait time. This value must be set, as it protects the database from malicious concurrency attacks.

3.5 maxWait

The maximum time to wait for a connection. Set it short, for example 3 s, so that requests fail fast: a thread waiting for a connection cannot be released, and a single machine has only a limited number of threads. If this is set too long, such as the 60 s often recommended online, the thread is pinned for the whole 60 s, and under heavy traffic the application runs out of available threads and stops serving.

3.6 minEvictableIdleTimeMillis

How long a connection may stay idle before being reclaimed. The default is 30 minutes.

3.7 validationQuery

The SQL statement used to check that a connection is still valid, typically something simple such as SELECT 1. Setting it is recommended.

3.8 testOnBorrow

Whether to validate a connection when it is taken from the pool. Not recommended, as it seriously hurts performance.

3.9 testOnReturn

Whether to validate a connection when it is returned to the pool. Not recommended, as it seriously hurts performance.

3.10 testWhileIdle

When enabled, the background eviction thread periodically runs the validation check (validateObject) on idle connections and clears any that have gone bad. This does not affect foreground performance, so enabling it is recommended.

3.11 numTestsPerEvictionRun

The number of connections examined in each eviction run. Setting it as large as maxActive is recommended, so that every connection can be effectively checked each time.

3.12 Preheating the Connection Pool

It is recommended to preheat the pool when the application starts: issue a simple SQL query before accepting external traffic, so the pool is already filled with the necessary connections.
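As a hedged sketch of how these parameters fit together with DBCP 1.x's BasicDataSource (the JDBC URL, credentials, and concrete values are placeholder assumptions, not from the original article):

    import java.sql.Connection;
    import java.sql.Statement;

    import org.apache.commons.dbcp.BasicDataSource;

    public class PoolSetup {

        public static BasicDataSource create() {
            BasicDataSource ds = new BasicDataSource();
            ds.setUrl("jdbc:mysql://127.0.0.1:3306/test");      // placeholder URL
            ds.setUsername("app");                              // placeholder credentials
            ds.setPassword("secret");
            ds.setInitialSize(20);                  // 3.1: historical average concurrency
            ds.setMinIdle(5);                       // 3.2
            ds.setMaxIdle(20);                      // 3.3: business concurrency peak
            ds.setMaxActive(100);                   // 3.4: acceptable concurrency extreme
            ds.setMaxWait(3000);                    // 3.5: fail fast after 3 s
            ds.setMinEvictableIdleTimeMillis(30 * 60 * 1000L);  // 3.6
            ds.setTimeBetweenEvictionRunsMillis(60 * 1000L);    // run the evictor every minute
            ds.setValidationQuery("SELECT 1");      // 3.7
            ds.setTestOnBorrow(false);              // 3.8
            ds.setTestOnReturn(false);              // 3.9
            ds.setTestWhileIdle(true);              // 3.10
            ds.setNumTestsPerEvictionRun(100);      // 3.11: as large as maxActive
            return ds;
        }

        // 3.12: preheat at startup; DBCP creates the initialSize connections
        // lazily on the first getConnection, so one trivial query fills the pool.
        public static void preheat(BasicDataSource ds) throws Exception {
            try (Connection c = ds.getConnection();
                 Statement st = c.createStatement()) {
                st.execute("SELECT 1");
            }
        }
    }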

4. Index Optimization

When the data volume grows to a certain point, SQL optimization alone can no longer improve performance, and it is time to bring out the big weapon: indexes. Indexing can be described at three levels, and mastering these three is generally enough. In addition, the selectivity of the fields to be indexed needs to be considered.

4.1 First-level Indexing

Build indexes on the columns that appear after WHERE: an ordinary index for a single column, a composite index for several. Composite indexes must respect the leftmost-prefix principle.
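A small illustration of the leftmost-prefix rule (names are hypothetical):

    -- Hypothetical composite index:
    ALTER TABLE order_info ADD KEY idx_user_status (user_name, status);

    -- Can use the index (the leftmost column is present):
    SELECT id FROM order_info WHERE user_name = 'zhangsan';
    SELECT id FROM order_info WHERE user_name = 'zhangsan' AND status = 1;

    -- Cannot use the index (the leftmost column is skipped):
    SELECT id FROM order_info WHERE status = 1;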

4.2 Second-level Indexing

If a field is used in ORDER BY or GROUP BY, consider indexing it. Because an index is inherently ordered, the query can return rows in index order and skip the sort entirely, which improves performance.
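For example (reusing the hypothetical order_info table), an index whose order matches the ORDER BY lets MySQL avoid a filesort:

    ALTER TABLE order_info ADD KEY idx_status_time (status, gmt_create);

    -- Rows come back in index order; EXPLAIN should not show "Using filesort".
    SELECT id
    FROM order_info
    WHERE status = 1
    ORDER BY gmt_create DESC
    LIMIT 20;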

4.3 Third-level Indexing

If the two levels above are not enough, also index the fields being selected, forming a so-called covering index. This saves one I/O operation: when MySQL queries through a secondary index, it first finds the matching entry in that index and then uses the primary key stored there to look up the full record in the clustered (primary-key) index. If everything the query needs is already in the secondary index, that second lookup is unnecessary. Of course, this way of indexing is rather extreme and not suitable for ordinary scenarios.
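A covering-index sketch (names are hypothetical): when the index contains every selected column, EXPLAIN reports "Using index" and the clustered-index lookup is skipped.

    ALTER TABLE user ADD KEY idx_name_age (name, age);

    -- Both selected columns live in the index, so no primary-key lookup is needed.
    SELECT name, age FROM user WHERE name = 'zhangsan';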

4.4 Index Selectivity

Build indexes on fields with high selectivity. What does high selectivity mean? It means that a lookup by this field returns only a small amount of data. For example, looking people up by name generally returns very few rows, while filtering the table by gender may return half of it; name is therefore a high-selectivity field and gender a low-selectivity one.
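Selectivity can be estimated as the ratio of distinct values to total rows; the closer to 1, the better the index candidate (column names are hypothetical):

    SELECT COUNT(DISTINCT name)   / COUNT(*) AS name_selectivity,    -- close to 1: good candidate
           COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity   -- close to 0: poor candidate
    FROM user;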

5. Historical Data Archiving

When data grows by something like 5 million rows a year and indexes no longer help, the usual next idea is sharding. But if the business is not growing explosively and the data only accumulates slowly, historical data can simply be archived instead, avoiding the complexity of sharding. The idea is to archive data whose lifecycle has completed, such as rows older than 6 months: a Quartz job scheduled in the early morning can find the rows older than 6 months and move them to a remote HBase server. Of course, a query interface for the historical data should also be provided, just in case it is needed.
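The article archives to HBase through a Quartz job; as a simplified, MySQL-only sketch of a single archiving pass (table names are hypothetical; the batch is repeated until no matching rows remain):

    -- Copy one batch of rows older than 6 months into the archive table...
    INSERT INTO order_info_history
    SELECT * FROM order_info
    WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH)
    ORDER BY id
    LIMIT 1000;

    -- ...then delete the same batch from the hot table.
    DELETE FROM order_info
    WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH)
    ORDER BY id
    LIMIT 1000;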

The above is a collection of practices for MySQL standalone database optimization; related material will continue to be added. Thank you for supporting this site!
