I. Overview of optimization
The MySQL database is a common two bottleneck that is CPU and I/O bottlenecks, when the CPU is saturated when the data is loaded into memory or read from disk. Disk I/O bottlenecks occur when the load data is much larger than the memory capacity, if the application is distributed over the network, then when the query volume is quite large, then the flat bottleneck will appear on the network, we can use Mpstat, Iostat, SAR and vmstat to view the performance status of the system.
In addition to the server hardware performance bottlenecks, for the MySQL system itself, we can use tools to optimize the performance of the database, usually there are three kinds: using the index, using explain to analyze the query and adjust the internal configuration of MySQL.
Second, optimization analysis
In the optimization of MySQL, usually need to analyze the database, common analysis means have slow query log, EXPLAIN analysis query, profiling analysis and Show command query system state and system variables, through the location analysis performance bottleneck, can better optimize the performance of the database system.
1 Performance Bottleneck location show command
We can see the Mysql status and variables through the show command, find the bottleneck of the system:mysql> show status--display status information (extended show state like ' XXX ') mysql> show variables-- Show System variables (extended show variables like ' XXX ') mysql> show InnoDB status--show InnoDB storage engine status mysql> show processlist--view current SQL execution, Includes execution status, lock table, etc. shell> mysqladmin variables-u username-p password--Display System variables shell> mysqladmin extended-status-u Username-p password--Display status information View status variables and help:shell> mysqld–verbose–help [|more #逐行显示]
2. The slow query log is turned on:
Add two configuration parameters under the [mysqld] line in the configuration file my.cnf or My.ini log-slow-queries=/data/mysqldata/slow-query.log long_query_time=2 Note: The log-slow-queries parameter is the location of the slow query log storage, generally this directory to have MySQL running account of the writable permissions, generally set this directory as MySQL data storage directory; long_query_time= 2 in 2 means that the query is not logged for more than two seconds, and the log-queries-not-using-indexes parameter is added in MY.CNF or My.ini to indicate that the query is not indexed. Log-slow-queries=/data/mysqldata/slow-query.log long_query_time=10
3. Slow Query Analysis Mysqldumpslow
We can see which SQL execution is inefficient by opening the log file [[email protected] mysql]# more Slow-query.log # time:081026 19:46:34 # [email protected]: Root[root] @ l Ocalhost [] # query_time:11 lock_time:0 rows_sent:1 Rows_exam ined:6552961 Select COUNT (*) from T_user; From the log, you can find that the query time exceeds 5 seconds of SQL, and less than 5 seconds does not appear in this log. If you have a lot of records in the slow query log, you can use the Mysqldumpslow tool (the MySQL client installation comes with) to subtotal the slow query log. The Mysqldumpslow summarizes the log files and displays summary results. Enter log's storage directory and run [[Email protected]_data] #mysqldumpslow slow-query.log Reading MySQL Slo W query log from Slow-query.log count:2 time=11.00s (22s) lock=0.00s (0s) rows=1.0 (2), root[r Oot] @mysql Select COUNT (N) from T_user; Mysqldumpslow Command/path/mysqldumpslow -S C-T 10/database/mysql/slow-query.log This outputs 10 SQL statements with the highest number of records, where:-S, which is how to sort by, C, T, L, R, respectively, according to the number of records, time , query time, the number of records returned to sort, AC, at, AL, AR, indicating the corresponding flashbacks;-T, which is the meaning of top N, that is, to return the data of the first number of bars;-G, you can write a regular match pattern in the back, case insensitive; for example:/path/mysqldumpslow-s R-t 10/database/mysql/slow-log Gets the top 10 queries that return a recordset. /path/mysqldumpslow-s t-t 10-g "left join"/database/mysql/slow-log to get the first 10 lines sorted by time, containing the query statement that contains the connection.
Using the Mysqldumpslow command can be very clear to get a variety of query statements we need, the MySQL query statement monitoring, analysis, optimization is a very important step in MySQL optimization. When the slow query log is turned on, due to the logging operation, CPU resources will be used to affect the performance of MySQL, but it can be opened periodically to locate the performance bottleneck.
4. Explain analysis query
Using the EXPLAIN keyword, you can simulate the optimizer executing SQL query statements to know how MySQL handles your SQL statements. This can help you analyze the performance bottlenecks of your query statement or table structure. With the explain command you get:
– Table reading order – The operation type of the data read operation – which indexes can be used – which indexes are actually used – references between tables – how many rows per table are queried by the optimizer
explain field: øtable: Displays the data for this row about which table is Øpossible_keys: Displays the indexes that may be applied to this table. If it is empty, there is no possible index. You can select an appropriate statement from the where statement for the related domain Økey: The index that is actually used. If NULL, the index is not used. MySQL rarely chooses to optimize poorly indexed indexes, which can be used to force an index to be used in a SELECT statement or to force an index to be ignored by ignore indexes Økey_len: The length of the index used. In the case of no loss of accuracy, the shorter the better Øref: Which column of the display index is used, if possible, is a constant ørows:mysql the number of rows that must be retrieved to return the requested data øtype: This is one of the most important fields that shows what type the query uses. The best to worst connection types are system, const, EQ_REG, ref, range, index, and Allnsystem, const: You can convert the variables of the query to constants. such as id=1; The ID is the primary key or unique key. Neq_ref: Accesses the index and returns the data for a single row. (typically occurs when a join, the query uses an index that is the primary key or a unique key) Nref: accesses the index and returns data for a value. (You can return multiple lines) typically occurs when Nrange is used: This connection type uses the index to return rows in a range, such as using > or<What happens when you look up something, and the index is built on that field (note:not necessarily better than index) nindex: Full table Scan in the order of the index, the advantage is not the sort, the disadvantage is also the full table scan Nall: Full table scan, should try to avoid Øextra: about MySQL How to parse query additional information, mainly has the following several nusing Index: Use only indexes to avoid access to tables. Nusing Where: Use the Where to worry about the data. Not all WHERE clause are to show the using where. If the= way to access the index.nusing tmporary: Use the temporary table nusing Filesort: Use the extra sort. (an extra sort is used when using order by v1 instead of the index) Nrange checked for Eache record (index map:n): No good index.
three . Index and query optimization
1. Types of indexes
Ø Normal Index: This is the most basic index type, no uniqueness, and so on.
Ø Uniqueness Index: Basically the same as a normal index, but all index column values remain unique.
Ø PRIMARY key: The primary key is a unique index, but must be specified as "PRIMARY key".
Full-Text indexing: MySQL supports full-text indexing and full-text retrieval starting from 3.23.23. In MySQL, the index type of the full-text index is fulltext. A full-text index can be created on a varchar or text-type column.
Most MySQL indexes (PRIMARY KEY, UNIQUE, index, and fulltext) are stored using the B-tree. Index of Spatial column type use R-Tree, memory table supports hash index.
2. Single-row and multi-column indexes (composite index)
The index can be a single-column index or a multicolumn index. Using indexes on related columns is one of the best ways to improve the performance of select operations.
Leftmost prefixes (features of multi-column indexes) a multi-column index has one feature, the leftmost prefix (leftmost prefixing). If a multi-column index is key (FirstName LastName age), MySQL will use the multicolumn index when the search condition is the combination and order of the following columns: Firstname,lastname,agefirstname, Lastnamefirstname that is, the equivalent is also established key (FirstName LastName) and Key (FirstName).
3. Recommendations for indexing and writing SQL statements
1. Use limit 12 when the result set has only one row of data. Avoid select *, always specify that the more columns you need to read from the table, the query becomes slower. He increases the time it takes for the disk to operate, or if the database server is separate from the Web server. You will experience a very long network delay, simply because the data is not required to be transferred between servers. 3. Use a connection (join) instead of a subquery (sub-queries) connection (join): It is more efficient because MySQL does not need to create temporary tables in memory to complete this logical two-step query effort. 4. Use enum, CHAR instead of varchar, and use a reasonable field property length of 5. Use not NULL6 as much as possible for each field. Fixed-length tables are faster ( as long as the fields in your table contain one , the table is not fixed-length )
8. The smaller the query column, the faster
9. Save the IP address as UNSIGNED INT
Choosing the right storage engine MyISAM is ideal for applications where there are fewer queries for multiple write operations InnoDB write applications perform better and support transactions
do not ORDER by RAND () consumes performance, you can use other methods to implement random
12. Use the query cache Note that some functions cause the query cache to not work because the data they return is not fixed. like now () and RAND ()
When multiple conditions are taken, the conditions that will extract as little data as possible are placed in front, reducing the query time of the latter where condition
Some where conditions cause the index to be invalid:
Øwhere clause in the query conditions have! =,mysql will not be able to use the index.
Øwhere clause when using the MySQL function, the index will be invalid, for example: SELECT * from TB where left (name, 4) = ' xxx '
Ø when searching for matches using like, the index is valid: SELECT * from TBL1 where name as ' xxx% ', and the index is invalid if '%xxx% '
14. Using the database split see the Five
Iv. Configuration Optimization
After installing MySQL, the configuration file my.cnf in the/mysql installation directory/share/mysql directory, which also contains a number of configuration files for reference, there are my-large.cnf, MY-HUGE.CNF, MY-MEDIUM.CNF,MY-SMALL.CNF, respectively, corresponding to the configuration of large and small database applications. The. ini file in the MySQL installation directory is present in the win environment.
The following is a list of major variables that have a large impact on performance optimization, which are divided into connection request variables and buffer variables.
1. Connect the requested variable:
Max_connections
The maximum number of MySQL connections, increasing this value increases the number of file descriptors required by the mysqld. If the server has a large number of concurrent connection requests, it is recommended that this value be increased to increase the amount of concurrent connections.
Of course, this is based on the case of the machine can support, because if the number of connections, between MySQL will provide a connection buffer for each connection, it will cost more memory, so to adjust the value properly, you can not blindly increase the value set.
If the value is too small, error 1040:too Many connections errors, you can check the number of connections in the current state through the ' conn% ' wildcard to decide the size of the values.
Show variables like ' max_connections ' Maximum number of connections
Show status like ' Max_used_connections ' in response to the number of connections
If the max_used_connections is the same as max_connections, then it is max_connections set too low or exceed the server load limit, less than 10% is set too large.
Back_log
The number of connections that can be staged by MySQL. This works when the primary MySQL thread gets very many connection requests in a very short period of time. If MySQL's connection data reaches Max_connections, the new request will be present in the stack, waiting for a connection to release the resource.
2. Buffer Variables
Key_buffer_size
KEY_BUFFER_SIZE Specifies the size of the index buffer, which determines the speed of index processing, especially the speed of index reads. By checking the status values key_read_requests and Key_reads, you can see if the key_buffer_size settings are reasonable.
The proportional key_reads/key_read_requests should be as low as possible, at least 1:100,1:1000 better (the above status values can be obtained using the show status like ' key_read% '). Key_buffer_size only works on MyISAM tables. Even if you do not use the MyISAM table, the internal temporary disk table is the MyISAM table and this value is used. You can use the Check status value created_tmp_disk_tables to learn more.
Query_cache_size
Using query buffering, MySQL stores the query results in a buffer and will read the results directly from the buffer in the future for the same SELECT statement (case sensitive). By checking the status value qcache_*, you can know whether the Query_cache_size setting is reasonable (the above status values can be obtained using the show status like ' qcache% ').
If the value of Qcache_lowmem_prunes is very large, it indicates that buffering is often insufficient, if the value of Qcache_hits is also very large, it indicates that the query buffer is used very frequently, and the buffer size needs to be increased.
If the value of qcache_hits is small, it indicates that your query repetition rate is very low, in which case the use of query buffering will affect efficiency, then you can consider not to query buffering. Additionally, adding Sql_no_cache in the SELECT statement can make it clear that the query buffer is not used
Record_buffer_size
Read_rnd_buffer_size
Sort_buffer_size
Join_buffer_size
Table_cache
Max_heap_table_size
Tmp_table_size
Thread_cache_size
Thread_concurrency
Wait_timeout
Configure several variables for InnoDB
Innodb_buffer_pool_size for the InnoDB table, the role of innodb_buffer_pool_size is equivalent to key_buffer_size for the MyISAM table. InnoDB uses this parameter to specify the size of memory to buffer data and indexes. For a separate MySQL database server, the maximum value can be set to 80% of physical memory. According to the MySQL manual, the recommended value for 2G memory machines is 1G (50%). The Innodb_flush_log_at_trx_commit primarily controls the point at which InnoDB writes data from log buffer to a log file and flush the disk, with values of 0, 1, and 23, respectively. 0, indicates that when the transaction commits, does not do the log write operation, but writes the data in the log buffer every second and flush the disk once; 1,
In every second or every time the commit will cause log file write, flush disk operation, to ensure the acid of the transaction, set to 2, each transaction commit causes the action to write to the log file, but complete the flush disk operation every second.
The actual test found that this value has a very large effect on the speed at which the data is inserted, that it takes only 2 seconds to insert 10,000 records at 2 o'clock, 0 for 1 seconds, and 1 seconds when set to 229. Therefore, the MySQL manual also recommends merging inserts into one transaction as much as possible, which can greatly increase the speed. Depending on the MySQL manual, you can set the value to 0 or 2 if the risk of losing the most recent transaction is allowed. Innodb_log_buffer_sizelog cache size, generally 1-8m, default is 1M, for larger transactions, you can increase the cache size. Can be set to 4M or 8M. Innodb_additional_mem_pool_size This parameter specifies the size of the memory pool that InnoDB uses to store data dictionaries and other internal structures. The default value is 1M. Usually not too big, as long as enough on the line, should be related to the complexity of the table structure. If not enough, MySQL writes a warning message to the error log. According to the MySQL manual, for 2G memory machines, the recommended value is 20M, can be appropriately increased. Innodb_thread_concurrency=8 is recommended to be set to Numcpus+numdisks, which is typically 8 by default
Five. Database splitting
Vertical slicing and horizontal slicing
First, the basic idea
The basic idea of sharding is to reduce the performance of a single database by splitting a database into multiple parts onto different databases (servers). Less strictly speaking, the database of large amounts of data, if it is because of many tables and more data, it is appropriate to use vertical segmentation, that is, closely related (such as the same module) of the table is sliced out on a server. If there are not many tables, but there is a lot of data for each table, it is appropriate to split the table's data into multiple databases (servers) by a rule (for example, by ID hash) for horizontal segmentation. Of course, the reality is more of these two situations mixed together, this time need to choose according to the actual situation, may also be combined with vertical and horizontal segmentation, so that the original database into a similar matrix can be infinitely expanded database (server) array. Below is a detailed description of vertical slicing and horizontal slicing.
The most important feature of vertical slicing is the simple rules, the implementation is more convenient, especially suitable for the coupling degree between the business non-
Often low, interacting very small, the business logic is very clear system. In this kind of system, it is easy to do different industries
The tables used by the service module are split into different databases. Depending on the table to be split, the impact on the application is also
Smaller, the split rule will be simpler and clearer. (This is called "Share Nothing").
Horizontal segmentation is relatively slightly more complex than vertical slicing. Because you want to split different data from the same table
In different databases, the split rule itself is more complex to split than the table name for the application, after
Data maintenance will be more complex.
Let's consider the segmentation of data from a common situation: on the one hand, all tables in a library are not usually concatenated by a single table, which implies that horizontal slicing is almost always about a table that is closely related to a small rubbing (in fact, a vertically sliced block), and not for all tables. On the other hand, some very high-load systems, even if only a single table could not bear its load through a single database host, means that vertical slicing alone is not a complete solution to the question. As a result, most systems use vertical and horizontal slicing together, vertically slicing the system, and selectively slicing horizontally for each small rubbing table. The entire database is then cut into a distributed matrix.
Second, the segmentation strategy
As mentioned earlier, the segmentation is done by first vertical slicing and then horizontal slicing. The result of vertical slicing is just the foreshadowing of horizontal slicing. The idea of vertical segmentation is to analyze the aggregation relationship between tables, and put together the closely related tables. In most cases it may be the same module, or the same "aggregation". The "gathering" here is what the field-driven design says about aggregation. In the vertical tangent table aggregation, find "root element" (here "root element" is the domain-driven design "aggregation root"), according to the "root element" for horizontal segmentation, that is, starting from the "root element", all and its direct and indirect associated data into a shard. The likelihood of such a cross-shard association is very small. The application does not have to interrupt existing inter-table associations. For example: For social networking sites, almost all of the data will eventually be linked to a user, and segmentation based on the user is the best choice. For example, the forum system, the user and forum two modules should be divided in the vertical segmentation in two Shard, for the Forum module, BBS is obviously the aggregation of roots, so according to forum for horizontal segmentation, It is natural to put all the posts and replies in the Forum in a shard with the forum.
For shared data data, if it is a read-only dictionary table, maintaining a copy in each shard should be a good choice, so you don't have to interrupt the correlation. If it is a cross-node association between the general data, it must be interrupted.
in particular, there are subtle changes to the segmentation strategy when vertical and horizontal segmentation are performed simultaneously. For example, when only vertical segmentation is considered, the tables that are divided together can maintain arbitrary correlation, so you can divide the table by "function module", but once the horizontal segmentation is introduced, the relationship between the tables is greatly constrained. Typically, you can only allow a primary table (a table that is hashed with that table ID) and its multiple secondary tables to retain an association, that is, when vertical and horizontal segmentation is done vertically, the slice in the vertical direction will no longer be divided with the "function module", but rather finer-grained vertical slicing, which is the granularity of the domain-driven design The concept of "aggregation" coincides with, or even is, identical, and the main table of each shard is the aggregation root in an aggregation! In this way you will find that the database is fragmented (the number of Shard will be more, but the table in Shard is not many), in order to avoid managing too many data sources, make full use of each database server resources, you can consider the business is similar, and Two or more shard with similar data growth rates (the primary table data volume at the same order of magnitude) are placed in the same data source, each shard remains separate, they have their own primary table, and are hashed with their respective master table IDs. The difference is that their hash modulus (that is, the number of nodes) must be consistent.
1. Transaction issues:
There are two possible scenarios for solving a transactional problem: distributed transactions and a simple comparison of the two sets of scenarios under which the application and the database co-control implement the transaction.
Scenario one: Using Distributed transactions
Advantages: The database management, simple and effective
Cons: High performance costs, especially shard more and more
Scenario Two: Co-controlled by applications and databases
Principle: Splitting a distributed transaction across multiple databases into multiple
Small transactions on a single database, and is controlled by the application
Each small transaction.
Advantages: Advantages in Performance
Cons: Requires the application to be flexible in transaction control. If you use
The transaction management of spring, the change will face some difficulties.
2. Cross-node Join issues
The problem of cross-node join is unavoidable as long as it is slicing. But good design and segmentation can reduce the occurrence of this kind of situation. The common practice of solving this problem is to implement the query in two times. Identify the ID of the associated data in the result set of the first query, and initiate a second request to get the associated data based on those IDs.
3. Cross-node Count,order by,group by and aggregation function issues
These are a kind of problem, because they all need to be calculated based on all data sets. Most agents do not automatically process the merge work. Solution: Merge on the application side, similar to resolving cross-node join problems, with results obtained on each node, respectively. Unlike join, the query for each node can be executed in parallel, so many times it is much faster than a single big table. However, if the result set is large, the consumption of application memory is a problem.
MySQL performance optimization