A mature database architecture is not designed with high availability, high scalability, and similar properties from the start; the infrastructure is improved gradually as the user base grows. This blog post discusses the problems a MySQL database faces at each stage of its growth and how to optimize for them. Leaving front-end applications aside, the process divides roughly into the following five stages:
1. Database table design
After a project is approved, the development department builds it according to the product department's requirements, and part of a development engineer's work is designing the table structure. This matters a great deal for a database: a poor design directly affects access speed and the user experience. Many factors contribute, such as slow queries, inefficient SQL statements, missing or inappropriate indexes, and database blocking (deadlocks). A team with test engineers will of course run stress tests and find bugs. On a team without test engineers, most developers do not consider early on whether the database design is sound; they focus on implementing and delivering features as quickly as possible. Once the project has a certain amount of traffic, the hidden problems surface, and by then they are no longer easy to fix.
2. Database deployment
This is where the operations engineer comes in. Traffic is low when a project first launches, so a single-server deployment is enough to handle a QPS (queries per second) of around 1,500. For high availability, you can use MySQL master-slave replication plus Keepalived for dual-machine hot standby; common clustering software includes Keepalived and Heartbeat.
Dual-machine hot standby blog post: http://lizhenliang.blog.51cto.com/7876557/1362313
3. Database performance optimization
If MySQL is deployed on an ordinary x86 server without any tuning, it can in theory handle roughly 2,000 QPS; after tuning, that may rise to about 2,500 QPS. Without tuning, once traffic reaches about 1,500 concurrent connections, database performance degrades even though hardware resources are still plentiful, so it is time to look at the software side. How do you get the most out of the database? One option is to run multiple MySQL instances on a single server to make full use of it. The other is to tune the database itself: the default operating system and database settings are usually conservative and place limits on the database, so adjusting them appropriately lets it handle more connections.
Optimization happens at the following three levels:
3.1 Database configuration optimization
MySQL has two commonly used storage engines. MyISAM does not support transactions, offers fast reads, and uses table-level locks. InnoDB supports transactions (ACID), is designed for maximum performance when processing large volumes of data, and uses row-level locks.
Table lock: low overhead, coarse locking granularity, high probability of lock contention, and relatively low concurrency.
Row lock: higher overhead, fine locking granularity, low probability of lock contention, and relatively high concurrency; deadlocks are possible.
Why do table locks and row locks exist? Mainly to protect data integrity. For example, while one user is modifying a table, other users who want to modify the same table must wait until the first user has finished; enforcing that wait is the job of table locks and row locks. Otherwise, with multiple users modifying a table at the same time, the data would inevitably end up conflicting or corrupted.
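The difference in locking granularity can be illustrated with a toy model. This is only a sketch of the idea, not how MySQL implements its locks; the class names `TableLock` and `RowLock` are hypothetical and exist purely for illustration.

```python
import threading


class TableLock:
    """Toy model: one lock guards the whole table."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_lock(self, row_id):
        # row_id is ignored: locking any row locks the entire table.
        return self._lock.acquire(blocking=False)

    def unlock(self, row_id):
        self._lock.release()


class RowLock:
    """Toy model: each row has its own lock."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict of row locks
        self._locks = {}

    def _lock_for(self, row_id):
        with self._guard:
            return self._locks.setdefault(row_id, threading.Lock())

    def try_lock(self, row_id):
        # Only a second writer on the same row is refused.
        return self._lock_for(row_id).acquire(blocking=False)

    def unlock(self, row_id):
        self._lock_for(row_id).release()
```

With `TableLock`, a second writer is refused even when it targets a different row; with `RowLock`, writers on different rows proceed concurrently, at the cost of tracking one lock per row.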
Given the above, the InnoDB storage engine is the best choice, and it has been the default storage engine since MySQL 5.5. Each storage engine has many associated parameters; the main ones that affect database performance are listed below.
Common parameters (default values):
max_connections = 151
# Maximum number of simultaneous connections. It is recommended to set this to about 80% of the upper limit of connections the server can sustain.
sort_buffer_size = 2M
# Buffer size for query sorting. It only affects ORDER BY and GROUP BY; it can be increased to 16M.
query_cache_limit = 1M
# Query cache limit. Only query results of 1M or less are cached, so that a large result set does not overwrite the cache pool.
query_cache_size = 16M
# Query cache size, used to cache SELECT results; the next identical SELECT returns its result directly from the cache pool. This value can be increased in multiples as appropriate.
open_files_limit = 1024
# Limit on the number of open files. If the value shown by show global status like 'open_files' is equal to or greater than open_files_limit, applications will be unable to connect to the database or will hang.
MyISAM parameters (default values):
key_buffer_size = 16M
# Index cache size; typically set to 30%-40% of physical memory.
read_buffer_size = 128K
# Read buffer size; 16M or 32M is recommended.
InnoDB parameters (default values):
innodb_buffer_pool_size = 128M
# Buffer size for indexes and data; typically set to 60%-70% of physical memory.
innodb_buffer_pool_instances = 1
# Number of buffer pool instances; 4 or 8 is recommended.
innodb_flush_log_at_trx_commit = 1
# Key parameter. 0: the log is written and flushed to disk about once per second; a database crash can lose about 1 second of transactions. 1: the log is written and flushed to disk at every transaction commit; I/O overhead is high and each commit waits for the log write, so this is the slowest but safest setting. 2: the log is written only to the operating system buffer at each commit and flushed to disk once per second; this is efficient, but if the server itself fails (operating system crash or power loss), about 1 second of transactions can be lost. If data safety requirements are not very strict, 2 is recommended; the performance gain is clearly noticeable.
innodb_file_per_table = OFF
# The default is a shared tablespace. The shared ibdata file keeps growing, which affects I/O performance to some extent. Enabling independent tablespaces is recommended: each table's indexes and data live in their own tablespace, which makes it possible to move a single table between databases.
innodb_log_buffer_size = 8M
# Log buffer size. Because the log is flushed at most once per second, it generally does not need to exceed 16M.
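As a rough illustration, the InnoDB rules of thumb above can be turned into a small helper that prints a my.cnf fragment. This is a sketch under stated assumptions: the function names are hypothetical, the buffer pool uses the lower end (60%) of the suggested range, and the 16G threshold for choosing 4 versus 8 buffer pool instances is an arbitrary assumption, since the text only says "4 or 8".

```python
def suggest_innodb_settings(physical_memory_gb, durability_critical=True):
    """Return InnoDB settings following the rules of thumb in the text."""
    # Lower end of the 60%-70% guideline, using integer arithmetic.
    buffer_pool_gb = max(physical_memory_gb * 60 // 100, 1)
    return {
        "innodb_buffer_pool_size": f"{buffer_pool_gb}G",
        # Assumption: use 8 instances only for larger pools (threshold is arbitrary).
        "innodb_buffer_pool_instances": 8 if buffer_pool_gb >= 16 else 4,
        # 1 is safest; 2 trades up to about 1 second of transactions for speed.
        "innodb_flush_log_at_trx_commit": 1 if durability_critical else 2,
        "innodb_file_per_table": "ON",
        "innodb_log_buffer_size": "8M",
    }


def render_mycnf(settings):
    """Render the settings as a [mysqld] section of my.cnf."""
    lines = ["[mysqld]"]
    lines.extend(f"{name} = {value}" for name, value in settings.items())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mycnf(suggest_innodb_settings(16, durability_critical=False)))
```

Treat the output as a starting point to be validated under real load, not as final values.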
3.2 System kernel optimization
Most MySQL servers run on Linux, so some operating system parameters also affect MySQL performance. The following are suitable Linux kernel tunings.
net.ipv4.tcp_fin_timeout = 30
# Timeout for orphaned connections in the FIN_WAIT_2 state (often loosely described as the TIME_WAIT timeout); the default is 60s.
net.ipv4.tcp_tw_reuse = 1
# 1 enables reuse, allowing TIME_WAIT sockets to be reused for new TCP connections; 0 disables it.
net.ipv4.tcp_tw_recycle = 1
# 1 enables fast recycling of TIME_WAIT sockets; 0 disables it. Note: this option breaks clients behind NAT and was removed in Linux 4.12, so omit it on modern kernels.
net.ipv4.tcp_max_tw_buckets = 4096
# Maximum number of TIME_WAIT sockets the system keeps. If this number is exceeded, the system clears some TIME_WAIT sockets and prints a warning.
net.ipv4.tcp_max_syn_backlog = 4096
# Maximum length of the SYN queue; increase it to accommodate more pending connections.
On Linux, if a process opens more file handles than the system default of 1024, the error "Too many open files" is reported, so raise the open file handle limit.
# vi /etc/security/limits.conf   # Add the following lines. * stands for all users (a specific user can be named instead); takes effect for new login sessions or after a reboot
* soft nofile 65535
* hard nofile 65535
# ulimit -SHn 65535   # Takes effect immediately for the current shell
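To confirm which limit a process actually sees, the limits can be read from inside the process. The following is a minimal sketch using Python's standard `resource` module (Unix only); the helper names `nofile_limits` and `raise_soft_nofile` are hypothetical. An unprivileged process can raise its soft limit only up to the hard limit.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target=65535):
    """Raise the soft open-file limit toward target, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some platforms refuse values above a kernel-wide maximum.
        return soft
    return target


if __name__ == "__main__":
    print("before:", nofile_limits())
    print("soft limit now:", raise_soft_nofile())
```

This only affects the calling process and its children, in the same way `ulimit` only affects the current shell.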
3.3 Hardware configuration
Increase physical memory to improve file system performance. The Linux kernel allocates buffers (the system cache and data cache) from memory to hold hot data, which is written back to disk by the file system's delayed-write mechanism, for example when the buffers reach a certain percentage of memory or when the sync command is run. In other words, the more physical memory there is, the larger the buffers and the more data is cached. Of course, a server failure can lose some cached data that has not yet been written back.
Replace SAS hard drives with SSDs and set the RAID level to RAID 1+0, which has better read and write performance (IOPS) than RAID 1 or RAID 5; after all, the pressure on a database comes mainly from disk I/O.
4. Database architecture scaling
As business volume keeps growing, a single database server can no longer meet demand, so it is time to add machines and build a cluster. The main ideas are to spread the load of the single database, break through the disk I/O bottleneck, keep hot data in a cache, and reduce how often the disk is accessed.
4.1 Master-slave replication and read-write splitting
In production, most database operations are reads, so deploy a one-master, multiple-slave architecture. The master handles writes and is set up with dual-machine hot standby; the slaves sit behind a load balancer and handle reads. Mainstream load balancers include LVS, HAProxy, and Nginx.
How is read-write splitting implemented? Most companies implement it at the code level, which is efficient. The alternative is a proxy program, which is less common in enterprise use; common proxies include MySQL Proxy and Amoeba. With this kind of cluster architecture, the database's capacity for high concurrency increases greatly, and the single-server performance bottleneck is resolved. If one slave can handle 2,000 QPS, then five can handle 10,000 QPS, so scaling the database horizontally is also easy.
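Code-level read-write splitting can be sketched as a small router that sends reads to the slaves in round-robin order and everything else to the master. This is a simplified illustration, not a production design: the class name `ReadWriteRouter` is hypothetical, the server names are placeholders, and classifying a statement by its first keyword is an assumption that ignores cases such as transactions or SELECT ... FOR UPDATE, which must go to the master.

```python
import itertools

# Statements that are safe to send to a slave in this simplified model.
READ_VERBS = {"SELECT", "SHOW"}


class ReadWriteRouter:
    """Pick a target server for each SQL statement."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in READ_VERBS and self._slaves is not None:
            return next(self._slaves)
        # Writes, DDL, and anything unrecognized go to the master.
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["slave1", "slave2"])
    print(router.route("INSERT INTO t VALUES (1)"))
    print(router.route("SELECT * FROM t"))
    print(router.route("SELECT * FROM t"))
```

Because replication is asynchronous, a read issued right after a write may not yet see that write on a slave; applications that need read-your-own-writes behavior usually route those reads to the master as well.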
Sometimes, for write-heavy applications, the write performance of a single master does not meet business requirements. If you set up dual masters, you will run into data inconsistency: different users of the application may operate on the two databases at the same time, and simultaneous updates cause conflicting or inconsistent data between them. On a single server, MySQL relies on the storage engine's table locks and row locks to guarantee data integrity; how can this be solved with multiple masters? There is a master-slave replication management tool written in Perl called MySQL-MMM (Master-Master Replication Manager for MySQL). Its greatest advantage is that it allows writes to only one database at any given time, which effectively guarantees data consistency.
Master-slave replication blog post: http://lizhenliang.blog.51cto.com/7876557/1290431
Read-write splitting blog post: http://lizhenliang.blog.51cto.com/7876557/1305083
MySQL-MMM blog post: http://lizhenliang.blog.51cto.com/7876557/1354576
4.2 Adding a cache
Add a caching system in front of the database and keep hot data in memory. If the cache already holds the requested data, the result is returned without going to the database, which improves read performance. Caches can be local or distributed: a local cache stores data in the local server's memory or files, while a distributed cache can hold massive amounts of data and scales well. The mainstream distributed cache systems are memcached and Redis. memcached is stable, keeps its data in memory, and is very fast, with QPS reaching around 80,000. If you need data persistence, choose Redis, whose performance is no lower than memcached's.
Working process:
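The read path described above (check the cache first, fall back to the database on a miss, then populate the cache) is commonly called cache-aside. The following is a minimal sketch of that pattern; the class name `CacheAside` is hypothetical, a plain dict stands in for memcached or Redis, and `load_from_db` is a placeholder for a real database query.

```python
class CacheAside:
    """Serve reads from the cache and fall back to the database on a miss."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # dict standing in for memcached/Redis
        self.load_from_db = load_from_db  # callable: key -> value or None

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: the database is not touched
        value = self.load_from_db(key)    # cache miss: query the database
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the row is updated."""
        self.cache.pop(key, None)


if __name__ == "__main__":
    fake_table = {"user:1": "alice"}
    store = CacheAside({}, fake_table.get)
    print(store.get("user:1"))  # loaded from the fake table, then cached
    print(store.get("user:1"))  # served from the cache
```

A real deployment would also set expiry times on cached entries and invalidate them on writes so that stale data does not linger.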
4.3 Database splitting
Split related tables into separate databases by business area, such as web, BBS, and blog. If the business volume is large, each split database can also be given its own master-slave architecture to further avoid overloading any single database.
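At the application level, splitting by business area amounts to a lookup from business area to database. The sketch below assumes hypothetical database names (`db_web`, `db_bbs`, `db_blog`) and a hypothetical convention in which each table name starts with its business area; neither comes from a real deployment.

```python
# Hypothetical mapping from business area to database name.
DATABASE_BY_BUSINESS = {
    "web": "db_web",
    "bbs": "db_bbs",
    "blog": "db_blog",
}


def database_for(table_name):
    """Return the database holding a table, based on its business prefix.

    Assumes table names look like '<business>_<name>', e.g. 'bbs_posts'.
    """
    business = table_name.split("_", 1)[0]
    try:
        return DATABASE_BY_BUSINESS[business]
    except KeyError:
        raise ValueError(f"no database configured for table {table_name!r}")


if __name__ == "__main__":
    print(database_for("bbs_posts"))
    print(database_for("blog_comments"))
```

Keeping the mapping in one place means a business area can later be moved to its own server by changing only this table.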
4.4 Table splitting
As data volume grows day by day, a single table in the database may hold millions of rows, and queries and inserts take too long. How can the pressure on a single table be relieved? You