Implementation of MySQL-based database cluster system

Implementation of MySQL-based database cluster system _ MySQL

Last Update:2018-04-21 Source: Internet

Author: User

Tags ibm db2 mysql code

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Is your WebApp system using a MySQL database system? Does your customer always complain that the page results are very slow? Is the load of your MySQL system always in a very high state? This article will provide you with a method to share the load of the MySQL system, and a derived MySQL-HA-Proxy development project. Using the methods provided in this article, you will obtain the efficient operation of the MySQL system with minimal source code changes.
　　
　　 Section 1 status quo of database cluster technology
Currently, the database cluster system has been successfully applied and has a wide range of applications: Oracle Oracle9 and IBM DB2. Oracle9 adopts the Shared-storage technology. DB2 chooses the Shared-nothing technology.
　　
The theoretical basis of the latest database cluster system is distributed computing, which distributes data to each node. all computing nodes process data in parallel and summarize the results. This method is undoubtedly the most perfect. However, you still cannot implement all functions.
　　
For the technologies of Shared-storage and Shared-nothing, refer to relevant information on the Oracle and IBM websites.
　　
　　 Section 2 current database application status
Currently, the database application status is roughly divided into two categories. The first category is that the data volume is less than GB, the database is frequently accessed, and requests are intensive. Applications of the Web APP type, such as websites and forums. These Web APP applications access the database frequently. the database needs to frequently append data to thousands of queries per second, at the same time, the data response speed requirement is relatively high. The other type is used for scientific computing and storage of historical data. the data volume is usually several hundred GB. These applications access the database with the following features: mostly query operations, data is batch, scheduled, and centralized import into the database. the database has many records and has accumulated a large amount of data, there is no high requirement on the database response speed.
　　
　　 Problems exposed in section 3
The first type of application generally uses a Server load balancer cluster to support more accesses because of frequent access requests. However, for databases, Cluster operations cannot be implemented, the number of requests per second increases. as the server load increases, the speed of responding to a single request slows down. if the library file is large, when a write operation occurs, the lock table takes a long time and affects the access efficiency.
　　
The second type of applications is mainly because the data file is too large and each time data processing takes a lot of time. if an incorrect statement is written, it takes several hours to redo the query.
　　
　　 Section 4 How to solve
First, we should optimize the hardware, software, programs, indexes, and SQL statements. if the problem still cannot be solved, we should consider the database system cluster (parallel processing.
　　
For the first type of applications, when the database server runs normally and the load is low, the applications are satisfied with the database system status. However, when the database system load is too high, it will take longer to complete the request, which does not meet the requirements of the system. Since the load is caused by too many requests, we will share the request to allow some requests to access another server and reduce the load of a single server to solve the problem.
　　
For the second type of applications, distributed computing systems are needed to solve the problem. general systems are powerless.
　　
　　 Section 5 solutions to the first type of application problems in "Linux + Apache + PHP + MySQL"
Solution to one actual case:
　　
I encountered this problem during my work. our Web Server is a cluster composed of three Linux + Apache + Php machines. MySQL runs on the SUN450, 2 GB memory platform. Because the WEB traffic is almost full during peak hours, LoadAvg (that is, the number of processes in the Running state within one minute) is between 10 and 20, it is reflected that a large number of requests are suspended when accessing the database, resulting in a failed request, the next request comes in again, and the last vicious circle. LoadAvg will soar to more than 800 in an instant. The database is even worse. the LoadAvg has more than 300 threads, and the database has many threads. the CPU is busy switching the thread status. at this time, it will not be good unless the Restart MySQL. After the SQL statement optimization is completed, the problem still cannot be solved. we have added a database server to maintain data synchronization between the two databases through the MySQL data synchronization mechanism, modified some php programs that only read operations will occur. connecting these programs to another database can be considered as splitting the load. The problem was initially solved. However, as the business grew bigger, we added multiple servers, modified many programs, separated their database read operations, and accessed different servers.
　　
　　 Section 6 proposal of the MySQL-HA-Proxy solution
It is very painful to separate the system load by modifying the program. The project is huge and cannot be mistaken, because besides the master server, data can be written or modified, other servers can only synchronously update their own data. Therefore, if you perform write operations on those databases, the results will be disastrous.
　　
If we can have a program to sort the SQL statement, it is sent to different servers based on its type (read/write), and then the result is returned. Using a PROXY similar to HTTP, we do not need to modify the source program to share the load, or the table status (available/locked) to determine which server the request should be allocated, which is better than what we can achieve by modifying the source program.
　　
　　 Section 7 communication between MySQL Client and Server
Looking around, I did not find an article about the Mysql communication protocol. it seems that only the source program of Mysql is analyzed. So I found the mysql 3.23.49 code and opened the sniffer tool. MySQL communication protocols may have changed many times. in version 3.23.49, the communication protocol version is 10.
　　
After a simple analysis of the communication protocols, the rules are as follows, and some of them are not perfect. as I really don't have much time to carefully study the mysql code, I only know this.
　　
The response data format of the Server to the Client request:
　　　
Definition of CMD Code and Message when FLAG is 0 or 2
　　　
The format in which the Client submits data to the Server:
　　　
Description of Command ID and Command Data:
　　　
　　 Section 8 how the Client passes Server user authentication
After the protocol analysis is complete, I try to make it work, but authentication is troublesome. when Mysql Server connects to the Client, it will first return a packet to the Client, it contains the protocol version number, version information, SessionID, and an 8-byte Key. The Client uses this Key to encrypt the password, and then sends the username, password, database to be opened, and other information to the Server. this completes authentication. I don't know how the Client uses this Key for encryption, so I plan to skip the Password. After I reorganize the Client data packet and remove the Password information, I succeeded, however, Mysql users in the cluster do not have a password, and the security is more or less problematic. However, these servers are all placed behind the HA, and there is no external IP address, so it should not be a problem, but it is more or less a defect.
　　
But I always need to know whether the user's password is correct? What should we do? Use a dedicated Mysql instance to complete password authentication. Install a Mysql Server with minimal resources for MysqlAuth (dedicated authentication Server). when the Client connects, the first packet of MysqlAuth is returned to the Client, which includes the Key, then the Client will use this Key, encrypt the password, and send the authentication information back. at this time, the MysqlHA system will forward this information to MysqlAuth and keep one copy of it, if the authentication is successful, reorganize the retained copy, remove the password information, and then use the restructured authentication information to connect to the server in the cluster.
　　
　　 Section 9 system structure and process
　　　
In the figure, HA is a high-reliability system built using HeartBeat (see http://www.linuxvirtualserver.org/for specific implementation methods /). Proxy is a Mysql-Proxy system, and MysqlAuth is a dedicated authentication server. The red RealServer is the main server, which can update data and synchronize data to other realservers.
　　　
　　　
Describes the Client authentication process.
　　
Describes the process of establishing a connection with the RealServer after the authentication fails and the authentication is passed.
　　
Describes how the system processes SQL Query requests after a connection is established.
　　
　　 Conclusion
I have basically completed mysql-proxy program development, but it is still in the test phase. The latest version is 0.0.4, and the next version is still being revised. Starting from version 0.0.3, mysql-proxy has been able to completely run the SQL-scripts provided by mysql itself, but this SQL-scripts can only provide single point of performance, the mysql system of the cluster is not tested.
　　
The system provides a program for dynamically collecting LoadAvg on the RealServer and then reporting it to the Mysql Proxy. However, I did not test this part, therefore, the request allocation method I used in the previous test is polling. if two RealServer systems with the same load are automatically rotated between them.
　　
Mysql-proxy source code you can download to my site: http://netsock.org/bbs/ Mysql-HA-Cluster project. I will also publish some testing data there.
　　
　　 How to perform a system test?
Since it is a cluster dedicated to systems such as Linux + Apache + Php + Mysql, we should find a practical application to run and view it, and then simulate a large number of accesses for testing.
　　
It may be good to choose a forum system. VBB is widely used and popular. Simulate access using the AB provided by Apache.
　　
The minimum environment of the test system is: (five machines)
　　
1 x Apache + PHP
1 x AB
1 x Mysql Proxy + Mysql Auth Server
2 x Mysql Real Server

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More