Design and Implementation of MySQL Cluster System

Last Update: 2018-12-04 Source: Internet

Author: User


Is your web app built on a MySQL database system? Do your customers keep complaining that pages load slowly? Is the load on your MySQL system constantly high? This article describes a way to share the load of a MySQL system, along with a project derived from it, mysql-ha-proxy. With the method described here, you can make your MySQL system run efficiently with minimal changes to your source code.

Section 1 Current State of Database Cluster Technology

Database cluster systems are already in successful and widespread use, for example Oracle's Oracle9 and IBM's DB2. Oracle9 uses shared-storage technology, while DB2 uses shared-nothing technology.

The latest database cluster systems are based on distributed computing: data is distributed across the nodes, all computing nodes process their data in parallel, and the results are then combined. This approach is undoubtedly the ideal one, but it still cannot implement every function.

For details on shared-storage and shared-nothing technology, refer to the relevant material on the Oracle and IBM websites.

Section 2 Current State of Database Applications

Current database applications fall roughly into two categories. In the first, the data volume is below the gigabyte level, but the database is accessed frequently and requests are intensive. These are web applications such as websites and forums. Their database access has the following characteristics: access is frequent, the database receives several thousand or more queries per second, data is appended frequently, and response speed requirements are relatively high. The second category is used for scientific computing and for storing historical data, and its data volume is usually several hundred GB. Access to these databases consists mostly of queries; data is loaded into the database in scheduled, centralized batches; the databases hold many records and accumulate large amounts of data; and response speed requirements are not very high.

Section 3 Problems Exposed

Applications of the first type usually run their web servers in a load-balancing cluster, but because the database itself cannot be clustered, it has to absorb all the growth in requests per second. As the server load rises, each individual request is answered more slowly. If a table file is large, a write operation locks the table for a long time, which hurts access efficiency.

The problem with the second type of application is mainly that the data files are so large that every round of data processing takes a long time. If a statement is written incorrectly, it can take several hours to rerun the query.

Section 4 How to Solve These Problems

First, optimize the hardware, software, programs, indexes, and SQL statements. If that still does not solve the problem, consider clustering the database system (parallel processing).

For the first type of application, the database system performs satisfactorily while the server runs normally and the load is low. When the load becomes too high, however, request completion times grow until they no longer meet the system's requirements. Since the load is caused by too many requests, the solution is to share them: send some requests to another server and thereby reduce the load on each individual server.

The second type of application needs a distributed computing system; ordinary systems cannot solve the problem.

Section 5 Solving the First Type of Problem in "Linux + Apache + PHP + MySQL" Systems

A real case:
I ran into this problem at work. Our web servers were a cluster of three Linux + Apache + PHP machines, and MySQL ran on a Sun 450 with 2 GB of memory. At peak hours web traffic was close to saturation, and loadavg (the average number of processes in the running state over one minute) stayed between 10 and 20, reflecting a large number of requests accessing the database. When a request was still unfinished as the next one arrived, a vicious circle set in, and loadavg would soar to more than 800 in an instant. The database server was even worse off: its loadavg exceeded 300, it ran a huge number of threads, and the CPU was busy switching thread states. At that point nothing short of restarting MySQL helped. Optimizing the SQL statements did not solve the problem. We then added a database server and used MySQL's replication mechanism to keep the data on the two database servers synchronized. We modified the PHP programs that only read data so that they connected to the new database, and this separation of load solved the problem for the time being. As the business grew, however, we added more servers and had to modify many programs to split their database access across the different servers.

Section 6 The mysql-ha-proxy Proposal

Separating the system load by modifying programs is very painful. The work is huge and leaves no room for mistakes: only the master server can write or modify data, and the other servers can only update their own data through synchronization. If a write operation is performed on one of those other databases, the result is disastrous.

What if a program could classify each SQL statement, send it to a different server depending on its type (read or write), and then return the result? Working like an HTTP proxy, it would let us share the load without modifying the source programs. It could even use the table status (available or locked) to decide which server should receive a request, which is more than we can achieve by modifying the source programs.
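A minimal Python sketch of the read/write split, assuming that any statement starting with SELECT, SHOW, DESCRIBE or EXPLAIN is a read and everything else must go to the master. The keyword list, the ReadWriteRouter class and the server names are hypothetical illustrations, not mysql-proxy's actual code. The sketch also ignores transactions, comments placed before the statement, and replication lag, all of which a real proxy has to handle.

```python
import itertools
import re

# Hypothetical rule: statements starting with these keywords are treated as reads.
_READ_PREFIX = re.compile(r"\s*(SELECT|SHOW|DESCRIBE|DESC|EXPLAIN)\b", re.IGNORECASE)
# Locking reads must see the master's current data, so they are treated as writes.
_LOCKING_READ = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)


def is_read_only(sql: str) -> bool:
    """Return True if the statement can be answered by a replica."""
    return bool(_READ_PREFIX.match(sql)) and not _LOCKING_READ.search(sql)


class ReadWriteRouter:
    """Send writes to the master and spread reads over the replicas in turn."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql: str) -> str:
        """Return the server that should receive this statement."""
        return next(self._replicas) if is_read_only(sql) else self.master
```

For example, with replicas a and b, consecutive SELECTs go to a, then b, then a again, while an UPDATE always goes to the master. Sending SELECT ... FOR UPDATE to the master is a design choice: it takes locks, so it belongs with the writes.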

Section 7 Communication Between the MySQL Client and Server

I looked around but found no article describing the MySQL communication protocol; it seemed the only option was to analyze the MySQL source code. So I got the MySQL 3.23.49 source and started a sniffer. The MySQL communication protocol has probably changed many times; in version 3.23.49, the protocol version is 10.

A brief analysis of the protocol produced the rules below, some of which are incomplete. I did not have much time to study the MySQL code carefully, so this is all I know.

Format of the server's response to a client request:

Offset  Region  Type         Length (bytes)     Description
0       head    data length  3                  (occupies bytes 0-2)
3               flag         1                  0 = ordinary information
                                                1 = multi-segment information
                                                2 = authentication response
                                                >2 = closing segments
4       data    cmd code     1
5               message      data length - 1
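The header can be read and written with a few lines of Python. This sketch assumes, as in the MySQL source, that the 3-byte length is little-endian and counts the whole data region (cmd code plus message). In the MySQL protocol documentation, the byte called flag above is the packet sequence number, which would explain why it is 2 on the authentication response and exceeds 2 on later segments; the sketch simply treats it as one byte. pack_packet and unpack_packet are hypothetical helper names.

```python
import struct

HEADER_LEN = 4  # 3-byte data length + 1-byte flag


def pack_packet(flag: int, data: bytes) -> bytes:
    """Prefix a data region with the 4-byte header."""
    if len(data) >= 1 << 24:
        raise ValueError("data too long for a 3-byte length field")
    # Pack the length as a little-endian 4-byte int and keep its low 3 bytes.
    return struct.pack("<I", len(data))[:3] + bytes([flag & 0xFF]) + data


def unpack_packet(buf: bytes):
    """Split one packet off the front of buf.

    Returns (flag, data, remaining_bytes), or None if buf does not yet
    contain a complete packet.
    """
    if len(buf) < HEADER_LEN:
        return None
    length = int.from_bytes(buf[0:3], "little")
    flag = buf[3]
    end = HEADER_LEN + length
    if len(buf) < end:
        return None
    return flag, buf[HEADER_LEN:end], buf[end:]
```

Because TCP delivers a byte stream, a proxy has to keep buffering until unpack_packet returns a result, then inspect or forward the data and continue with the remaining bytes.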

CMD codes and message structure when flag is 0 or 2:

Code  Meaning          Offset  Type            Length (bytes)
00    status           0       affected rows   2
0a    server version   0       version string  8, terminated by '\0'
                       8       session ID      4 (32 bits)
                       12      unknown         11
      (sent only when a client connects: the server immediately returns this single data segment)
FF    error            0       errcode         2
                       2       errmsg          to end of packet
FE    end of a multi-segment transmission; the message is empty
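A sketch of decoding these messages in Python, taking the data region (everything after the 4-byte header) as input. It assumes that multi-byte integers are little-endian and that the version string is NUL-terminated rather than always 8 bytes long. It also assumes, following the description of the first packet in Section 8, that the first 8 of the 11 unknown bytes are the key; the other 3 are left unparsed. Classifying by the first byte only applies to the first packet of a reply and to the end marker, not to the rows inside a multi-segment reply.

```python
import struct

# First byte of the data region (the cmd code), per the table above.
OK, HANDSHAKE, END, ERROR = 0x00, 0x0A, 0xFE, 0xFF


def parse_response(data: bytes) -> dict:
    """Decode the data region (cmd code + message) of a server packet."""
    code, msg = data[0], data[1:]
    if code == ERROR:
        # errcode: 2-byte little-endian integer; errmsg: rest of the packet.
        (errcode,) = struct.unpack_from("<H", msg, 0)
        return {"type": "error", "errcode": errcode,
                "errmsg": msg[2:].decode("latin-1")}
    if code == HANDSHAKE:
        nul = msg.index(b"\x00")                      # end of the version string
        version = msg[:nul].decode("ascii")
        (session_id,) = struct.unpack_from("<I", msg, nul + 1)
        # Assumption: the key is the first 8 of the 11 "unknown" bytes.
        key = msg[nul + 5:nul + 13]
        return {"type": "handshake", "version": version,
                "session_id": session_id, "key": key}
    if code == OK:
        return {"type": "ok"}
    if code == END:
        return {"type": "end"}
    return {"type": "unknown", "code": code}
```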

Format of the data the client submits to the server:

Offset  Region  Type          Length (bytes)
0       head    data length   3 (occupies bytes 0-2)
3               compressed    1
4       data    command ID    1
5               command data  data length - 1

Command IDs and command data:

ID  Command             Data format
0   COM_SLEEP
1   COM_QUIT            null
2   COM_INIT_DB         database name
3   COM_QUERY           standard query string
4   COM_FIELD_LIST      table name [128], wildcard [128]
5   COM_CREATE_DB       database name
6   COM_DROP_DB         database name
7   COM_REFRESH         options (bits)
8   COM_SHUTDOWN        null
9   COM_STATISTICS      null
10  COM_PROCESS_INFO    null
11  COM_CONNECT
12  COM_PROCESS_KILL    session ID [4]
13  COM_DEBUG           null
14  COM_PING            null
15  COM_TIME
16  COM_DELAYED_INSERT
17  COM_CHANGE_USER     [user] [passwd] [db]
18  COM_BINLOG_DUMP
19  COM_TABLE_DUMP
20  COM_CONNECT_OUT
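Building a client command packet is then straightforward. This Python sketch covers only a few of the IDs above and assumes the byte at offset 3 (the compressed field above) is 0 for a new command; in the protocol documentation that byte is the sequence number, which restarts at 0 for each command. The Command enum and command_packet are hypothetical names.

```python
import enum
import struct


class Command(enum.IntEnum):
    """A few of the command IDs from the table above."""
    COM_QUIT = 1
    COM_INIT_DB = 2
    COM_QUERY = 3
    COM_PING = 14


def command_packet(cmd: Command, data: bytes = b"") -> bytes:
    """Build a client packet: data length (3), byte 0, command ID, command data."""
    body = bytes([cmd]) + data
    # Little-endian length of the data region, truncated to 3 bytes.
    return struct.pack("<I", len(body))[:3] + b"\x00" + body
```

For example, command_packet(Command.COM_QUERY, b"SELECT 1") yields the header 09 00 00 00 followed by 03 and the query text. In a proxy, the command ID is also a cheap first filter: only COM_QUERY packets need their SQL text classified as read or write.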

Section 8 How the Client Passes Server User Authentication

With the protocol analysis complete, I tried to put it to work, but authentication turned out to be troublesome. When a client connects, the MySQL server first returns a packet containing the protocol version number, version information, the session ID, and an 8-byte key. The client uses this key to encrypt the password and then sends the user name, the encrypted password, the database name, and so on. I did not know how the client encrypts the password with this key, so I decided to skip the password: I reorganized the client's data packet and removed the password information. As a result, none of the MySQL users in the cluster has a password. This is something of a security issue, but these servers sit behind the HA system and have no external IP addresses, so it should not be a real problem, although it remains a flaw.
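Removing the password could look like the following Python sketch. The packet layout is an assumption about how the client of that era built its authentication data: 2 bytes of client flags, 3 bytes of maximum packet size, the NUL-terminated user name, the scrambled password, and optionally the database name. strip_password is a hypothetical helper that keeps every field except the password and leaves an empty string in its place.

```python
def strip_password(auth: bytes) -> bytes:
    """Return the client's authentication data with the password removed.

    Assumed layout (hypothetical): client flags (2), max packet size (3),
    user name + NUL, scrambled password [+ NUL], optional database name.
    """
    header = auth[:5]                            # client flags + max packet size
    user_end = auth.index(b"\x00", 5)            # NUL ending the user name
    user = auth[5:user_end + 1]                  # user name including its NUL
    pass_end = auth.find(b"\x00", user_end + 1)  # NUL ending the password, if any
    # Whatever follows the password's NUL (the database name) is kept as-is.
    rest = b"" if pass_end == -1 else auth[pass_end + 1:]
    # A lone NUL in the password position means "no password".
    return header + user + b"\x00" + rest
```

An empty password field is what a client without a password would send, which is why the cluster's MySQL users must have no password for the rebuilt packet to be accepted.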

But I still need to know whether the user's password is correct. How? Use a dedicated MySQL server for password authentication. Install a MySQL server with minimal resources as mysqlauth (a dedicated authentication server). When a client connects, the first packet from mysqlauth, which includes the key, is returned to the client. The client encrypts the password with this key and sends the authentication information back. The mysqlha system then forwards this information to mysqlauth and keeps a copy. If authentication succeeds, the retained copy is reorganized to remove the password information, and the rebuilt authentication information is used to connect to the cluster.
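The authentication flow can be sketched as a single Python function. The transport is abstracted into two hypothetical callables, ask_mysqlauth and ask_realserver, each of which sends one packet's data region and returns the reply's; the password-removal step is passed in as a third callable. The sketch assumes the proxy has already relayed mysqlauth's first packet (with its key) to the client, and that a reply starting with 0x00 means success, as in the response table in Section 7.

```python
from typing import Callable

# Hypothetical transport hook: sends one packet's data region, returns the reply's.
Exchange = Callable[[bytes], bytes]

OK = 0x00


def authenticate(client_auth: bytes,
                 ask_mysqlauth: Exchange,
                 ask_realserver: Exchange,
                 strip_password: Callable[[bytes], bytes]) -> bytes:
    """Return the reply the proxy should relay back to the client.

    client_auth is the client's authentication data, encrypted with the key
    from mysqlauth. bytes are immutable, so holding on to it is the retained copy.
    """
    # 1. Let mysqlauth check the encrypted password.
    verdict = ask_mysqlauth(client_auth)
    if not verdict or verdict[0] != OK:
        # Error (0xFF) or unexpected reply: pass it straight back to the client.
        return verdict
    # 2. Rebuild the retained copy without the password and log in to the cluster.
    return ask_realserver(strip_password(client_auth))
```

Keeping the transport out of the function makes the decision logic testable without any sockets; the real proxy would wire these callables to its connections to mysqlauth and to a RealServer.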

Section 9 System Structure and Process

In the figure, HA is a high-availability system built with Heartbeat (see http://www.linuxvirtualserver.org/ for implementation details). Proxy is the MySQL proxy system, and mysqlauth is the dedicated authentication server. The RealServer shown in red is the master server, which can update data and replicates it to the other RealServers.

Figure: the client authentication process.

Figure: how a connection with the RealServer is established, both when authentication fails and when it succeeds.

Figure: how the system processes SQL query requests once a connection is established.

Conclusion

I have essentially finished developing the mysql-proxy program, but it is still in testing. The latest version is 0.0.4, and the next version is being revised. Since version 0.0.3, mysql-proxy has been able to run the complete set of SQL scripts supplied with MySQL. These scripts, however, only test single-server performance; MySQL provides no test suite for a clustered system.

The system includes a program that dynamically collects loadavg on each RealServer and reports it to the MySQL proxy. However, I have not tested this part, so the request allocation method in my tests so far has been polling: with two equally loaded RealServers, requests simply alternate between them.
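The intended allocation rule can be sketched in Python: choose the RealServer with the lowest reported loadavg and rotate among servers that are equally loaded. With no reports at all, every server stays at 0 and the rule reduces to the polling used in the tests. The Balancer class is hypothetical, and comparing load values for exact equality is a simplification; a real implementation would probably treat loads within some margin as equal.

```python
import itertools


class Balancer:
    """Pick the RealServer with the lowest reported loadavg; rotate among ties."""

    def __init__(self, servers: list[str]):
        # Last loadavg reported by each server; 0.0 until the first report.
        self.loadavg = {s: 0.0 for s in servers}
        self._turn = itertools.count()

    def report(self, server: str, loadavg: float) -> None:
        """Record a loadavg value sent by the collector on a RealServer."""
        self.loadavg[server] = loadavg

    def pick(self) -> str:
        lowest = min(self.loadavg.values())
        candidates = [s for s, load in self.loadavg.items() if load == lowest]
        # Round-robin among the least-loaded servers.
        return candidates[next(self._turn) % len(candidates)]
```

With two servers and no reports, pick() alternates rs1, rs2, rs1, rs2; once rs1 reports a higher load, every request goes to rs2 until the loads change.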

You can download the mysql-proxy source code from the mysql-ha-cluster project on my site, http://netsock.org/bbs/. I will also publish some test data there.

How should the system be tested?

Since the system is designed specifically for Linux + Apache + PHP + MySQL clusters, you should pick a real application, run it and observe it, and then simulate a large number of accesses for testing.

A forum system is a good choice; vBB is widely used and popular. Simulate the traffic with ab, the benchmarking tool that ships with Apache.

The minimum test environment consists of five machines:

1 x Apache + PHP
1 x AB
1 x MySQL proxy + MySQL Auth Server
2 x MySQL Real Server

References:

Slides for Section 9: http://www.netsock.org/mysqlha/mysql-ha.ppt
Latest source code: http://www.netsock.org/mysqlha/mysql-proxy_0.0.4.zip
Installation instructions: http://netsock.org/bbs/showthread.php?threadid=5
Results of running the SQL scripts: http://netsock.org/bbs/showthread.php?threadid=9

Author profile:

Xu Chao works at Tom.com in Beijing on network system technical support and system maintenance. In his spare time he develops network applications based on netsocket technology. His development site is http://netsock.org/bbs/, and current projects include socketchat, MySQL-ha-proxy, and a PHP session server.

This article is an English version of an article originally published in Chinese on aliyun.com.
