Implementation of MySQL-based database cluster system (1)

Source: Internet
Author: User
Tags ibm db2 mysql code

Section 1 Status Quo of Database Cluster Technology

Currently, the database cluster system has been successfully applied and has a wide range of applications: Oracle Oracle9 and IBM DB2. Oracle9 adopts the Shared-storage technology. DB2 chooses the Shared-nothing technology.

The theoretical basis of the latest database cluster system is distributed computing, which distributes data to each node. All computing nodes process data in parallel and summarize the results. This method is undoubtedly the most perfect. However, you still cannot implement all functions.


For the technologies of Shared-storage and Shared-nothing, refer to relevant information on the Oracle and IBM websites.

Section 2 Current Database Application Status

Currently, the database application status is roughly divided into two categories. The first category is that the data volume is less than GB, the database is frequently accessed, and requests are intensive. Applications of the Web APP type, such as websites and forums. These Web APP applications access the database frequently. The database needs to frequently append data to thousands of queries per second, at the same time, the data response speed requirement is relatively high. The other type is used for scientific computing and storage of historical data. The data volume is usually several hundred GB. These applications access the database with the following features: Mostly query operations, data is batch, scheduled, and centralized import into the database. The database has many records and has accumulated a large amount of data, there is no high requirement on the database response speed.

Problems exposed in Section 3

The first type of application generally uses a Server Load balancer cluster to support more accesses because of frequent access requests. However, for databases, cluster operations cannot be implemented, the number of requests per second increases. As the server load increases, the speed of responding to a single request slows down. If the library file is large, when a write operation occurs, the lock table takes a long time and affects the access efficiency.

The second type of applications is mainly because the data file is too large and each time data processing takes a lot of time. If an incorrect statement is written, it takes several hours to redo the query.

Section 4 how to solve

First, we should optimize the hardware, software, programs, indexes, and SQL statements. If the problem still cannot be solved, we should consider the parallel processing of the database system cluster.

For the first type of applications, when the database server runs normally and the load is low, the applications are satisfied with the database system status. However, when the database system load is too high, it will take longer to complete the request, which does not meet the requirements of the system. Since the load is caused by too many requests, we will share the request to allow some requests to access another server and reduce the load of a single server to solve the problem.

For the second type of applications, distributed computing systems are needed to solve the problem. General systems are powerless.

Section 5 Solutions to the first type of application problems in "Linux + Apache + PHP + MySQL"

Solution to one actual case:

I encountered this problem during my work. Our Web Server is a cluster composed of three Linux + Apache + Php machines. MySQL runs on the SUN450, 2 GB memory platform. Because the WEB traffic is almost full during peak hours, LoadAvg (that is, the number of processes in the Running state within one minute) is between 10 and 20, it is reflected that a large number of requests are suspended when accessing the database, resulting in a failed request, the next request comes in again, and the last vicious circle. LoadAvg will soar to more than 800 in an instant. The database is even worse. The LoadAvg has more than 300 threads, and the database has many threads. the CPU is busy switching the thread status. At this time, it will not be good unless the Restart MySQL. After the SQL statement optimization is completed, the problem still cannot be solved. We have added a database server to maintain data synchronization between the two databases through the MySQL data synchronization mechanism, modified some php programs that only read operations will occur. connecting these programs to another database can be considered as splitting the load. The problem was initially solved. However, as the business grew bigger, we added multiple servers, modified many programs, separated their database read operations, and accessed different servers.

Section 6 Proposal of the MySQL-HA-Proxy Solution

It is very painful to separate the system load by modifying the program. The project is huge and cannot be mistaken, because besides the master server, data can be written or modified, other servers can only synchronously update their own data. Therefore, if you perform write operations on those databases, the results will be disastrous.

If we can have a program to sort SQL statements and read/write them based on their types), send them to different servers and then return the results. Using a PROXY similar to HTTP, we do not need to modify the source program to share the load, or the table status can be used/locked) to determine which server the request should be allocated, which is better than modifying the source program.

Section 7 Communication Between MySQL Client and Server

Looking around, I did not find an article about the Mysql communication protocol. It seems that only the source program of Mysql is analyzed. So I found the mysql 3.23.49 code and opened the sniffer tool. MySQL communication protocols may have changed many times. In version 3.23.49, the communication protocol version is 10.

After a simple analysis of the communication protocols, the rules are as follows, and some of them are not perfect. As I really don't have much time to carefully study the mysql code, I only know this.

The response data format of the Server to the Client request:

Offset Region Type Length (byte) Description
0 HEAD Data Length 3  
1  
2  
3   FLAG 1 = 0 common information
= 1 multi-segment information
= 2 authentication return
> 2 ending words
4 DATA CMD Code 1  
5   Message DataLength-1  

Definition of CMD Code and Message when FLAG is 0 or 2

Response Code Type Message structure
00 Status Code Offset Type Length (byte)  
    0 Affect rows 2  
0A Server version number Offset Type Length (byte)  
  Valid only when you connect to the Server. The Server immediately returns the information of a data segment. 0 VersionString 8 End of '\ 0'
  8 Session ID 4 32 bits
  12 UnKnown 11  
         
FF Returned information when an error occurs Offset Type Length (byte)  
    0 ErrCode 2  
    2 ErrMsg END  
FE End of multipart Information Transmission Null


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.