What's the matter with MySQL replication?

Source: Internet
Author: User
Tags memcached

1, why do I need MySQL replication technology, what is the use?
The ability of a single database server is always limited. Based on the experience of professionals, a 8-core 32G configuration server that withstands 50~200 concurrent connection requests is good. MySQL replication is a way to improve your data storage-level capabilities.

2, how to improve database performance?

There are two ways to improve the performance of the database: 1, to improve the hardware configuration. 2, increase the number of servers.
For 1, the price is very low. This belongs to the vertical extension, the professional name is scale up
For 2, is the cluster technology, is a horizontal expansion, also known as scale out

Cluster technology is a technology that uses multiple servers as a single machine. The application layer is separated from the storage layer, and there is no need to consider the impact of storage layer changes on the application tier. Can the database be clustered?

3, how to expand the data storage level?

In fact, the current design of our company is to use different user's access requests to different database servers in the application layer. Application level directly to the user to find the corresponding database server IP, establish a connection to access. In general, a single user has a certain number of end devices. Each device establishes a connection with the Communications Server (the communication server can do the cluster expansion here), and the communication server interacts with the data storage tier.

We set up a database for each user, and create a table for each of the terminal devices that the user owns under this database. Offload the load at the database level.

The problem with this design:
In our design, there is a need to pre-establish a certain number of connections between the Communications Server and the database server (or change to dynamic?). )。
1, for example, there is a communication server, a MySQL server. The communication server establishes 30 connections with the database pre-server first. This is OK. As access requests grow, the general database is able to withstand the upper limit, considering increasing the number of databases and diverting the load.

2, a communications Server, multiple MySQL servers. The Communications Server establishes 30 connections to each database server. (Communication server withstand capacity > Database affordability). Access requests continue to grow, and the capacity of the communications Server is capped, considering increasing the number of communications Servers.

3, multiple communication servers, multiple MySQL servers. Each communication server establishes 30 connections to each database server.
Here's how many concurrent connections MySQL can have, performance will decline, how much? This may be a limiting condition. For example, 3 Communications Servers, each with a 3*30=90 connection. This will not be a problem.


Load test:

1, logically, a communications Server, a MySQL server. Physically, it can be on a single machine or on two machines. Test in two different situations. Observe the usage of Network load, CPU, memory, disk I/O. The maximum number of endpoints that this architecture can withstand.
2, ibid.

3, ibid.

This is an off-topic, go ahead ...

4. mysql Replication technology

4.1 The structure of MySQL replication

In fact, with MySQL replication, only the data storage layer can improve the response ability of the read request, unable to improve the response to write requests.

Structure:
1, one Master one from

Master-----> from

2, one master more from
    Master-----> from
            |--> from
           | --from
   master node network io, disk IO pressure looks larger
  
 3, Hierarchy
       Main-----> from-----> From
              |--> from      |--> from   
  The replication effort of the master node is relative to a master and many from, to light some
 
  The master node is readable and writable, and can be read only from the node.

4.2 Problems arising from multiple nodes

After the node is more, the application to read from the node, also need to do load balancing ah.
For MySQL, which has cached applications, load balancing will face a cache hit problem.

4.2.1 load balancing and buffer hit

MySQL cache, which is used to improve response speed. Because the memory accesses much faster than the disk. Use a part of it often
Data pre-placed in memory, most of the query as long as the memory to read the data, it does not need to read every time the disk files, it is too time-consuming. For example, 100 times the query operation, 90 times in memory to read data, 10 times to disk read data, the cache hit rate is 90%.

After all, the size of the memory, relative to the size of the disk or relatively small, the volume of data has been growing. It is impossible to say that all the data in the disk is cached in memory, so the hit rate is 100%. Memory space is limited after all, only a certain amount of data can be cached. If this quantitative data is stable, the change is not very big, so the hit rate can be a little higher. So a plurality of nodes, each of which has some division of labor, the respective response to a certain category of queries, you can improve this hit rate.
For this reason, the classification is based on the SQL statement and is assigned to each slave node.


4.2.1.1 How to increase cache hits:

There are several ways to categorize:
1, take the mold
For the SQL statement to do hash operation to derive the eigenvalues, with this value to the node number modulo, for example, there are three nodes, the modulus may be 0,1,2. The 0,1,2 corresponds to a node respectively. This SQL query is assigned to the corresponding node of the modulo result.


formula: SQL statement eigenvalues/nodes =0~2------> 0 Nodes
                                                                  |---> 1 node
                                                                   |---> 2 node
cons:

Once the nodes change, such as a new node, or because the node fails to delete a node, the computed "SQL statement eigenvalue/node number" Results can be changed, the SQL allocation of a wide range of changes, all nodes are already relatively stable cache, it takes some time to regain stability. The overall response performance during this period is not very stable.

2, consistent hash
For cache jitter caused by node additions and deletions, a consistent hash can limit this jitter to a smaller range.

FORMULA: SQL statement eigenvalue/(2^32) = 0 ~ 2^32-1
Node characteristic value/(2^32) = 0 ~ 2^32-1

Both the node and the SQL statements fall on a 0 ~ 2^32-1 ring.
If one node fails, the effect narrows to the SQL statement between this node and a node in the counterclockwise direction.

Disadvantage: Uneven distribution of nodes, resulting in uneven load of the nodes.

3, virtual node
The real node is virtual into multiple virtual nodes, evenly distributed on the 0 ~ 2^32-1 ring. The load on the node is more uniform.
PS: Math is short board ah, dead brain cells.

Public Cache memcached

The above is considered at the program level to improve the cache hit method.
There is another way of public caching: memcached, specifically unclear, first know that there is a way to do it.

At the application level and data storage level, the coupling degree is the best, the application layer does not need to care too much about the change of the data storage layer.
Without too much influence, this leads to the following topic: MySQL read-write separation tool.


4.3 mysql read-Write separation tool (what is it?) ):
is to add a middle layer between the application layer and the data storage layer, which can understand whether the SQL statement is read or write,
The write operation is then assigned to the master node, and the read operation is assigned to the slave node. Just like a router for an SQL statement.


4.4 What problems can MySQL replication solve?
Scaling and load Balancing
Highly Available
Data distribution
Data backup
MySQL upgrade test, it is impossible to do the main node test it.

First know that there are some concepts, specifically not very clear.

What are the drawbacks of 4.5 MySQL replication architecture?
As mentioned earlier, from the node can only read, cannot write, so this architecture has no effect on the ability to write.
Also, the primary node is a single node, and if the primary node is hung, how can I recover it? From the node down nothing, hang up just will affect the access speed.

Solution:
1, table dispersed to different nodes (vertical split)
2, when a table has a large number of accesses, the table is fragmented (split horizontally)

How to divide, according to a field, such as 1~10000 points to the node 1,10001~20000 node 2
Nature also needs to have a global manager who knows which range of data is at which node.

The more nodes involved in a write operation, the faster the write speed is because the write load is dispersed
The more nodes that are involved in the query operation, the slower the query is because it is possible to read each scattered set of data into memory and
Sort, returned the result, and compared the trouble.

You should not forget the design of the SQL statement, not too bad.

In short, MySQL replication can provide read extensions, while write extensions require a table, or Shard.

Finally, let's talk about how MySQL replication works.

4.6 How is MySQL replication implemented?
MySQL replication requires the primary node to open the binary log.

Binary Log(Record all writes, updates, deletions, etc. that cause data to change):
Recording method:
1,statment Statement Recording
Log SQL statement, problem: Now () function, etc.
2,raw Line Record
Problem: Cause record volume to increase
For example, an update operation involves 1000 rows of data, and an SQL statement can be expressed
But the record line is going to add 1000 more records.
3,mix Mixed Mode Recording
Combine the first two ways and try to avoid their own problems. It's up to MySQL to judge when
Use statements to record when to use Row Records.

how replication is performed:
The IO thread from the node goes to Port 3306 of the master node and requests the data corresponding to the binary log event to be read.
The master node starts the dump thread, reads the data from the binary log, reads a message, sends a
The IO thread from the node.
After the data is received from the node, the local relay log is written. From the node's SQL thread, the data is read from the secondary log,
Reads a bar, executes it locally. The binary logs are generally closed from the node to avoid additional overhead.

The ID number of the server that generated the event is logged in the binary log.
Within the replication schema, each node has its own ID number.

To know that replication is asynchronous because the master node's data is written, if you want to wait for the copy to complete from the node, and then return,
I'm afraid the speed of response is not very good.

Problem: Data from the node may be behind the data from the master node.

can I have multiple master nodes from a node?
No, there's only one.

(MARIADB supports multi-master replication, but does not replicate the same database, replicating different databases from different master nodes)

Dual master replication model:

Two nodes, the primary node, and the slave node.
Problem:

1, self-growth ID, an odd ID, an even number, an ID growth step of 2, can avoid ID collisions.
2, data is out of sync, same time, the same record is modified
Age wage
A 40, +1000
B <3000,-2

41-year-old, salary 2800
A 41,3800
B 39,2800

Cause data to be out of sync


The role of the dual-master model:
Highly Available
Read Operation load Balancing
Write operations do not have a balanced effect (because each write operation occurs on both sides, the difference is that it is written directly
or by copying and writing)

This article is from the "Guleva" blog, make sure to keep this source http://guli3057.blog.51cto.com/6809117/1591354

What's the matter with MySQL replication?

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.