Several schemes to build MySQL cluster


LVS + Keepalived + MySQL (is there a split-brain problem? Yet many people seem to recommend it.)
DRBD + Heartbeat + MySQL (one machine sits idle as a standby? Heartbeat failover takes a long time? Is there a split-brain problem?)
MySQL Proxy (not mature or stable? Uses Lua? Has anyone used it for table sharding? Is it true that the client logic does not have to change?)
MySQL Cluster (the Community Edition does not support the InnoDB engine? Few production cases?)
MySQL + MHA (paired with asynchronous replication it seems to be a good choice; are there any problems?)
MySQL + MMM (many problems seem to be reported; I have not tried it. Can anyone comment?)


Reply:

Every option has its own scenario and scale limitations, and its own pros and cons.

1. First of all, I am against read/write splitting. The reasons have been explained many times (it adds technical complexity, it may cause stale reads, and so on). Just one point: 99.8% of business scenarios do not need read/write splitting; a well-designed and well-tuned database on suitably provisioned hosts is enough.

2. Keepalived + MySQL: has a split-brain problem, and cannot accurately determine whether mysqld has hung;

3. DRBD + Heartbeat + MySQL: also has a split-brain problem, and still cannot accurately determine whether mysqld has hung; DRBD is not needed and only adds problems;

4. MySQL Proxy: a good project, but unfortunately the vendor abandoned it halfway. Not recommended. It is not a high-availability solution; it does read/write splitting;

5. MySQL Cluster: the claim that the Community Edition does not support NDB is wrong. There are not many production cases, mainly because of what it demands of the business scenario. Its development was somewhat chaotic over the past few years but is now on track. It places high demands on the network;

6. MySQL + MHA: can solve the split-brain problem but needs more IP addresses. It works for small clusters, but managing large ones is troublesome. As for MySQL + MMM, it is much the same as MHA, so there is no need to use MMM.


Suggestions:
1. With dual-master replication and no data sharding, you can choose MHA, Keepalived, or Heartbeat;
2. With dual-master replication plus data sharding, consider Cobar;
3. With dual-master replication plus slaves, data sharding, and read/write splitting, consider Amoeba;

All of the above has to be weighed against the company's internal business scenarios, data volume, access volume, concurrency, high-availability requirements, number of DBAs, and so on. If necessary, you can contact me: jinguanding#http/hotpu.cn

There are many architects who prefer a DRBD-based architecture.

With a replication-based shared-nothing architecture and no read/write splitting, writing to multiple nodes at the same time will certainly cause conflicts!

Is it a clerical error? The point in the question is that the MySQL Cluster Community Edition does not support the InnoDB engine, not NDB; the NDB engine is of course supported.

The answerer has a good understanding of MySQL high availability and clustering methods. It is reasonable to design the database cluster architecture around the actual business scenario. If you pursue thorough high availability and want to avoid any single point of failure, you can add Keepalived or Heartbeat on top of master-slave or master-master replication; both are useful for failover. For high performance, rather than expecting one set of backend database services to handle all of the business, consider sharding different businesses onto different server resources. Management complexity will certainly increase, so it depends on how you weigh management against performance. For scalability, a MySQL proxy (such as Alibaba's Amoeba) in front of a one-master, many-slave MySQL setup can be considered. In short, there is no best plan, only the most suitable one!


Jhh
Links: https://www.zhihu.com/question/21307639/answer/123316479
Source: Zhihu
Copyright belongs to the author. Commercial reprint please contact the author for authorization, non-commercial reprint please specify the source.

First, an introduction to several schemes
    • Master-slave replication, including one master with one slave and one master with multiple slaves

High availability: relatively high
High scalability: none
High consistency: relatively high
Latency: relatively small
Concurrency: none
Transactional: none
Throughput rate: relatively high
Data loss: not lost
Switchable: can be switched


    • Ring replication, including a ring of two nodes and a ring of multiple nodes

High availability: relatively high
High scalability: none
High consistency: relatively high
Latency: relatively small
Concurrency: none
Transactional: none
Throughput rate: relatively high
Data loss: not lost
Switchable: can be switched


    • 2PC:

High availability: very high
High scalability: scalable
High consistency: relatively high
Latency: relatively large
Concurrency: relatively small
Transactional: available
Throughput rate: relatively small
Data loss: not lost
Switchable: irrelevant


    • Paxos: suited to highly available metadata, where the level of concurrency is low

High availability: very high
High scalability: irrelevant; scalable
High consistency: very high
Latency: relatively large
Concurrency: relatively small
Transactional: available
Throughput rate: relatively small
Data loss: not lost
Switchable: irrelevant



The above is purely my personal understanding; objections are welcome.

In addition, depending on whether the master serves actual business requests, distributed systems can be divided into two categories:

    1. The master manages the system and all requests pass through the master; clearly the master becomes a performance bottleneck
    2. The master manages the system, but actual requests do not pass through the master; requests are spread evenly


Option 2 is definitely the one to choose




Based on the characteristics of these schemes, how do we design a cool distributed system?


Here are the features that would make it cool:

    • Scalability: it should scale to at least hundreds of machines, like Hadoop.
    • High availability: the master must be highly available and so must the nodes, which means that when one instance of any component (or some of its instances) goes down, the system as a whole is unaffected
    • High concurrency: a single commodity machine should support at least several thousand concurrent requests; once scaling out is solved, the concurrency of the whole system scales with it
    • Data consistency: consistency is hard in distributed systems, so guarantee it as far as possible, for example through master-slave synchronization, by using two-phase commit (2PC) to write multiple nodes at the same time, or by using a consensus protocol such as Paxos
    • Transactions: strict transactions are hard in distributed systems; use 2PC or 3PC and guarantee them as far as possible
    • Automatic switching: first you have to decide under which conditions automatic switching is allowed. For example, with master-slave synchronization, when the master goes down the system can switch to the slave automatically. But if the slave's data differs from the master's and the business requirements are too strict to allow this, the service has to be stopped for maintenance.
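
The switching condition in the last point can be sketched in Python. This is only a minimal illustration under stated assumptions: the `Replica` record, its `lag_events` field, and the `max_lag_events` threshold are hypothetical names invented here, not part of any tool discussed on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str        # hypothetical identifier of a slave
    alive: bool      # result of a health probe
    lag_events: int  # replicated events the slave is still missing


def pick_failover_target(replicas: List[Replica],
                         max_lag_events: int = 0) -> Optional[Replica]:
    """Return the slave to promote, or None if switching would be unsafe."""
    # Only live slaves whose lag is within what the business tolerates qualify.
    candidates = [r for r in replicas
                  if r.alive and r.lag_events <= max_lag_events]
    if not candidates:
        # No safe target: stop the service and repair by hand instead.
        return None
    # Prefer the slave that is closest to the failed master.
    return min(candidates, key=lambda r: r.lag_events)
```

With `max_lag_events=0` the function refuses to switch unless some slave is fully caught up, which matches the strict case described above; a business that tolerates some loss would pass a larger threshold.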

Okay, you can start criticizing now: how could that possibly work?


The Paxos consensus protocol offers high availability, strong consistency, and good transactional behavior, so every service involved can use it.

The master and the metadata adopt the Paxos consensus protocol, and all nodes adopt it as well; the client maintains this information. The architecture is as follows: master, metadata, and nodes all implement the Paxos protocol and are accessed through a Paxos interface



Take a distributed database as an example. It seems that none of the currently popular databases supports the Paxos protocol; perhaps someone could develop it. If the nodes use Paxos, there is one question I have not thought through: how would Paxos be combined with SQL? In addition, node performance would suffer somewhat. To lower the requirements, the nodes can use master-master replication or ring replication instead. The master checks whether the nodes are alive, performs the switchover, and notifies the client.

The architecture is as follows: master and metadata implement the Paxos protocol and are accessed through a Paxos interface; the nodes are in a replication relationship, and when a serving node goes down, the master has to detect it and perform the switchover.




For other distributed systems, such as a file system or a system you develop yourself, the nodes can consider using the Paxos protocol. Each node uses 3 instances, or 5 if you have the resources.



SQL implementation in a distributed database

This is also a difficult point: how is a complex SQL statement executed?


    • Use the idea of database sharding to store the data
    • Use the idea of MapReduce to implement SQL computation


    • The input SQL goes through lexical, syntactic, and semantic analysis, is combined with table structure information and data distribution information, and is turned into an execution plan made up of several stages; the stages depend on one another and form a multi-input, single-output task tree;
    • Each stage consists of two SQL statements, called MapSQL and RedSQL, and includes three operations: map, data shuffling, and reduce; map and reduce execute MapSQL and RedSQL respectively;


The processing logic and processing order of the clauses:


1. union: decompose each branch, parse it separately, and form a parallel relationship
2. from: select the tables; more than one table can be selected, which is also the join case
3. join: if the from clause contains a join, handle the various join issues
4. where: filter conditions on a single table, on multiple tables, or after the join
5. group: grouping
6. select: the selected columns
7. distinct: remove duplicate rows
8. having: filtering after aggregation
9. order: sort the results
10. limit offset: take part of the records of the final result
11. Subquery: a subquery is parsed independently and a dependency on the enclosing level is established

Joins include inner join, left join, right join, semi join, and outer join

Take the following SQL as an example:

All login information for users who registered within a given time range

SELECT t1.u_id, t1.u_name, t2.login_product
FROM tab_user_info t1 JOIN tab_login_info t2
ON (t1.u_id = t2.u_id AND t1.u_reg_dt >= ? AND t1.u_reg_dt <= ?)


The resulting execution plan is:

Because this is a join, every table involved is queried, and each row is tagged with the table it came from (in the implementation, you can add a table-name field). These queries are executed on all storage nodes.

SELECT u_id, u_name FROM tab_user_info t WHERE t.u_reg_dt >= ? AND t.u_reg_dt <= ?

SELECT u_id, login_product FROM tab_login_info t

After execution, the data in this case has to be shuffled by u_id. Since u_id has many distinct values, each storage node partitions its rows by u_id.

For example, with N compute nodes, split the key space into ranges of width (maximum u_id - minimum u_id)/N, and send the rows whose u_id falls in the same range, whichever storage node they are on, to the same compute node
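
The range rule above can be written out as a small Python function. This is a sketch only: the function name `compute_node_for` and its parameters are hypothetical, and it assumes the minimum and maximum u_id are known before the shuffle starts.

```python
def compute_node_for(u_id: int, min_id: int, max_id: int, n: int) -> int:
    """Map a u_id to a compute node index in [0, n) using equal-width ranges."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not min_id <= u_id <= max_id:
        raise ValueError("u_id outside [min_id, max_id]")
    width = (max_id - min_id) / n  # (maximum u_id - minimum u_id) / N
    if width == 0:
        return 0  # every row has the same u_id, so one node takes them all
    # The maximum u_id would land exactly on index n, so clamp it to the last node.
    return min(int((u_id - min_id) / width), n - 1)
```

Equal-width ranges only balance the load when u_id values are spread fairly evenly; with skewed keys some compute nodes would receive far more rows than others.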


Then run the following on the compute nodes

SELECT t1.u_id, t1.u_name, t2.login_product
FROM tab_user_info t1 JOIN tab_login_info t2
ON (t1.u_id = t2.u_id)
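
The shuffle and the per-node join can be sketched in Python as follows. This is an illustration under assumptions, not the author's implementation: rows are modeled as tuples tagged `"user"` or `"login"` (the table-name field mentioned earlier), and `node_of` is a hypothetical routing function supplied by the caller.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (table tag, u_id, value); value is u_name for "user" rows
# and login_product for "login" rows.
TaggedRow = Tuple[str, int, str]


def shuffle(rows: List[TaggedRow],
            node_of: Callable[[int], int]) -> Dict[int, List[TaggedRow]]:
    """Data shuffling: route each tagged row to a compute node by its u_id."""
    buckets: Dict[int, List[TaggedRow]] = defaultdict(list)
    for row in rows:
        buckets[node_of(row[1])].append(row)
    return buckets


def reduce_join(rows: List[TaggedRow]) -> List[Tuple[int, str, str]]:
    """Reduce step on one compute node: inner join of user and login rows on u_id."""
    names: Dict[int, str] = {}
    logins: Dict[int, List[str]] = defaultdict(list)
    for tag, u_id, value in rows:
        if tag == "user":
            names[u_id] = value
        else:
            logins[u_id].append(value)
    # Emit one output row per (user, login) pair that shares a u_id.
    return [(u_id, names[u_id], product)
            for u_id in sorted(names)
            for product in logins.get(u_id, [])]
```

Because all rows with the same u_id reach the same compute node, each node can join its own bucket independently and the final result is simply the union of the per-node outputs.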


Many questions about how to implement distributed SQL remain open. Anyone interested is welcome to discuss them.

A few additions:

1. Where strong consistency is required, DRBD is meaningful, at least before MySQL 5.7. MySQL 5.7 is said to achieve truly synchronous replication; if it really does, DRBD is no longer needed.

2. The split-brain problem under network partitioning must be avoided by electing the master with a majority-based election algorithm. Many schemes use services such as ZooKeeper, etcd, or Consul to run the election and select the master.

3. I do not know MHA in depth, but my impression is that its master (arbiter) node is a single point of failure? If I remember correctly, this node carries out the election of the MySQL master, and it is itself not highly available, which is a hidden risk.
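
The majority rule in point 2 can be illustrated with a few lines of Python. This is a sketch of the counting rule only, with hypothetical names; real deployments would delegate the election to a service such as ZooKeeper, etcd, or Consul rather than hand-roll it.

```python
from typing import Iterable, Set


def has_quorum(reachable: Set[str], cluster: Iterable[str]) -> bool:
    """True if `reachable` contains a strict majority of the cluster members."""
    members = set(cluster)
    # Count only nodes that really belong to the cluster.
    return len(reachable & members) > len(members) // 2
```

Two disjoint sides of a partition cannot both hold a strict majority, so at most one side is allowed to elect a master. An even-sized cluster split exactly in half elects nobody, which is why odd sizes such as 3 or 5 are usual.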

MySQL large-scale distributed clusters: 1. This mainly addresses the persistence layer of large website architectures, where the problems are storing massive amounts of data and reading and writing it under highly concurrent access. "Distributed" means splitting one business into multiple sub-services deployed on different servers. A "cluster" means the same business deployed on multiple servers.

2. The focus is a detailed explanation of data sharding: starting from its principles, understanding it step by step, and using a thorough grasp of the various sharding strategies to design and optimize a system. This part also uses database middleware and client components to shard the data, so that readers can make a qualitative leap from theory to practice.



No one mentioned Atlas
Atlas is a MySQL-protocol-based data middle-tier project developed and maintained by the web platform infrastructure team at Qihoo 360. It is based on MySQL-Proxy 0.8.2, with optimizations and some new features added. The MySQL services that run through Atlas inside 360 carry up to billions of read and write requests per day.
