MySQL Architecture Scheme

Source: Internet
Author: User
Scale Out: horizontal scaling, adding processing nodes to improve overall processing capability
Scale Up: Scale Up to improve the overall processing capability of a single node.

Replication

MySQL replication is asynchronous and suitable for scenarios where real-time data requirements are not particularly critical. The slave I/O thread is responsible for reading logs from the master, and the SQL thread is responsible for reading logs from the master on the slave side (MySQL was implemented using a thread in the early stage, and the performance problem was obvious ). Binary log must be enabled when using replication, and MySQL uses binary log to distribute updates to slave.

Replication level
1. Row Level: 5.1.5 is supported. Mater records the change logs of each row of data, and slave applies the logs row by row. Advantage: Data Consistency is more secure. Disadvantage: the log file may be large.
2. Statement Level: the master node records each executed query Statement and some context information. The slave node re-executes the Statement on the slave based on the information. Advantage: binary log is relatively small. Disadvantage: Data Consistency is hard to guarantee in some cases
3. Mixed Level: MySQL selects the replication mode based on the actual situation. 5.1.8 start to support

Common architecture
1. Master-Slaves: This method is usually used
2. Dual Master (Master-Master): The two master nodes are updated synchronously. Because of the asynchronous replication method of MySQL, to prevent data inconsistency caused by data conflicts, only one of them is used for write operations, and the other is not used for read operations. The purpose is to enable another master to be used when one of the master nodes is shut down for maintenance or when a fault is interrupted.
3. master-Slaves: In the Master server Load balancer, if the server Load balancer is too large for replication, the master can distribute updates to only several Server Load balancer instances, these slave servers serve as the first-level node and then distribute the update logs to the lower-level nodes.

Cluster

It is implemented mainly through the NDB Cluster Storage engine and is a Share Nothing architecture. Each MySQL Server does not need to Share any data. Redundancy (reliability) and load balancing can be achieved. At the beginning, MySQL needed to load all the data and indexes to the memory before using the Cluster, which had high requirements on memory. Currently, only indexes could be loaded to the memory through improvement.


MySQL rrer architecture, which is shown in the official MySQL documentation.
1. SQL Nodes: responsible for matters outside the storage layer, such as connection management, query processing and optimization, cache management, etc., that is, the MySQL server on the storage layer is stripped
2. Data Nodes: The NDB node of the storage layer, that is, the storage engine in the Cluster environment. Each cluster node stores a shard of complete data, depending on the number and configuration of nodes. Generally, data nodes are organized into groups, each group has an identical copy of physical data (redundancy ). The NDB storage engine uses storage nodes according to the redundant parameter configuration, and then segments the data according to the number of nodes.
3. management Server: responsible for the Management of each node in the entire clsuter cluster. It stores the configuration of the entire cluster environment, starts and closes each node, and implements routine maintenance and Backup recovery for each node, it obtains the status and error information of each node and returns it to all nodes in the cluster.

Cache
1. Application Layer cache: The most common cache method. The application layer manages the cache on its own, regardless of the database.
There are many application-layer cache options
A). Memcached: the distributed cache has excellent performance and good scalability. Although it is slower than the local cache solution, it has the shared cache effect.
B ). berkeley DB: Compared with Memcached, BDB implements local cache. Memcached is a memory-based Cache, while BDB uses disks at different costs. Memcached only supports hash indexes, BDB supports hash indexes and B-Tree indexes. the MySQL database solution uses BDB, mainly for scalability considerations, and the BDB hash index can be more effective than MySQL.

2. MySQL-Memcached Integration
: Two methods are available.
A) directly use Memcached as the MySQL second-level cache. This increases the cache capacity of MySQL, while Memcached is invisible to applications. This solution applies to some specific scenarios, such as the difficulty of Data splitting and transformation of applications.
This solution currently has an open source Waffle Grid project that only supports the InnoDB Storage engine. The principle is that MySQL reads data from the disk when the data cannot be found in the Local Buffer Pool, while Waffle Grid adds a processing section here to try to read data from the Memcached Server first, if the Memcached Server does not exist, it reads data through disk IO. The read data is stored in Memcached and recorded in the LRU List of InnoDB Buffer Pool, if the data is changed to dirty or the Cache Management clears the data, InnoDB moves the corresponding LRU List to the FLUSH List. In this case, Waffle Grid deletes the data from the Mamcached Server. From the above, we can see that the cache mechanism is still managed by MySQL. Memcached stores only valid (not dirty data) data and only data for cache purposes, memcached Server crashes and stops without affecting MySQL Data.
Architecture (MySQL performance tuning and Architecture Design):

Does the Waffle Grid suffer from poor query performance due to network access overhead? You can refer to the test results of Waffle Grid DBT2 and evaluate the actual test.

B) Use the UDF function of MySQL to integrate with Memcached.. Memcached data is jointly maintained by MySQL and the application. The application first reads data from Memcached, and then reads data from the MySQL Server when the data is not read, and writes the data to Memcached, when Memcached data becomes invalid due to data update or deletion, MySQL clears the corresponding data in Memcached.
Architecture (MySQL performance tuning and Architecture Design):

For how to download and use MySQL Memcached UDFs, see Using the MySQL memcached UDFs.

Data splitting and Sharding

Vertical splitting and horizontal splitting Include splitting a table into multiple tables based on different fields and creating different tables in different MySQL servers, horizontal splitting refers to splitting data from the same table to multiple tables for storage.
1. Table Structure Design Vertical Split. Common scenarios include:
A) vertical segmentation of large fields. To improve the access performance of the basic table, you should avoid the large fields of the database in the key performance applications.
B). Vertical Split Based on usage. For example, enterprise material attributes can be vertically split according to basic attributes, sales attributes, procurement attributes, production and manufacturing attributes, Financial and Accounting attributes, and other purposes.
C). Split vertically according to the Access frequency. For example, in e-commerce and Web 2.0 systems, if many user attributes are set, you can vertically split basic and frequently used attributes from uncommon ones.

2. Sharding

Sharding refers to a vertical cutting in the form of "shared nothing", where the cut part is deployed on different servers. For the differences between sharding and partition, see DBA Notes Feng dahui's open-source database Sharding Technology (Share Nothing)


Picture from DBA Notes: http://www.dbanotes.com
The Sharding scheme can be considered as follows:
A). Session-based sharding: you only need to determine the processing node when creating the session. Subsequent requests are directed to the node for processing.
B). Statement-based sharding. Each Statement must determine the processing node.
C). Transaction-based sharding. Determine the processing node based on the first statement in the Transaction.
Potential Sharding problems:
A ). database-level join operations cannot be used between Split parts (cross-shard joins can be used to deploy public and Global tables on each node and distribute them using the replication mechanism)
B). Complicated transaction processing between Split parts
C). Management of auto-growth keys (mainly in the case of Sharding mixed with horizontal splitting)

3. horizontal segmentation
Example:
A). For example, for online e-commerce websites, the Order table contains a large amount of data, which is split by year and month.
B). There are too many registered users and active users on the Web 2.0 website. Based on the user ID range and other methods, the user and the table closely associated with the user are horizontally split.
C ). for example, if a top post on a Forum involves paging issues, each page must be displayed with a top post. In this case, you can split the top post horizontally to avoid reading from the table of all posts when the top post is retrieved.

One of the difficulties is that the complexity of logic and correlation hinders horizontal segmentation. In such scenarios, it is difficult to determine the scope and strategy of splitting. For large ERP systems such as SAP, there are many modules and tables, and the logic between them is complicated. SAP is based on each Client (company or group) completely split the entire business data, which is very difficult if the granularity needs to be refined. The solutions are generally as follows: a). Split by primary key; B). Maintain a master cut index table. This solution is highly scalable, but you need to find the primary index table.
The second is how to make horizontal segmentation scalable. Take the Web 2.0 website as an example. If you split the website according to the member ID range, if you decide to split the website horizontally into five parts, if you use the value of user_id % 5 to determine which part of the user belongs, in this way, as the number of users increases in the future, it will be quite troublesome to split it into 20 more in the future.
In theory, linear scaling can be achieved in the hybrid mode of horizontal splitting and Sharding, but it is limited by the application status, design, and implementation scheme of splitting and Sharding.

Splitting and integration solutions
There are two main implementation methods:
1. configure and manage one or more data sources required by each application module to directly access each database.
2. Use the intermediate proxy layer to manage all data sources. The backend database cluster architecture is transparent to front-end applications.

Use MySQL Proxy:
This is officially provided by MySQL. It is located between the client program and MySQL Server and can monitor, analyze, and convert communication between them. Common scenarios include Server Load balancer, fault recovery, query analysis, query filtering and modification, and basic HA mechanisms. At present, there are still some problems in implementing read/write splitting, architecture (MySQL performance tuning and Architecture Design):

MySQL Proxy provides a basic framework. Other functions need to be implemented by writing LUA scripts (sacrificing certain performance and bringing flexibility)
HSCALE is a LUA module of MySQL Proxy. It transparently analyzes and overwrites queries, and transfers the splitting and partition logic from the application layer to the Proxy layer.
HSCALE project official site: http://www.hscale.org, author blog site: http://pero.blogs.aprilmayjune.org/

Use Amoeba:
Amoeba is an open-source framework for Java-based integration of Distributed Database Data sources into the Proxy program.
It mainly solves the following problems:
A). Integrate complex data sources after data splitting
B) Provide data splitting rules and reduce the impact of data splitting rules on the database
C). Reduce the connection between the database and the client
D). read/write splitting route
Amoeba For MySQL is a solution dedicated to MySQL databases. The architecture is as follows:

Picture from MySQL performance tuning and Architecture Design
Amoeba For Aiadin is a more common solution. It receives MySQL Protocol requests at the front end and can use MySQL, Oracle, PostGreSql, and other data sources at the backend. These are transparent to applications. Architecture:

Picture from MySQL performance tuning and Architecture Design

Use HiveDB
HiveDB is also an open-source Java-based framework supported by commercial companies. Currently, it only supports horizontal splitting and data redundancy and basic HA mechanisms. He uses Hibernate Shards to achieve horizontal data segmentation and implement data redundancy mechanisms on his own. In HiveDB, data is distributed to multiple MySQL servers through various custom Partition keys. query requests are parsed during access, filtering conditions are automatically analyzed, and data is read from multiple MySQL servers in parallel, the merged result set is returned to the client application. Architecture:

Picture from HiveDB Official Website: http://www.hivedb.org/
Hibernate Shards is contributed by Google. It uses the standard Hibernate programming model and uses Hibernate Shards. It supports scalable sharding policies and virtual shards to simplify resharding processing.

Spock Proxy
An open-source project generated by the actual project (Spock is a Rails application, and Speck Proxy should be used outside of Rails, such as PHP or. is a branch of MySQL Proxy. It supports range-based horizontal paritioning. Its Improvements to MySQL Proxy include:
A). Improve performance without using LUA scripts. For example, if you want to set the results returned from multiple data sources and interact with the LUA script during the process, the performance overhead is relatively high.
B) client login verification. MySQL Proxy supports direct login verification between the client and each server. Spock Proxy manages the client in a unified manner and isolates the connection between the client and the server.
C). Dynamic connection pool. Benefits from the improvement of the client login authentication mechanism
D). Currently, MySQL Proxy cannot perform read/write splitting. Spock Proxy implements this and implements Asynchronous Parallel Execution.
Architecture:

Image from Spock Proxy Official Website: http://spockproxy.sourceforge.net/

Pyshards
The Python-based Sharding solution is an open-source project for individual researchers. Its goal is to achieve automatic re-balancing (re-sharding), which is quite challenging. Currently, only MySQL is supported. Http://code.google.com/p/pyshards/

Reference: MySQL performance tuning and Architecture Design

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.