Methods and software related to MySQL Architecture Design

Source: Internet
Author: User
Tags database sharding

Preface

Recently, I have been learningMySQLThe database architecture-related content searches the Internet for a large number of related materials and articles, and reads them in a rough manner. It is found that architecture-related things are unfathomable and require a wealth of knowledge and practical experience.

My experience and experience are obviously not enough, so I sorted out the relevant content below, and counted it as a rough summary of this study! Most of the content in this article comes from the Internet. Due to my limited level, the content is not accurate. There may be many errors. I hope you will not be able to correct them! I hope this article will help us understand the database architecture.

1. Data splitting Solution

When the database is large, read/write operations, especially write operations, are too frequent, and it is difficult for a server to support them, we need to consider splitting the database. The so-called database splitting means that we distribute data from one database to multiple database servers according to certain conditions. Because multiple servers are used, when one server goes down, only part of the data in the system is unavailable, not all. Therefore, database splitting can not only use multiple servers to share the load pressure on the database, but also improve the overall availability of the system.

There are two ways to split data::Vertical Split and horizontal split.

1.1 Vertical Split

Vertical splitting refers to splitting the data tables accessed by each module into different databases according to the system function module.

Applicability: vertical splitting is suitable for systems with good architecture design, unified interaction between modules, and low coupling.

Advantages: the database splitting is simple and clear, and the rules are clear. The system module is clear and easy to integrate. Data maintenance is convenient and positioning is easy.

Disadvantages: Table association cannot be implemented in the database, but can only be implemented in the program. performance problems still exist for data tables with large access volumes and large data volumes; transaction processing becomes more complex, cross-server distributed transactions increase. Excessive splitting leads to over-complicated systems, scalability, and maintenance difficulties.

1.2 horizontal split

Horizontal segmentation refers to splitting data in a data table with a large data volume into multiple databases based on certain rules of a field according to the logical relationship of the data.

Applicability: horizontal splitting is applicable to databases with large data volumes and suitable fields and rules for horizontal splitting. The interaction between multiple databases after the database is horizontally split should not exist.

Advantages: Table association can be realized in the database; data tables with no large data volume and ultra-high load exist; transactions can be processed in the database, which is relatively simple; under reasonable segmentation rules, good scalability.

Disadvantages: splitting rules are generally complicated, making it difficult to find a sharding rule suitable for the entire database. The difficulty of Data Maintenance increases, and the difficulty of manual data locating increases. The Coupling Degree of system modules is high, the difficulty of data migration splitting increases.

In actual data splitting, we should first perform vertical splitting based on the system module design. After the modules are subdivided to a certain extent, if the modules are further subdivided, the system architecture will be too complex and the entire system will face the risk of losing control. At this time, we need to take advantage of horizontal splitting to avoid the system complexity caused by vertical splitting and the problems of getting out of control. At the same time, because the data has been properly vertically split, the horizontal split rule is relatively simple, and the problem of high coupling between system modules has also been solved. In short, data splitting should follow the principle of "reasonable vertical segmentation first, and timely horizontal segmentation; modular segmentation first, and then dataset segmentation ".

2. Data Integration Plan

After data is vertically and horizontally split and stored on different database servers, the biggest problem facing the system is how to integrate the data from different database servers. There are two ways to solve this problem. The first is to configure and manage the information of one or more databases and their servers required by this module in each module of the system, and integrate the data in the module; type 2: Use the intermediate proxy layer to manage all data sources in a unified manner. The database cluster is transparent to system applications. The first scheme requires a relatively low cost during initial development, but in the long term, the scalability of the system will be greatly limited. The second solution is just the opposite. The short-term cost is relatively large, but it is conducive to system expansion. The second solution can be implemented through some third-party software.

2.1 MySQLProxy

MySQLProxyIt can be used to monitor, analyze, and transmit communications between applications and databases. It can achieve connection routing,QueryAnalysis,QueryFilter and modify, load balancing, and basicHAMechanism.

Principle:MySQLProxyIn fact, a connection pool is established between the application request and the database service. Send all application requestsMySQLProxyAnd thenMySQLProxyPerform corresponding analysis to determine whether the operation is a read operation or a write operation and distribute it to the correspondingMySQLServer. For multiple nodesSlaveClusters can also achieve load balancing.

Advantages:MySQLProxyWith great flexibility, we can use it to the maximum extent.

Disadvantages:MySQLProxyIn fact, it does not directly provide related functions. These functions must be compiled on your own.LUAScript implementation.

2.2 Amoeba

AmoebaIsJavaDevelopedProxyThe open-source framework of the program is designed to solve the problem of data integration in distributed databases. It hasQueryRouting,QueryFiltering, read/write splitting, load balancing, andHAMechanism.AmoebaIt can integrate complex data sources after data splitting, reduce the impact of data splitting on the entire system, reduce the number of connections between the database and the client, and implement data read/write splitting.

Principle:AmoebaEquivalentSQLThe requested router, which centrally responds to application requests.SQLThe request is sent to a specific database server for execution. Based on this, load balancing, read/write splitting, and high availability are achieved.

Advantage: Based onXMLConfiguration file,SQLJEPSyntax writing rules, simple configuration

Disadvantages: currently, transactions are not supported. It is not suitable for returning a large number of queries. database/table sharding is not supported and database sharding instances are only supported.

2.3 HiveDB

HiveDBIt is also based onJavaDevelopment,MySQLThe database provides an open-source framework for data splitting and integration. However, the currentHiveDBOnly horizontal data splitting is supported.HiveDBIt mainly solves the scalability of databases with large data volumes and high-performance access to data. It also supports data redundancy and basicHAMechanism.

Principle:HiveDBThrough user-definedPartition keysDistribute data to multiple database servers for parsing during accessQueryRequest, automatically analyze filtering conditions, read data from multiple databases in parallel, and then merge the result set and return it to the client application.HiveDBImplementation Mechanism andAmoebaAndMySQLProxyDifferent, it can achieve data redundancy by itself without using other replication and synchronization technologies. The underlying layer is mainly based onHibernate Shards.Hibernate ShardsYesGoogleThe technical team isGoogleFinancial System DataShardingIt was born in the process.Hibernate ShardsIt is implemented at the framework layer and has its unique features: StandardHibernateProgramming Model, will useHibernateIt can be done, with low technical cost and relatively elasticShardingPolicy and Support for virtualShard.

Advantage: with the support of commercial companies, you can achieve data redundancy on your own.

Disadvantage: only horizontal partitions are supported.

There are still some problems in the process of data integration, such as distributed transactions and cross-nodeJOINCross-node sorting and paging. For distributed transactions, we need to split them into small transactions in multiple single databases, which are controlled by the application. Cross-nodeJOINTo solve this problem, we need to first extract data from a node, and then the application will go to another nodeJOINOr useFederatedEngine. When sorting pages across nodes, we can read data from multiple nodes in parallel and then sort the pages by application.

3. data redundancy plan

Any device or service, as long as it is a single point, has a great security risk. Once this device or service goes down, it is difficult to have a backup device or service to replace it in a short time. As the core of the system, the database must have a backup to quickly replace the original service in case of exceptions to achieve high availability. In addition, replication technology must be used to synchronize data between multiple database nodes to achieve database read/write splitting. There are many ways to implement data synchronization. The following describes some common methods.

3.1 MySQL Replication

MySQL ReplicationYesMySQLIt comes with an asynchronous replication function. During the replication process, one server acts as the master server, and one or more other servers act as slave servers, that is, the master-slave mode.

Principle:MySQLUse3Threads to execute the replication function. When replication starts, the slave server createsI/OThe thread connects to the master server and requires the master server to send statements recorded in its binary log. The master server creates a thread to send the binary log content to the slave server. Slave ServerI/OThe thread reads the content sent by the master server thread and copies the data to a local file in the data directory of the slave server. This file is called a relay log. The third thread isSQLA thread is created by a slave server to read relay logs and perform updates contained in the logs. Common architecture Methods: Master-Slave, Master-Master and master-Slave cascade, Master-Master-Cascade.

Advantages: simple deployment, easy implementation, and yesMySQLThis feature allows you to easily switch between the master and slave nodes. You can use a third-party software or write a simple script to automatically switch between the master and slave nodes.

Disadvantage: in actual use, only a single host can write data, which may not meet the performance requirements. When the server host hardware fails, some data that has not been transmitted to the slave server may be lost.

3.2 MySQLCluster

MySQL ClusterYesMySQLThe high-availability and high-redundancy version is applicable to distributed computing environments.NDB ClusterStorage Engine("NDB"Is a" in-memory "storage engine with high availability and good data consistency.), Allowed in1ItemsClusterMultipleMySQLServer.

Principle: MySQL Cluster integrates the standard MySQL server with the NDB Cluster Storage engine. MySQL Cluster is composed of a group of computers, each of which runs a variety of processes, including MySQL servers, NDB Cluster data nodes, management servers, and (possibly) dedicated Data Access Program. All these programs constitute the MySQL Cluster together. When the data is saved to the NDB Cluster Storage engine, the table (structure) is saved to the data node, and the application can directly access these tables from all other MySQL servers. See:

Advantages: high availability and excellent performance. Each piece of data must be copied to at least one copy on different hosts and synchronized in real time. Without a shared architecture, the system can use inexpensive hardware without special requirements on hardware and software.

Disadvantage: The maintenance is complicated and is not suitable for use in many scenarios. 3.3 DRBDDisk network image Solution

DRBD(Distributed Replicated Block Device), which is composedLINBITAn open-source solution developed by the company to synchronize data images of Block devices through the networkClusterSoftware, also known as NetworkRAID1.

Principle:DRBDBetween the file system and the disk media, capture allIOOperation to callIOModule to read and write the underlying disk media. WhenDRBDAfter the write operation is captured to the file system, the local disk write operation is performed at the same timeTCP/IPProtocol, through the network device of the local host (NIC) SetIOThe network device that is passed to the remote host. When the remote hostDRBDListen to the passedIOThe data is immediately written toDRBDThe disk device that is maintained.DRBDThere are three replication modes for Processing Remote Data writes, which are applicable to different reliability and performance requirements scenarios.

Advantage: powerful functions. data can be synchronized across physical host images at the underlying block device level based on performance and reliability requirements;IOOperations are in order to meet the demanding data consistency requirements.

Disadvantages: the non-distributed file system environment cannot support simultaneous visibility of image data. The performance and reliability are in conflict and cannot be applied to environments with demanding performance and reliability requirements, resulting in high maintenance costs.

3.4 RaiDB

RaiDBWhich is calledRedundantArrays of Inexpensive Databases, That is, throughRaidConcept to manage database data: by combining multiple cheap database instances into a database array, it provides better performance and fault tolerance than a single database,It also hides the complexity of distributed databases and provides applications with an independent database.

Principle: InRaiDBThe Controller is in front of the database array. The application sends a requestRaiDBThe Controller distributes requests to a group of databases. And DiskRaidSame,RaiDBThere are also different levels or data distribution solutions, suchRaiDB-0,RaiDB-1,RaiDB-1-0,RaiDB-0-1To provide different cost, performance, and fault tolerance trade-offs.

Advantages:RaidSame,RaiDBIt can also greatly improve the data read/write speed and provide Fault Tolerance functions.

Disadvantage: Only tables in the database can be divided into different database instances, and data tables cannot be separated.JoinThe scalability improvement depends on the number of tables and the load of each table.

  • 1
  • 2
  • Next Page

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.