Sharding design and principles of MySQLCluster and MongoDB replication Clusters

Source: Internet
Author: User
Distributed Database computing involves distributed transactions, data distribution, and data convergence computing. distributed databases require high security, performance, and availability, of course, it also brings about a high cost (fixed cost and operational cost). We analyze the design ideas from the implementation of MongoDB and MySQLCluster to abstract the design ideas we have designed in the database.

Distributed Database computing involves distributed transactions, data distribution, and data convergence computing. distributed databases require high security, performance, and availability, of course, it also brings about a high cost (fixed cost and operational cost). We analyze the design ideas from the implementation of MongoDB and MySQL Cluster to abstract the design ideas we are designing databases.

Distributed Database computing involves distributed transactions, data distribution, and data convergence computing.

Distributed Databases can achieve high security, high performance, high availability, and other features. Of course, they also bring high costs (fixed costs and operational costs ), we analyze the design ideas from the implementation of MongoDB and MySQL Cluster to abstract some design methods that we can reference when designing databases and apply them to our production system.

First, let's talk about the characteristics of relational and non-relational databases.

MySQL Innodb and Cluster have complete ACID attributes

A atomicity the entire transaction will be taken as A whole, either completed or rolled back before the C consistency transaction starts and after the transaction ends, the integrity limit of the database is not damaged. I isolation the execution of two transactions does not interfere with each other, and the two transactions do not affect each other. D persistence after the transaction is completed, the changes made by the transaction to the database are permanently stored in the database and are completely

To implement ACID, suchUndo, Redo, MVCC, TAS, signal, two-phase blocking, two-phase commit, blocking, and so on, and introduce the data access path, The whole thing will become extremely complex

When MySQL complies with the SQL standard and uses the SQL standard, it can achieve seamless migration between RDBMS.

Its rich data types, complete business logic control and presentation capabilities have always been the first choice for commercial applications.

MongoDB uses a set to represent data and does not have the ACID attribute. It has no type, rapid deployment, and rapid development. It is widely recognized that either RDBMS or MongoDB uses an index structure, mongoDB supports B-tree indexes. indexes are created based on user needs and can be nested between containers at various levels.

There are two data storage methods in the database:

1. Heap: data is always stacked at the end of the file according to the backward insertion method. when accessing data using the index structure, the Data Pointer will be obtained in the index and then the data will be obtained, when data is deleted, it is deleted from the corresponding location. For frequently updated heap tables, it needs to be optimized regularly to use heap tables, the data sequence access principle will be broken (access optimization is performed in DBMS, and performance is partially improved). Because there is no fill factor, under the same compression algorithm, space can be greatly reduced, and heap tables are suitable for sequential range access, such as data warehouses and other business scenarios.

2. index organization: Generally, the index organization table uses B + as the constructor. The entire structure is like a inverted tree (from the perspective of data access streams). the routing information is stored on the branches, all data is stored on the leaf node, and all the leaves are connected in sequence through two-way pointers. Due to the limited space-time access, this can greatly improve data performance, DBMS accesses and constructs data based on access and access paths, and the access path depth directly affects performance. Generally, it is recommended that the access path be controlled within 4 (less than or equal to 3 ), the reason is that accessing multi-layer paths requires a higher cost and the cost of maintaining the index tree is becoming more and more expensive.

Our common Innodb and MySQL Cluster are index organization tables, MyISAM is the heap table, and MongoDB's organizational structure is the heap

Databases with the AICD attribute have the index maintenance function. Because the MyISAM storage engine and MongoDB have a heap organizational structure without ACID Control, metadata and indexes may be inconsistent, data access becomes invalid, causing data inconsistency. However, because ACID is not required, the update speed will be greatly improved (the updates described in this article include all write operations, the MyISAM storage engine requires regular consistency check. because it does not have the ACID attribute, the MyISAM storage engine needs to lock the table for data updates, resulting in low performance of updates in high concurrency.

MySQL Cluster architecture

The Cluster can be divided into SQL nodes, data nodes, and management nodes (MySQL Cluster provides APIs for internal calls, and external applications can use APIs to access any layer of methods)

The SQL node provides functions such as SQL command request, resolution, connection management, query optimization and response, cache management, data merge, sort, and cropping. When the SQL node is started, the architecture information will be synchronized to the management node for data query routing

Data nodes provide data access, persistence, API data access, and other functions

The management node maintains node activity information and implements data backup and recovery. The management node obtains the status and error information of nodes in the entire cluster environment, and reports the information of each node in each cluster to all other nodes in the cluster, this is critical to the data routing rules of SQL nodes. When the node is expanded, the data will be rebuilt.

Data nodes store data in shards and multiple data copies in at least two copies. The data is stored in the memory and persisted according to the rules of the Management node as the data access location, A large amount of memory is required to support SQL nodes as the query entry, which consumes a large amount of cpu and memory resources. You can use Distributed Management Nodes, it also encapsulates a layer of request Distribution and HA control mechanism outside the SQL node to solve single point and performance problems. It provides a linear expansion function. The management node maintains global rule information. When a node fails, fault announcement will occur throughout the Cluster system. Any component can support dynamic expansion and linear expansion, providing high-availability and high-performance solutions.

Problem:

When a new data node is added, the access path information needs to be reconstructed, which puts pressure on the Management node for Data Reconstruction. In this operation, we recommend that you use the automatic key-value recognition data sharding scheme for the Cluster during off-peak hours, users do not need to care about the data slicing solution (partition key rules are provided in 5.1 and later versions), and the distributed database is transparent, the data sharding rules are completed based on the rowid of the primary key, the unique index of the secondary key, and the automatic row identification of the three. Then, the number of clusters is distributed. The data accessed is like the RAID access mechanism, data can be extracted from each node in parallel and hashed. When non-primary keys or partition keys are used for access, all Cluster nodes will be scanned and performance will be affected (this is the core challenge facing the Cluster)

The MongoDB replica set architecture constructs a distributed database solution based on MongoDB replication:

MongoDB provides a similar architecture as MySQL Cluster. In mongod, mongos, and mongo, it includes:

Mongod: a data access excuse to distribute requests to Mongos node Mongs: Data Access routing, query optimization, data merge, sort, cropping, and other functions mongo: data Access (using the mongo Protocol also provides direct data access)

When building a collection, MongoDB needs to provide a data sharding rule, which will be recorded in mongos, and the query request mongod will initiate a request to mongos, mongos accesses data in mongo Based on the access path. Because MongoDB provides users with a selectivity, how to slice the data and quickly access the data when the access is transparent to users

MongoDB problems:

When data is accessed using non-sharding rules (the index can be built on each shard), all Mongo cluster nodes are fully scanned (multiple redundant copies can be used to implement different sharding rules, this is also a common method for data sharding applications. When a data cluster is added, all data nodes will be restructured, directly affecting the performance.

Summary:

MongoDB uses the heap access path method to organize data and does not contain ACID features to update and query a large amount of data (for an architecture with MVCC, it will reduce the response speed of highly concurrent and large datasets) but no ACID ensures the stability and security of key data.

MongoDB solves the automatic sharding rules of the MySQL Cluster (user-defined functions are provided after 5.1) and transfers the data processing of the SQL node of the MySQL Cluster to mongos, mySQL Cluster uses the SQL-> node-> SQL access path. MongoDB uses Mongod-> Mongos-> node-> Mongos-> Mongod access path. In terms of architecture, mySQL Cluster is simpler and more efficient

MySQL Cluster has complete commercial support and general standard support. It has relatively rich management tools and MongoDB has relative performance advantages. However, it lacks powerful stability and security support and rich management tools, both have their own advantages, but have similar fatal weaknesses.

MySQL Cluster can implement a replication-based topology architecture, and synchronize data to a remote location without changing the internal topology architecture to form a star topology, mongoDB still lacks related technical solutions in this regard (of course, it can be a replication solution, but MySQL Cluster is implemented at a higher level and MongoDB is implemented at a lower level. For management, will face great challenges)

In commercial terms, MySQL Cluster has enough commercial value, but its defects are also obvious. MongoDB's improvements to MySQL Cluster are worth thinking about and its daily data architecture design, it is introduced in the pattern design, but as a large-scale commercial application, MySQL Cluster and MongoDB still have a long way to go, whether it is inherent defect or management mode.

From Region

Original article address: Design and principles of sharding for MySQL Cluster and MongoDB replication clusters. Thank you for sharing your gratitude.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.