MySQL Cluster and MongoDB Cluster

Source: Internet
Author: User

Distributed Database computing involves distributed transactions, data distribution, and data convergence computing.

Distributed Databases can achieve high security, high performance, high availability, and other features. Of course, they also bring high costs (fixed costs and operational costs ), we analyze the design ideas from the implementation of MongoDB and MySQL Cluster to abstract some design methods that we can reference when designing databases and apply them to our production system.

First, let's talk about the characteristics of relational and non-relational databases.

MySQL Innodb and Cluster have complete ACID attributes

A atomicity the entire transaction will be taken as A whole, either completed or rolled back

C. The integrity constraints of the database are not damaged before and after the start of a consistent transaction.

I isolation the execution of two transactions does not affect each other, and the time of the two transactions does not affect each other.

D. After the transaction is complete, the changes made to the database by the transaction will be permanently stored in the database and will be completely

To implement ACID, we introduce Undo, Redo, MVCC, TAS, signal, two-phase blocking, two-phase commit, and blocking, and introduce data access paths, the whole thing will become extremely complex.

When MySQL complies with the SQL standard and uses the SQL standard, it can achieve seamless migration between RDBMS.

Its rich data types, complete business logic control and presentation capabilities have always been the first choice for commercial applications.

MongoDB uses a set to represent data and does not have the ACID attribute. It is widely recognized for its non-type, rapid deployment, and rapid development.

Both RDBMS and MongoDB use the index structure. MongoDB supports the B-tree index, which is created based on user needs and can be nested between containers at various layers.

There are two data storage methods in the database:

1. Heap: data is always stacked at the end of the file according to the backward insertion method. when accessing data using the index structure, the Data Pointer will be obtained in the index and then the data will be obtained, when data is deleted, it is deleted from the corresponding location. For frequently updated heap tables, it needs to be optimized regularly to use heap tables, the data sequence access principle will be broken (access optimization is performed in DBMS, and performance is partially improved). Because there is no fill factor, under the same compression algorithm, space can be greatly reduced, and heap tables are suitable for sequential range access, such as data warehouses and other business scenarios.

2. index organization: Generally, the index organization table uses B + as the constructor. The entire structure is like a inverted tree (from the perspective of data access streams). the routing information is stored on the branches, all data is stored on the leaf node, and all the leaves are connected in sequence through two-way pointers. Due to the limited space-time access, this can greatly improve data performance, DBMS accesses and constructs data based on access and access paths, and the access path depth directly affects performance. Generally, it is recommended that the access path be controlled within 4 (less than or equal to 3 ), the reason is that accessing multi-layer paths requires a higher cost and the cost of maintaining the index tree is becoming more and more expensive.

Our common Innodb and MySQL Cluster are index organization tables, MyISAM is the heap table, and MongoDB's organizational structure is the heap

Databases with the AICD attribute have the index maintenance function. Because the MyISAM storage engine and MongoDB have a heap organizational structure without ACID Control, metadata and indexes may be inconsistent, data access becomes invalid, causing data inconsistency. However, because ACID is not required, the update speed will be greatly improved (the updates described in this article include all write operations, the MyISAM storage engine requires regular consistency check. because it does not have the ACID attribute, the MyISAM storage engine needs to lock the table for data updates, resulting in low performance of updates in high concurrency.

MySQL Cluster architecture

The Cluster can be divided into SQL nodes, data nodes, and management nodes (MySQL Cluster provides APIs for internal calls, and external applications can use APIs to access any layer of methods)

The SQL node provides functions such as SQL command request, resolution, connection management, query optimization and response, cache management, data merge, sort, and cropping. When the SQL node is started, the architecture information will be synchronized to the management node for data query routing

Data nodes provide data access, persistence, API data access, and other functions

The management node maintains node activity information and implements data backup and recovery. The management node obtains the status and error information of nodes in the entire cluster environment, and reports the information of each node in each cluster to all other nodes in the cluster, this is critical to the data routing rules of SQL nodes. When the node is expanded, the data will be rebuilt.

Data nodes store data in shards and multiple data copies in at least two copies. The data is stored in the memory and persisted according to the rules of the Management node. As a data access location, a large amount of memory is required.

As a query entry, SQL nodes consume a large amount of cpu and memory resources. Distributed Management Nodes can be used, and a layer of request Distribution and HA control mechanism can be encapsulated outside the SQL node to solve single point of failure and performance problems, it provides linear scalability

The management node maintains global rule information. When a node fails, a fault notice will be issued.

In the entire Cluster system, any component supports dynamic expansion and linear expansion, providing high-availability and high-performance solutions.

Problem:

When a new data node is added, the access path information needs to be reconstructed, which puts pressure on the Management node. We recommend that you perform this operation during off-peak hours.

The Cluster uses the automatic key-value recognition data sharding solution. You do not need to care about the data slicing solution (partition key rules are provided in 5.1 or later), and distributed databases are transparent, data sharding rules are completed based on the rowid of 1, primary key, 2 unique indexes, and 3 Automatic lines, and then the number of clusters is distributed. The data accessed is like RAID access mechanism, data can be extracted from each node in parallel and hashed. When non-primary keys or partition keys are used for access, all Cluster nodes will be scanned and performance will be affected (this is the core challenge facing the Cluster)

MySQL Cluster architecture

  • 1
  • 2
  • Next Page

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.