NoSQL (III): Distributed Data Models

Source: Internet
Author: User

Reading notes on "NoSQL Distilled". If you reproduce this post, please credit the source: "Jiq Technical Blog".

The main reason NoSQL emerged is the need for databases that can run on large clusters.

Aggregate-oriented databases are well suited to scale-out cluster architectures, because the aggregate naturally becomes the unit of data distribution. There are two main approaches to distributing data: replication and sharding. Replication copies the same data to multiple nodes; sharding places different data on different nodes.

1 Single Server

If you are using a NoSQL database mainly because its data model suits your aggregates, consider deploying it on a single server. In other words, whenever data distribution is not needed, the single-server option is preferred.

2 Sharding

If the database is busy because different users access different parts of the data set, you can scale horizontally by sharding, that is, by storing different parts of the data on different servers.

Idea: To spread different users' requests evenly across servers, the key is deciding how to place the data. Aggregates are designed to group data that is commonly accessed together, so the aggregate is a natural unit of distribution. Spreading aggregates evenly across machines sometimes requires domain-specific rules. Many NoSQL databases also offer auto-sharding, in which the database itself takes responsibility for allocating data to shards and for directing each data access request to the right shard.
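The routing idea can be sketched as follows. This is a minimal illustration, not how any particular database implements auto-sharding: the `ShardRouter` class, the shard names, and the choice of hashing the aggregate key with SHA-256 are all assumptions made for the example, and each shard is just an in-memory dictionary standing in for a server.

```python
import hashlib


class ShardRouter:
    """Routes each aggregate to one shard by hashing its key (hypothetical helper)."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)
        # One in-memory dict per shard stands in for a real server.
        self.shards = {name: {} for name in self.shard_names}

    def shard_for(self, aggregate_key):
        # Use a stable hash (not Python's randomized hash()) so routing is repeatable.
        digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]

    def put(self, aggregate_key, aggregate):
        # The whole aggregate is stored on exactly one shard.
        self.shards[self.shard_for(aggregate_key)][aggregate_key] = aggregate

    def get(self, aggregate_key):
        # Reads are directed to the same shard that the write went to.
        return self.shards[self.shard_for(aggregate_key)].get(aggregate_key)


router = ShardRouter(["shard-a", "shard-b", "shard-c"])
router.put("customer:42", {"name": "Ann", "orders": [1001, 1002]})
print(router.shard_for("customer:42"), router.get("customer:42"))
```

Hashing modulo the shard count is the simplest placement rule, but changing the number of shards moves most keys. Range partitioning or consistent hashing are common alternatives when shards are added or removed often.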

Advantages: Sharding is especially valuable for performance because it improves both read and write throughput. Replication, especially when combined with caching, can greatly improve read performance, but it does little for workloads with frequent writes. Sharding provides a way to scale writes horizontally.

Disadvantages: Used alone, sharding does little to improve resilience. As with a single server, a node failure makes that node's data unavailable, although only the users of that shard are affected. Because a cluster has more machines that can fail than a single server does, sharding may even reduce resilience.
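A rough calculation shows why more nodes can mean less resilience. The sketch below assumes that nodes fail independently and that each node is down with the same probability; both are simplifying assumptions, and the 1% figure is purely illustrative.

```python
def prob_any_node_down(node_count, p_node_down):
    """Probability that at least one of node_count independent nodes is down."""
    return 1 - (1 - p_node_down) ** node_count


# With unreplicated shards, any node being down makes part of the data unavailable.
for n in (1, 5, 20):
    print(n, round(prob_any_node_down(n, 0.01), 4))
```

Under these assumptions the chance that some part of the data is unavailable grows with the number of shards, which is why sharding is usually paired with replication.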

3 Replication

3.1 Master-Slave Replication

Idea: A master-slave setup has one master node and multiple slave nodes. Data is replicated from the master to the slaves, and the replication process keeps every slave synchronized with the master. Reads can be served by the master or by any slave. Even if the master fails, the slaves can still handle read requests, and one of them can be appointed as the new master. Read capacity scales horizontally simply by adding slaves. This both significantly improves read performance and provides read resilience. Writes fare worse: every update must go to the master, which then propagates it to the slaves. On the one hand, write performance is limited, so the scheme does not suit write-heavy workloads; on the other hand, if the master fails, updates cannot be processed until it recovers or a new master is appointed.

Advantages: Read requests are handled with high performance, and reads survive a master failure. Master-slave replication is useful even if you do not need to scale out: the master handles all reads and writes, while a slave acts as a hot backup.

Disadvantages: First, the master is the bottleneck and single point of failure for writes, so write performance and write resilience are unsatisfactory. Second, there is a major flaw: data inconsistency. If the master has processed an update that has not yet been propagated to all slaves, different clients reading from different slaves may see different values.
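The behavior described in this section, including the stale read, can be sketched with a small in-memory model. The `Node` and `MasterSlaveCluster` classes are hypothetical names invented for this example, and replication is triggered manually so that the window of inconsistency is visible.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.pending = []  # updates accepted by the master but not yet replicated

    def write(self, key, value):
        # All writes go through the master.
        self.master.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Push queued updates to every slave, in the order they were written.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

    def read(self, key, node=None):
        # Reads may be served by any node; default to the master.
        node = node or self.master
        return node.data.get(key)

    def promote(self, slave):
        # Failover: a slave becomes the new master. Unreplicated updates are lost.
        self.slaves.remove(slave)
        self.master = slave
        self.pending.clear()


master, s1, s2 = Node("master"), Node("slave-1"), Node("slave-2")
cluster = MasterSlaveCluster(master, [s1, s2])
cluster.write("stock", 10)
cluster.replicate()
cluster.write("stock", 9)
print(cluster.read("stock"), cluster.read("stock", s1))  # prints "9 10": the slave is stale
cluster.replicate()
print(cluster.read("stock", s1))  # prints "9": the slave has caught up
```

Between the second write and the second `replicate()` call, a client reading from the master and a client reading from a slave see different values, which is exactly the inconsistency described above.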

3.2 Peer-to-Peer Replication

Idea: To remove the master as the bottleneck and weak point of master-slave replication, peer-to-peer replication drops the notion of a master altogether. All nodes are equal, and every node can accept writes.

Advantages: This removes the master as the write bottleneck and single point of failure found in master-slave replication.

Disadvantages: Consistency is still a problem. Because two different nodes can accept writes at the same time, a write conflict occurs when both try to update the same data simultaneously, and the read-consistency problem remains as well. There are two extremes for dealing with this. At one end, the replicas coordinate before every write so that conflicts never occur. At the other end, the system accepts conflicting writes and merges them afterward, so that any replica can take a write at any time.
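The "accept and merge" extreme can be sketched as follows. The example uses version vectors to detect concurrent writes and a set union to merge them; both are choices made for this illustration, not something the notes prescribe, and the `Peer`, `dominates`, and `sync` names are hypothetical.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version vector, set of items)

    def write(self, key, items):
        # Any peer accepts a write and bumps its own entry in the version vector.
        clock, _ = self.store.get(key, ({}, set()))
        clock = dict(clock)
        clock[self.name] = clock.get(self.name, 0) + 1
        self.store[key] = (clock, set(items))


def dominates(a, b):
    """True if version vector a has seen every update recorded in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def sync(p, q, key):
    """Bring two peers into agreement on key; return True if a conflict was merged."""
    ca, va = p.store.get(key, ({}, set()))
    cb, vb = q.store.get(key, ({}, set()))
    if dominates(ca, cb):
        merged, conflict = (ca, va), False
    elif dominates(cb, ca):
        merged, conflict = (cb, vb), False
    else:
        # Concurrent writes: neither peer has seen the other's update.
        clock = {n: max(ca.get(n, 0), cb.get(n, 0)) for n in ca.keys() | cb.keys()}
        merged, conflict = (clock, va | vb), True
    p.store[key] = q.store[key] = merged
    return conflict


a, b = Peer("a"), Peer("b")
a.write("cart:7", {"book"})
b.write("cart:7", {"pen"})  # concurrent write to the same key on another peer
print(sync(a, b, "cart:7"), sorted(a.store["cart:7"][1]))  # prints: True ['book', 'pen']
```

A set union is only one possible merge policy and cannot express deletions. In practice the merge rule is domain-specific, which is part of why this approach trades consistency for availability.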

4 Combining Sharding and Replication

Idea: Shard the data first, then maintain each shard with master-slave replication. The system as a whole then has multiple masters, but each data item has exactly one master responsible for it. Column-family databases are one example of combining sharding with replication.
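One way to picture the combined layout is the sketch below. The rotation scheme in `build_layout`, the node and shard names, and the hash-based `shard_of` function are all assumptions for illustration; real systems choose placement with rack awareness, load, and other factors in mind.

```python
import hashlib


def build_layout(shard_ids, nodes, slaves_per_shard):
    """Assign each shard one master and some slaves by rotating through the nodes."""
    if slaves_per_shard >= len(nodes):
        raise ValueError("need more nodes than slaves per shard")
    layout = {}
    for i, shard in enumerate(shard_ids):
        master = nodes[i % len(nodes)]
        slaves = [nodes[(i + k) % len(nodes)] for k in range(1, slaves_per_shard + 1)]
        layout[shard] = {"master": master, "slaves": slaves}
    return layout


def shard_of(key, shard_ids):
    # Stable hash so the same key always maps to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shard_ids[int.from_bytes(digest[:8], "big") % len(shard_ids)]


def node_for(key, layout, shard_ids, op):
    """Writes go to the shard's single master; reads may be served by a slave."""
    entry = layout[shard_of(key, shard_ids)]
    if op == "write" or not entry["slaves"]:
        return entry["master"]
    return entry["slaves"][0]


shards = ["s0", "s1", "s2"]
nodes = ["n1", "n2", "n3"]
layout = build_layout(shards, nodes, 1)
print(layout["s0"])  # prints: {'master': 'n1', 'slaves': ['n2']}
print(node_for("customer:42", layout, shards, "write"))
```

Each node here is master for one shard and slave for another, so the cluster has several masters overall while every data item still has exactly one.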
