The NoSQL Ecosystem: Hash Sharding and Range Sharding


13.4 Scale-out provides performance gains

Many NoSQL systems are built on a key-value model, so queries are essentially lookups by key rather than scans over the whole data set. Since almost all operations are addressed by key, sharding is usually based on the key as well: some property of the key determines which machine a key-value pair is stored on. Below we describe two sharding methods: hash sharding and range sharding.
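As a minimal, hypothetical sketch of key-based placement (not taken from any particular system), the owning node can be derived from a hash of the key itself, for example by taking the hash modulo the node count:

```python
import hashlib

def owner_node(key: str, nodes: list) -> str:
    """Pick the node that stores `key` by hashing the key itself.

    This is the simplest form of key-based sharding: a property of the key
    (here, its hash modulo the node count) decides placement. Note that adding
    or removing a node remaps most keys, which is the problem consistent
    hashing (discussed below) is designed to avoid.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
print(owner_node("user:1001", nodes))  # every client computes the same owner
```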

13.4.2 Data Sharding Through a Coordinator

Because CouchDB focused on single-machine performance and did not provide a built-in scale-out solution, two projects, Lounge and BigCouch, shard CouchDB data by adding a proxy layer. In this architecture, the proxy acts as the front end of a CouchDB cluster, accepting requests and distributing them to multiple CouchDB instances on the back end; the back-end instances do not interact with each other. The coordinator routes each request to a specific lower-level machine based on the key of the operation. Twitter has implemented a coordinator called Gizzard that provides both data sharding and replication. Gizzard does not care about data types: it uses a tree structure to store the identifiers of key ranges, and you can use it to wrap either SQL or NoSQL storage systems.

13.4.3 Consistent Hash Ring

A good hash function distributes data fairly evenly, which lets us spread the data across multiple machines according to that distribution. Consistent hashing is a widely used technique that was first applied in systems called distributed hash tables (DHTs). Dynamo-inspired systems such as Cassandra, Voldemort, and Riak essentially all use a consistent hashing algorithm.

Backing up data

Replication under consistent hashing usually works as follows: redundant copies of the data are stored on the nodes that follow the owning node on the ring. For example, with a replication factor of 3 (each piece of data is stored on three different nodes), if your key hashes into the interval [7,233] owned by node A, the data will be stored on nodes A, B, and C. If node A fails, nodes B and C can serve requests for that data. Some designs also allow the preceding node, E, to extend its range to cover the interval up to 233, so that it accepts requests that would have gone to the failed node A.

Optimizing the data allocation strategy

To solve the problem of uneven data distribution when there are only a few nodes, many DHT systems implement a technique called virtual nodes. For example, with 4 virtual nodes per machine, node A may be mapped to four virtual nodes A_1, A_2, A_3, A_4, and each of these virtual nodes is hashed onto the ring separately. The key ranges that node A is responsible for are therefore scattered around the ring, and the load is spread more evenly.
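The following is a hedged sketch of a consistent hash ring with virtual nodes; it also illustrates the "store on the next nodes around the ring" replication described above. The class, its parameters, and the choice of hash function are illustrative assumptions, not the implementation of any specific system:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Map any string onto the ring; the concrete hash function is an assumption.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes and successor-list replication."""

    def __init__(self, nodes, virtual_nodes=4, replication_factor=3):
        self.replication_factor = replication_factor
        self._ring = []  # sorted list of (ring position, physical node)
        for node in nodes:
            # Each physical node appears several times (A_1 .. A_4), so the
            # key ranges it owns are scattered around the ring.
            for i in range(1, virtual_nodes + 1):
                self._ring.append((ring_hash(f"{node}_{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]
        self._distinct_nodes = len(set(nodes))

    def preference_list(self, key: str) -> list:
        """Return the distinct physical nodes that should hold `key`:
        the owner of the key's position plus the next nodes clockwise."""
        start = bisect.bisect_right(self._positions, ring_hash(key)) % len(self._ring)
        owners = []
        i = start
        while len(owners) < min(self.replication_factor, self._distinct_nodes):
            node = self._ring[i % len(self._ring)][1]
            if node not in owners:  # skip extra virtual nodes of machines already chosen
                owners.append(node)
            i += 1
        return owners

ring = ConsistentHashRing(["A", "B", "C", "D", "E"])
print(ring.preference_list("user:1001"))  # owner plus the next two distinct nodes clockwise
```

If the first node in the preference list fails, the remaining nodes in the list already hold replicas and can continue serving requests for those keys.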

13.4.4 Continuous Range Partitioning

Sharding data with continuous range partitioning requires us to maintain a mapping table that records which node each key range lives on. As with consistent hashing, continuous range partitioning splits the key space into contiguous ranges; each range of data is assigned to one node and then redundantly replicated to other nodes. In contrast to consistent hashing, continuous range partitioning guarantees that keys that are adjacent in sort order land in the same data segment, so the routing table only needs to record the start and end of each segment, [start, end]. By dynamically adjusting the mapping from data segments to machine nodes, the load on each node can be balanced more precisely: if one segment receives heavy traffic, the load controller can shorten the segment that node is responsible for, reducing the amount of data it serves. With such a monitoring and routing module, the data nodes can be load-balanced much more effectively.
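A rough sketch of such a routing table, assuming half-open [start, end) ranges and hypothetical node names; the split operation shows how a boundary can be moved to rebalance load:

```python
import bisect

class RangeRouter:
    """Routing table mapping contiguous key ranges to storage nodes.

    Each entry records only the start/end of the range and its owner, so the
    table stays small and can be rebalanced by moving range boundaries.
    """

    def __init__(self, entries):
        # entries: sorted, non-overlapping (start_key, end_key, node) tuples,
        # with end_key exclusive.
        self._entries = sorted(entries)
        self._starts = [start for start, _, _ in self._entries]

    def node_for(self, key):
        i = bisect.bisect_right(self._starts, key) - 1
        if i < 0:
            raise KeyError(f"no range covers key {key!r}")
        start, end, node = self._entries[i]
        if not (start <= key < end):
            raise KeyError(f"no range covers key {key!r}")
        return node

    def split(self, start_key, split_key, new_node):
        """Shrink an overloaded range and hand the upper half to another node."""
        i = self._starts.index(start_key)
        start, end, node = self._entries[i]
        self._entries[i] = (start, split_key, node)
        self._entries.insert(i + 1, (split_key, end, new_node))
        self._starts = [s for s, _, _ in self._entries]

router = RangeRouter([("a", "m", "node-1"), ("m", "{", "node-2")])
print(router.node_for("cassandra"))  # -> node-1
router.split("a", "g", "node-3")     # rebalance: node-3 now serves [g, m)
print(router.node_for("hbase"))      # -> node-3
```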

How BigTable handles it

The Google BigTable paper describes a range-partitioning approach in which data is cut into chunks called tablets. Each tablet holds a certain number of key-value pairs, and each tablet server stores multiple tablets; how many tablets a server holds depends on its load. Each tablet is roughly 100-200 MB. If a tablet becomes too small, it may be merged with a neighbouring tablet, and if it grows too large, it is split in two, keeping every tablet within a bounded size range. A single master machine in the system dynamically adjusts the distribution of tablets across machines based on factors such as tablet size, load, and each machine's capacity.
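A rough sketch of the split/merge policy just described. The 100-200 MB figures come from the text above; the function name and the exact decision rule are illustrative assumptions, and real systems also weigh request load, not just size:

```python
# Thresholds based on the text's "roughly 100-200 MB per tablet" guideline.
SPLIT_THRESHOLD_MB = 200
MERGE_THRESHOLD_MB = 100

def maintain_tablet(tablet_size_mb: float, neighbour_size_mb: float) -> str:
    """Decide whether a tablet should be split, merged with a neighbour, or kept."""
    if tablet_size_mb > SPLIT_THRESHOLD_MB:
        return "split"   # cut the key range in two so each half shrinks
    if tablet_size_mb + neighbour_size_mb < MERGE_THRESHOLD_MB:
        return "merge"   # two small adjacent tablets become one
    return "keep"

print(maintain_tablet(250, 80))  # -> split
print(maintain_tablet(30, 40))   # -> merge
```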

The master holds the metadata table that records which server each tablet belongs to. When the amount of data is very large, this metadata table itself becomes large, so it is in turn cut into tablets stored on the tablet servers. Locating a piece of data therefore requires two lookups.
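A hypothetical sketch of that two-step lookup: the metadata mapping key ranges to tablet servers is itself stored in tablets, so a client first finds the right metadata tablet and then the data tablet. All names and ranges below are invented for illustration:

```python
# First level: which metadata tablet covers which part of the key space.
# In practice this top level is small enough to cache on the client.
metadata_index = [("a", "m", "meta-tablet-1"), ("m", "{", "meta-tablet-2")]

# Second level: each metadata tablet maps key ranges to data tablet servers.
metadata_tablets = {
    "meta-tablet-1": [("a", "g", "tablet-server-3"), ("g", "m", "tablet-server-7")],
    "meta-tablet-2": [("m", "t", "tablet-server-2"), ("t", "{", "tablet-server-5")],
}

def locate(key: str) -> str:
    """Two lookups: key -> metadata tablet, then key -> tablet server."""
    meta = next(t for s, e, t in metadata_index if s <= key < e)              # lookup 1
    return next(srv for s, e, srv in metadata_tablets[meta] if s <= key < e)  # lookup 2

print(locate("hbase"))  # -> tablet-server-7
```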

Fault Handling

In BigTable, the master is a single point of failure, but the system can tolerate a master being down for a short time. If a tablet server fails, on the other hand, the master can reassign all requests for its tablets to other machines. To monitor and handle node failures, BigTable relies on a component called Chubby, a distributed lock service used to manage cluster membership and detect whether each member is alive. ZooKeeper is an open-source implementation of Chubby, and many Hadoop-based projects use it to coordinate second-tier masters and tablet servers.

NoSQL projects based on range partitioning

HBase implements its range-partitioning strategy using BigTable's layered design. The tablet-equivalent data lives in HDFS; HDFS handles redundant replication of the data and is responsible for keeping the replicas consistent, while operations such as serving data requests, modifying the storage structure, and splitting or merging tablets are handled by the individual tablet servers.

MongoDB also implements range partitioning with a BigTable-like scheme. It uses a small cluster of config servers to manage how data is distributed across nodes. These machines hold the same configuration information and use a two-phase commit protocol to keep it consistent. The config nodes thus play both the routing role of BigTable's master and the availability-monitoring role of Chubby, while MongoDB's data-storage nodes achieve redundancy through its replica sets mechanism.

Cassandra provides an order-preserving partitioner that lets you run fast range queries over your data. Cassandra still uses a consistent hashing algorithm to distribute data, but instead of hashing each record individually, it hashes a range of data, so that, for example, the 20th and 21st records end up on the same machine node.

Twitter's Gizzard framework also uses range partitioning to manage the replication and distribution of data across multiple nodes.

13.4.5 Which Partitioning Strategy to Choose

If you frequently need range queries, or need to traverse keys in order, range partitioning is the better choice. What if you never do range or sequential queries? Then hash partitioning is probably more convenient, and the problem of uneven hash distribution can be addressed with virtual nodes. With hash partitioning, a client can determine which node holds a given piece of data simply by running the hash function itself, although once you account for data that has moved after node failures, finding the right storage node can become more complicated. Range partitioning requires a query to the configuration node before the data itself can be queried, and without a particularly good high-availability design the configuration node becomes a dangerous single point of failure; of course, you can replicate and load-balance the configuration nodes to reduce that risk. When a node fails under range partitioning, the data it held can be redistributed across many nodes, rather than being dumped onto the next node in sequence, as happens with consistent hashing, which can cause the load on that next node to spike.

To be continued!
