Big Data reading notes (1)

Source: Internet
Author: User

1. Data sharding and routing

The abstract model is a two-level mapping: the first level maps keys to partitions (key-partition), and the second level maps partitions to machines (partition-machine).

Data can be sharded by hash or by range:

Hash sharding supports only point queries; examples include Cassandra, Voldemort, and Membase;

Range sharding also supports range queries; examples include Google's Bigtable and Microsoft Azure;

Yahoo's PNUTS supports both.

2. Hash sharding. The three most common methods are round robin, virtual buckets, and consistent hashing.

2.1 Round robin, also called the hash-modulo method

H(key) = hash(key) mod K, where K is the number of physical machines

Advantages: simple to implement

Disadvantages: inflexible; adding or removing a physical machine changes K, so most keys have to be re-hashed and moved

Cause: a single hash function performs both the key-partition mapping and the partition-machine mapping, so the mapping function is tightly coupled to the number of machines.
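To make the re-hash cost concrete, here is a minimal sketch that counts how many keys change machine when K grows from 4 to 5. The names stable_hash, machine_for and moved_fraction, the choice of MD5, and the sample keys are assumptions made for illustration; they are not from the notes.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash; Python's built-in hash() for str is salted per process."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def machine_for(key: str, k: int) -> int:
    """H(key) = hash(key) mod K."""
    return stable_hash(key) % k


def moved_fraction(keys, old_k: int, new_k: int) -> float:
    """Fraction of keys whose machine changes when K goes from old_k to new_k."""
    moved = sum(1 for key in keys if machine_for(key, old_k) != machine_for(key, new_k))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical sample keys
fraction = moved_fraction(keys, old_k=4, new_k=5)
print(f"{fraction:.0%} of keys move when going from 4 to 5 machines")
```

With a well-mixed hash, a key stays put only when hash mod 4 equals hash mod 5, which holds for 4 of every 20 residues, so roughly 80% of the keys should move.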

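2.2 Virtual buckets

The key-partition mapping uses a hash function onto a fixed set of virtual buckets, while the partition-machine mapping is kept in a lookup table. The two levels are decoupled, so adding a machine only requires editing table entries.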
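A small sketch of the two levels, under stated assumptions: the bucket count of 1024, the BucketTable class, and the policy of handing every n-th bucket to a new machine are all hypothetical choices for illustration, not something the notes specify.

```python
import hashlib

NUM_BUCKETS = 1024  # assumed fixed bucket count, much larger than the machine count


def bucket_for(key: str) -> int:
    """Level 1 (key-partition): hash the key into a virtual bucket."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


class BucketTable:
    """Level 2 (partition-machine): a plain table from bucket id to machine."""

    def __init__(self, machines):
        # Spread buckets over the initial machines in round-robin order.
        self.table = [machines[b % len(machines)] for b in range(NUM_BUCKETS)]

    def route(self, key: str) -> str:
        return self.table[bucket_for(key)]

    def add_machine(self, new_machine: str) -> list:
        """Reassign every n-th bucket to the new machine; returns the moved bucket ids."""
        n = len(set(self.table)) + 1
        moved = list(range(0, NUM_BUCKETS, n))
        for b in moved:
            self.table[b] = new_machine
        return moved


table = BucketTable(["m0", "m1", "m2", "m3"])
keys = [f"user:{i}" for i in range(2_000)]  # hypothetical sample keys
before = {k: table.route(k) for k in keys}
moved = table.add_machine("m4")
print(f"{len(moved)} of {NUM_BUCKETS} buckets moved; the hash function is unchanged")
```

Only keys in the reassigned buckets change machine, because bucket_for never depends on how many machines exist.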

2.3 Consistent hashing

Consistent hashing is one way to implement a distributed hash table (DHT).
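The notes only name the technique here, so the following is a sketch of the common textbook form rather than anything the source describes: machines and keys are hashed onto the same ring, and a key belongs to the first machine at or after its position, wrapping around at the end. The HashRing class, the 32-bit MD5-based ring_hash, and the node names are assumptions; virtual nodes and hash collisions between machines are deliberately ignored.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on a 32-bit ring (assumed size, for illustration only)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, machines=()):
        self._points = []  # sorted ring positions of machines
        self._owner = {}   # ring position -> machine name
        for machine in machines:
            self.add(machine)

    def add(self, machine: str) -> None:
        point = ring_hash(machine)
        bisect.insort(self._points, point)
        self._owner[point] = machine

    def remove(self, machine: str) -> None:
        point = ring_hash(machine)
        self._points.remove(point)
        del self._owner[point]

    def route(self, key: str) -> str:
        """First machine at or after the key's position, wrapping past the end."""
        i = bisect.bisect_left(self._points, ring_hash(key))
        return self._owner[self._points[i % len(self._points)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(2_000)]  # hypothetical sample keys
before = {k: ring.route(k) for k in keys}
ring.add("node-d")
changed = [k for k in keys if ring.route(k) != before[k]]
print(f"{len(changed)} of {len(keys)} keys moved, all of them to node-d")
```

Unlike the hash-modulo method, adding a machine only takes keys from its successor on the ring; every other key keeps its owner.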

                  

3. Range sharding

All records are first sorted by primary key, and the ordered key space is then cut into contiguous ranges. Each data shard stores every record whose primary key falls within its range.
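As an illustration of routing over an ordered key space, the sketch below keeps a sorted list of split keys and uses binary search to find the shard for a key or the shards covering a key range. The RangeShardMap class and the split keys "g", "n", "t" are hypothetical.

```python
import bisect


class RangeShardMap:
    """N sorted split keys define N + 1 shards over the ordered key space.

    Shard i holds keys with split_keys[i - 1] <= key < split_keys[i].
    """

    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)

    def shard_for(self, key: str) -> int:
        """Point query: binary search for the shard whose range contains key."""
        return bisect.bisect_right(self.split_keys, key)

    def shards_for_range(self, start: str, end: str) -> list:
        """Range query: ids of all shards overlapping start <= key <= end."""
        return list(range(self.shard_for(start), self.shard_for(end) + 1))


shards = RangeShardMap(["g", "n", "t"])
print(shards.shard_for("apple"))           # shard 0: keys below "g"
print(shards.shard_for("kiwi"))            # shard 1: "g" <= key < "n"
print(shards.shards_for_range("h", "p"))   # shards 1 and 2
```

Because neighbouring keys land in the same or adjacent shards, a range query touches a contiguous run of shards, which is what hash sharding cannot offer.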

Within each physical machine, data shards are often managed with LSM trees.

Reference documents:

[1] http://blog.csdn.net/gdhuyufei/article/details/42101231
