1. Data sharding and routing
The abstract model is a two-level mapping: the first level maps keys to partitions (key-partition), and the second level maps partitions to machines (partition-machine).
There are two ways to shard data, hash sharding and range sharding:
Hash sharding supports only point queries; examples are Cassandra, Voldemort, and Membase;
Range sharding also supports range queries; examples are Google's Bigtable and Microsoft Azure;
Yahoo's PNUTS supports both.
2. Hash sharding
The three most common hash sharding methods are round robin, virtual buckets, and consistent hashing.
2.1 Round robin, also called the hash modulo method
H(key) = hash(key) mod K, where K is the number of machines
Advantages: simple to implement
Disadvantages: inflexible; most keys must be re-hashed whenever a physical machine is added or removed
Cause: the key-partition mapping and the partition-machine mapping are merged into a single step performed by the same hash function, so the number of machines is tightly coupled to the mapping function.
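The cost of that coupling can be measured with a short sketch. It is only an illustration: the helper names (stable_hash, machine_for, moved_fraction), the choice of MD5, and the key format are assumptions, not part of the original notes. With a uniform hash, a key stays on its machine when going from 4 to 5 machines only if hash mod 4 equals hash mod 5, which holds for about one key in five.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def machine_for(key: str, k: int) -> int:
    """Hash modulo: the machine index is derived directly from the hash."""
    return stable_hash(key) % k


def moved_fraction(keys, old_k: int, new_k: int) -> float:
    """Fraction of keys whose machine changes when K goes from old_k to new_k."""
    moved = sum(1 for key in keys if machine_for(key, old_k) != machine_for(key, new_k))
    return moved / len(keys)


# Hypothetical keys, used only to measure the effect of adding one machine.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_k=4, new_k=5)
print(f"{fraction:.0%} of keys move when growing from 4 to 5 machines")
```

Most of the data has to be relocated for a single added machine, which is the inflexibility described above.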
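2.2 Virtual buckets
The key-partition mapping uses a hash function, while the partition-machine mapping is kept in a lookup table. Because the two levels are decoupled, buckets can be reassigned to other machines by editing the table, without re-hashing any key.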
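A minimal sketch of the virtual bucket idea follows. The bucket count of 1024, the machine names, and the add_machine helper are assumptions made for illustration; real systems choose their own bucket counts and rebalancing policies.

```python
import hashlib

NUM_BUCKETS = 1024  # assumed: fixed, and much larger than the number of machines


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash of a string key."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def bucket_for(key: str) -> int:
    """Level 1 (key-partition): a hash function that never changes."""
    return stable_hash(key) % NUM_BUCKETS


# Level 2 (partition-machine): a plain table that can be edited at any time.
bucket_table = {b: f"machine-{b % 3}" for b in range(NUM_BUCKETS)}


def machine_for(key: str) -> str:
    return bucket_table[bucket_for(key)]


def add_machine(table: dict, new_machine: str, share: int) -> list:
    """Hand every share-th bucket to new_machine; return the buckets that moved."""
    moved = [b for b in table if b % share == 0]
    for b in moved:
        table[b] = new_machine
    return moved


bucket_before = bucket_for("user:42")
moved_buckets = add_machine(bucket_table, "machine-3", share=4)
bucket_after = bucket_for("user:42")
print(f"{len(moved_buckets)} of {NUM_BUCKETS} buckets moved; key bucket unchanged: "
      f"{bucket_before == bucket_after}")
```

Only the data in the reassigned buckets has to be copied; every key still hashes to the same bucket as before.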
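2.3 Consistent hashing
This is the method used by distributed hash tables (DHT). Machines and keys are hashed into the same circular space, and each key is stored on the first machine found clockwise from its position, so adding or removing a machine only affects the keys next to it on the ring.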
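The sketch below shows one common way to build a consistent hash ring, using a sorted list and binary search. It is an illustration under assumptions: the HashRing class, the use of 100 virtual nodes per machine, and the MD5-based hash are choices made here, not something the notes prescribe.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative, not production code)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Each machine is placed on the ring at several pseudo-random positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("the ring has no machines")
        # First ring entry at or after the key's position, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["machine-0", "machine-1", "machine-2"])
keys = [f"user:{i}" for i in range(2000)]  # hypothetical keys
before = {k: ring.lookup(k) for k in keys}
ring.add("machine-3")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to the new machine")
```

Unlike the hash modulo method, the keys that move are only those taken over by the new machine; keys never shuffle between the existing machines.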
3. Range sharding
All records are first sorted by primary key. The ordered primary key space is then split into contiguous ranges, and each data shard stores all the records whose keys fall within its range.
On each physical machine, the data shards are often managed with LSM trees.
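Routing under range sharding can be sketched with a sorted list of split points and a binary search. The split points, machine names, and helper names below are assumptions for illustration only, and the sketch covers routing, not the LSM tree storage on each machine.

```python
import bisect

# Assumed split points: shard 0 holds keys < "g", shard 1 holds ["g", "n"),
# shard 2 holds ["n", "t"), and shard 3 holds keys >= "t".
split_points = ["g", "n", "t"]
shard_machines = ["machine-0", "machine-1", "machine-2", "machine-3"]


def shard_for(key: str) -> int:
    """Point query: binary search over the ordered split points."""
    return bisect.bisect_right(split_points, key)


def shards_for_range(start: str, end: str) -> list:
    """Range query: every shard that may hold keys in [start, end]."""
    return list(range(shard_for(start), shard_for(end) + 1))


print(shard_machines[shard_for("melon")])
print([shard_machines[s] for s in shards_for_range("h", "p")])
```

Because neighbouring keys live in the same or adjacent shards, a range query touches only a contiguous run of shards, which is what a hash-based scheme cannot offer.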
References:
[1] http://blog.csdn.net/gdhuyufei/article/details/42101231
Big Data reading notes (1)