Data Partitioning: "Designing Data-Intensive Applications" Reading Notes 9

Source: Internet
Author: User

With the sixth chapter, we move on to a core problem in distributed systems: data partitioning. Distributed systems typically partition data across many nodes in order to handle datasets too large for a single machine: a large dataset can be spread across many disks, and the query load can be distributed across many processors. In this chapter we first discuss the different ways of partitioning large datasets and observe how indexes interact with partitioning; we then explore strategies for rebalancing partitions; finally, we look at how request routing directs queries to the correct partition. That sounds like a lot, so let's get started.

1. Partitions and replicas

Partitioning and replication are easy concepts to confuse, so let's clear up both.
Partitioning is usually combined with replication: copies of each partition can be stored on multiple nodes. This means that even though each record belongs to exactly one partition, it may still be stored on several different nodes for fault tolerance.

Partitioning and replication thus solve different problems and must not be confused; as two core techniques of distributed systems, they work together to provide a complete solution for distributed storage.

2. Partitioning strategies

The goal of partitioning is to spread the data and the query load evenly across nodes. (Replication can have a similar effect, depending on the replica synchronization mechanism.) If the partitioning is unfair, some partitions end up with more data or more queries than others; we call this skew. Skew makes partitioning much less effective: the load becomes unbalanced and hot spots form. Partitioning strategies therefore aim for uniformity. Below we cover several common strategies:

Range Partitioning

Range partitioning assigns each partition a contiguous range of keys, like the volumes of an encyclopedia. If you know the boundaries between the ranges, it is easy to determine which partition contains a given key. If you also know which partition is assigned to which node, you can send the request directly to the appropriate node.

However, the keys are not necessarily evenly distributed across the key range; to spread the data evenly, the partition boundaries need to adapt to the data. Within each partition, we can keep keys in sorted order (as in SSTables), which makes range scans much more efficient.
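As a minimal sketch (with made-up boundary keys), finding the partition for a given key is just a binary search over the sorted list of partition boundaries:

```python
import bisect

# Hypothetical partition boundaries: 4 partitions covering
# (-inf, "f"), ["f", "m"), ["m", "t"), ["t", +inf).
boundaries = ["f", "m", "t"]

def partition_for_key(key: str) -> int:
    """Binary-search the sorted boundary list to find the partition."""
    return bisect.bisect_right(boundaries, key)

print(partition_for_key("apple"))  # 0
print(partition_for_key("grape"))  # 1
print(partition_for_key("zebra"))  # 3
```

With the boundaries known, a request for any key can be routed in O(log n) time; adapting the boundaries to the data distribution is what keeps the partitions balanced.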

The disadvantage of range partitioning is that certain access patterns create hot spots. If one range of keys is accessed frequently, the corresponding partition takes most of the reads and writes while the other partitions sit idle. (In this case, consider a finer partition granularity, or a compound key that puts a more uniformly distributed attribute first.)

Hash partitioning

Because range partitioning is prone to hot spots, many distributed data stores use a hash function to determine the partition for a given key. A good hash function takes skewed data and makes it uniformly distributed: even keys that are close together in the key range get hash values that are evenly spread out. Each partition is assigned a range of hashes, and every key whose hash falls within a partition's range is stored in that partition. As the figure shows, keys that are close in time are scattered evenly across multiple partitions by the hash function:
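A sketch of the idea, using MD5 as a stable hash function (the hash choice and partition count are arbitrary assumptions here, not from the book):

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical cluster-wide setting

def hash_partition(key: str) -> int:
    """Map a key to a partition via a stable hash of the key."""
    # Use a stable hash (MD5 here), not Python's built-in hash(),
    # which is randomized per process and unsuitable for partitioning.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Adjacent keys (e.g. consecutive timestamps) scatter across partitions:
for key in ["2024-01-01T00:00", "2024-01-01T00:01", "2024-01-01T00:02"]:
    print(key, "->", hash_partition(key))
```

Note that real systems typically assign each partition a contiguous range of the hash space rather than using modulo, precisely so that rebalancing does not reshuffle every key; the modulo here is only for brevity.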

With hash partitioning, we lose a nice property of key-range partitioning: keys that were once adjacent are now scattered across all partitions, so their sort order is lost. A compound (concatenated) key can partly solve this problem. The compound-key approach supports an elegant data model for one-to-many relationships and combines the advantages of the two partitioning methods: the first part of the key is hashed to determine the partition, while the other columns are used as a concatenated index for sorting the data (as in SSTables). A query therefore cannot search for a range of values over the first column of the compound key, but if it specifies a fixed value for the first column, it can perform an efficient range scan over the other columns of the key. For example, on a social media site, a user may publish many updates. If the primary key for updates is (user_id, update_timestamp), you can efficiently retrieve all updates made by a particular user within some time interval, sorted by timestamp. Different users may be stored on different partitions, but within each user, the updates are stored ordered by timestamp on a single partition.
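The compound-key scheme can be sketched as follows; the in-memory store layout and helper names are invented for illustration, with each partition's entries kept sorted by the compound key (user_id, timestamp):

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical setting

def partition_of(user_id: str) -> int:
    """Only the first key column (user_id) determines the partition."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Each partition holds entries sorted by the compound key (user_id, ts).
partitions = defaultdict(list)

def write_update(user_id: str, ts: str, text: str) -> None:
    part = partitions[partition_of(user_id)]
    part.append(((user_id, ts), text))
    part.sort()  # a real store keeps data sorted incrementally (SSTables)

def updates_in_range(user_id: str, ts_from: str, ts_to: str) -> list:
    """Range scan over timestamps, valid only once user_id is fixed."""
    part = partitions[partition_of(user_id)]
    return [text for (uid, ts), text in part
            if uid == user_id and ts_from <= ts < ts_to]

write_update("alice", "2024-01-05", "hello")
write_update("alice", "2024-03-01", "again")
write_update("bob", "2024-02-11", "hi")
print(updates_in_range("alice", "2024-01-01", "2024-02-01"))  # ['hello']
```

Because all of one user's updates live sorted on a single partition, the time-range query touches exactly one partition and scans a contiguous run of keys.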

TIP: Mitigating Hotspots

Partitioning by hash of key does help reduce hot spots. However, it cannot avoid them entirely: in the extreme case where all reads and writes are for the same key, all requests still end up on the same partition. For example, on a social media site, a celebrity with millions of followers may trigger a storm of reads and writes whenever they do something. This can result in a large volume of writes to the same key in a short time (the key might be the celebrity's user ID, or the ID of the action people are commenting on). Hashing is powerless here, since two identical IDs always hash to the same value.

Most data systems are not able to automatically compensate for such a highly skewed workload, so it is the application's responsibility to reduce the skew. For example, if one key is known to be very hot, a simple technique is to add a random number to the beginning or end of the key. Just a two-digit decimal random number splits the writes across 100 different keys, allowing those keys to be distributed to different partitions. However, having split the writes across different keys, any reads now have to do additional work, as they must read the data from all 100 keys and combine it. This technique also requires additional bookkeeping: it only makes sense to append the random number for the small number of hot keys; for the vast majority of keys with low write throughput it would be unnecessary overhead. You therefore also need some way of keeping track of which keys are being split.
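A toy illustration of the random-suffix technique (the key format and the in-memory store are hypothetical): writes spread across 100 sub-keys, and reads must gather all of them back:

```python
import random

NUM_SPLITS = 100
store = {}  # stand-in for a partitioned key-value store

def write_hot(key: str, value) -> None:
    # Append a two-digit random suffix so writes to one hot key
    # spread over 100 distinct keys (and hence distinct partitions).
    split_key = f"{key}#{random.randrange(NUM_SPLITS):02d}"
    store.setdefault(split_key, []).append(value)

def read_hot(key: str) -> list:
    # Reads pay the price: gather and merge all 100 sub-keys.
    result = []
    for i in range(NUM_SPLITS):
        result.extend(store.get(f"{key}#{i:02d}", []))
    return result

for i in range(5):
    write_hot("celebrity:42", i)
print(sorted(read_hot("celebrity:42")))  # [0, 1, 2, 3, 4]
```

The trade-off is explicit in the code: each write touches one sub-key, but every read fans out to all `NUM_SPLITS` sub-keys, which is why the technique is only worth applying to known hot keys.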

3. Partitioning and secondary indexes

The partitioning schemes discussed so far rely on a key-value data model: records are accessed only through their primary key, so the key determines the partition and can be used to route read and write requests to the partition responsible for that key.

The situation becomes more complicated once secondary indexes are involved. A secondary index usually does not identify a record uniquely; rather, it is a way of searching for occurrences of a particular value, such as a query like "find all cars whose color is red". The problem with secondary indexes is that they do not map neatly to partitions. There are two main approaches to partitioning a database with secondary indexes: document-partitioned indexes and term-partitioned (global) indexes.

Document-partitioned (local) indexes

Imagine a website for selling used cars, where each listing has a unique ID, which we call the document ID. The database is partitioned by document ID (for example, IDs 0 through 499 in partition 0, IDs 500 through 999 in partition 1).
You want to let users search for cars, filtering by color and by make, so you need a secondary index on color. Whenever a red car is added to the database, the database partition automatically adds the document's ID to the index entry for the color red. As the following figure shows:

In this indexing approach, each partition is completely separate: each partition maintains its own secondary index, covering only the document IDs in that partition. It does not care what data is stored in other partitions. Whenever you write to the database to add, remove, or update a document, you only need to deal with the partition that contains the document ID you are writing.

However, reading from such an index requires care: if you want to search for red cars, you need to send the query to all partitions and combine all the results you get back. This makes read queries on secondary indexes quite expensive. Even if the partitions are queried in parallel, this scatter/gather approach is prone to tail latency amplification.
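A minimal sketch of scatter/gather over document-partitioned (local) indexes, with invented document IDs and colors:

```python
# Two toy partitions, each keeping its own local secondary index on
# color, covering only the documents stored in that partition.
partitions = [
    {"docs": {101: "red", 204: "blue"},
     "index": {"red": [101], "blue": [204]}},
    {"docs": {512: "red", 987: "red"},
     "index": {"red": [512, 987]}},
]

def find_by_color(color: str) -> list:
    """Scatter the query to every partition, then gather and merge."""
    results = []
    for part in partitions:  # a real system queries these in parallel
        results.extend(part["index"].get(color, []))
    return sorted(results)

print(find_by_color("red"))  # [101, 512, 987]
```

Writes are cheap in this model (only the owning partition's index is touched), but every read on the secondary index must visit all partitions, even those that contribute no results.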

Term-partitioned (global) indexes

The previous section described the downside of document-partitioned indexes, so instead we can construct a global index that covers data in all partitions. However, we cannot store that index on just one node, since it could become a bottleneck and a single point of failure. A global index must therefore also be partitioned, but it can be partitioned differently from the primary key. As the following figure shows:

A global index makes reads more efficient: instead of doing scatter/gather over all partitions, a client only needs to query the index partition containing the term it is looking for. The downside is that writes are slower and more complicated, because a write to a single document may now affect multiple partitions of the index. (Each term in the document may live on a different partition, on a different node; in practice, updates to global secondary indexes are usually performed asynchronously.)
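A sketch of a term-partitioned index, assuming a made-up split of terms (terms starting a to r on one index partition, s to z on the other):

```python
# The index itself is partitioned by term: the split point "s" and the
# two-partition layout are invented for illustration.
index_partitions = [{}, {}]

def index_partition_for(term: str) -> int:
    return 0 if term < "s" else 1

def add_document(doc_id: int, color: str) -> None:
    # The write touches the index partition owning this term, which may
    # live on a different node than the partition storing the document
    # itself (hence the asynchronous updates used in practice).
    idx = index_partitions[index_partition_for(color)]
    idx.setdefault(color, []).append(doc_id)

def find_by_color(color: str) -> list:
    # A read consults only the single index partition owning the term.
    return index_partitions[index_partition_for(color)].get(color, [])

add_document(101, "red")
add_document(512, "red")
add_document(204, "silver")
print(find_by_color("red"))  # [101, 512]
```

Compared with the local-index sketch above, the cost has moved from reads to writes: one query hits one index partition, but one document write may fan out to several.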

4. Rebalancing partitions

Over time, things have changed in the database:

    • (1) Query throughput increases, so you need to add more CPUs to handle the load.
    • (2) The size of the dataset increases, so you need to add more disk and RAM to store it.
    • (3) A machine fails, and other machines need to take over the failed machine's responsibilities.

All of these changes call for data and requests to be moved from one node to another. The process of moving load from one node in the cluster to another is called rebalancing.

Regardless of the partitioning scheme, rebalancing is usually expected to meet the following requirements:

    • (1) After rebalancing, the nodes in the cluster should share the load ( data storage, read and write requests ) fairly.
    • (2) The database should continue to accept reads and writes while rebalancing is in progress.
    • (3) No more data than necessary should be moved between nodes, so that rebalancing is fast and the network and disk I/O load is kept small.

A fixed number of partitions

Create many more partitions than there are nodes, and assign several partitions to each node. For example, a database running on a cluster of 10 nodes may be split into 1,000 partitions from the outset, so that roughly 100 partitions are assigned to each node. When a node is added to the cluster, the new node can steal a few partitions from every existing node until partitions are fairly distributed once again. As the following figure shows:

The number of partitions does not change, nor does the assignment of keys to partitions. The only thing that changes is the mapping between partitions and nodes. This change is not instantaneous, since it takes time to transfer a large amount of data over the network, so the old assignment of partitions is used for any reads and writes that happen while the transfer is in progress. This scheme also allows more partitions to be assigned to more powerful nodes, making those nodes take a greater share of the load.
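A toy model of this rebalancing step (the initial assignment and the greedy stealing policy are simplified illustrations): only the partition-to-node mapping changes, and only a minimal number of partitions move.

```python
def rebalance(assignment: dict, new_node: int) -> dict:
    """Add new_node to the cluster by stealing partitions from existing
    nodes until each node holds roughly the same number of partitions.
    Keys never move between partitions; only ownership changes."""
    nodes = sorted(set(assignment.values()) | {new_node})
    target = len(assignment) // len(nodes)
    counts = {n: 0 for n in nodes}
    for owner in assignment.values():
        counts[owner] += 1
    for part, owner in sorted(assignment.items()):
        if counts[new_node] >= target:
            break  # new node has its fair share; stop moving data
        if counts[owner] > target:
            assignment[part] = new_node
            counts[owner] -= 1
            counts[new_node] += 1
    return assignment

# 8 partitions initially spread over nodes 0 and 1; node 2 joins.
assignment = {p: p % 2 for p in range(8)}
rebalance(assignment, 2)
print(assignment)
```

With 8 partitions and 3 nodes, the new node ends up with 2 partitions and the existing nodes give up only what is needed, which is exactly the "move no more data than necessary" requirement above.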

Since the number of partitions is fixed when the database is first set up, it is effectively the maximum number of nodes you can have, so you need to choose it high enough to accommodate future growth. However, each partition also carries management overhead, so choosing too high a number can backfire.

Dynamic partitioning

For a database using key-range partitioning, a fixed number of partitions with fixed boundaries would be very inconvenient: if you got the boundaries wrong, you could end up with all the data in one partition and all the other partitions empty, and repartitioning by hand would be tedious. Such databases can instead create partitions dynamically:

When a partition grows beyond a configured size, it is split into two partitions, with approximately half of the data on each side of the split. Conversely, if lots of data is deleted and a partition shrinks below some threshold, it can be merged with an adjacent partition. The advantage of dynamic partitioning is that the number of partitions adapts to the total data volume: with a small amount of data, a small number of partitions is sufficient, keeping overheads small; with a large amount of data, the size of each individual partition is limited to a configurable maximum.
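A toy sketch of dynamic splitting, using a tiny key-count threshold instead of the gigabytes a real system would use (the key ranges and layout are invented):

```python
MAX_PARTITION_SIZE = 4  # toy threshold; real systems use e.g. ~10 GB

# Each partition covers a key range [lo, hi) and holds a sorted key list.
partitions = [["a", "~", ["apple", "kiwi", "mango", "pear"]]]

def insert(key: str) -> None:
    for i, (lo, hi, keys) in enumerate(partitions):
        if lo <= key < hi:
            keys.append(key)
            keys.sort()
            if len(keys) > MAX_PARTITION_SIZE:
                mid = len(keys) // 2
                split = keys[mid]
                # Split at the median key: roughly half the data ends
                # up on each side, and the two halves cover [lo, split)
                # and [split, hi) respectively.
                partitions[i:i + 1] = [[lo, split, keys[:mid]],
                                       [split, hi, keys[mid:]]]
            return
    raise KeyError(f"no partition covers {key!r}")

insert("banana")        # pushes the partition over the threshold
print(len(partitions))  # 2
```

The inverse operation (merging a shrunken partition with its neighbor) follows the same pattern in reverse; it is omitted here for brevity.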

5. Request Routing

Partitioning a dataset across nodes running on multiple machines raises a fundamental question: when a client wants to make a request, how does it know which node to connect to? And as partitions are rebalanced and their assignment to nodes changes, how does the client learn about those changes?

At a high level, there are several different solutions to this problem:

    • (1) Allow clients to contact any node. If that node happens to own the partition the request applies to, it can handle the request directly; otherwise it forwards the request to the appropriate node, receives the reply, and passes the reply back to the client.
    • (2) Send all client requests to a routing tier first, which determines the node that should handle each request and forwards it accordingly.
    • (3) Require that clients know about the partitioning and the assignment of partitions to nodes. In this case a client can connect directly to the appropriate node, without any intermediary.

In all three cases, the key problem is: how does the component making the routing decision (one of the nodes, the routing tier, or the client) learn about changes in the assignment of partitions to nodes?
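Whichever component holds it, the routing decision itself reduces to a table lookup once the partition-to-node mapping is known; a sketch with invented partition and node counts:

```python
import hashlib

NUM_PARTITIONS = 8
NUM_NODES = 3
# Routing table as the routing tier (or a partition-aware client) would
# hold it; the round-robin assignment here is a placeholder.
partition_to_node = {p: p % NUM_NODES for p in range(NUM_PARTITIONS)}

def route(key: str) -> int:
    """Pick the node responsible for a key: hash -> partition -> node."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    partition = int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
    return partition_to_node[partition]

print(route("user:42"))  # some node in 0..2
```

The hard part, which the rest of this section is about, is keeping `partition_to_node` up to date on every routing component as the cluster changes.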

Many distributed data systems rely on a separate coordination service such as ZooKeeper to keep track of this cluster metadata; each node registers itself in ZooKeeper, and ZooKeeper maintains the authoritative mapping of partitions to nodes. The routing tier or partition-aware clients can subscribe to this information in ZooKeeper. Whenever a partition changes ownership, or a node is added or removed, ZooKeeper notifies the routing tier so that it can keep its routing information up to date. As the following figure shows:

Cassandra and Riak take a different approach: they use a gossip protocol among the nodes to disseminate any changes in cluster state. Requests can be sent to any node, and that node forwards them to the appropriate node for the requested partition. This model puts more complexity in the database nodes but avoids the dependency on an external coordination service.

When using a routing tier, or when sending requests to a random node, clients still need to find an IP address to connect to. These addresses do not change as quickly as the assignment of partitions to nodes, so DNS is usually sufficient for this purpose.

Summary:

In this article we summarized the various strategies and techniques used in data partitioning; hopefully this gives a better sense of how important partitioning is in distributed storage.

