Chunk and Route Design in a KV System

Source: Internet
Author: User
Tags: failover

Introduction:

In many KV systems and distributed caches, the routing policy is designed around physical nodes. Voldemort's consistent hashing is one example: each node on the ring corresponds to a real physical machine. For a simple, relatively independent KV system, this is adequate for a large number of KV use cases. In practice, however, multiple business scenarios often want to share one unified KV cluster. How can such a massive KV system manage its various application scenarios effectively while keeping every physical machine as balanced and as well utilized as possible? This problem must be solved when designing such a system. A simple management approach like Voldemort's may not be enough: it solves the distribution of massive data, but not reasonable resource sharing and resource management across the machine cluster. This article discusses an effective way to address these problems.

Simple Design:

Most of the designs we are familiar with are based on the following partitioning scheme (consistent hash + physical nodes):
 

Each node corresponds to an actual physical machine. When an application accesses data, the key is hashed and routed to the corresponding physical machine. Because a consistent hash policy is used, relatively few nodes are affected when we add nodes or rebalance the storage capacity or access volume of a node.
However, the direct correspondence between hash nodes and physical nodes can cause the following problems:

  1. For availability, a KV system keeps at least one backup copy of the data on each physical node, and in general the backup should live on a different physical machine. In theory, then, any KV cluster needs at least two physical machines, and more if there are more backup copies. The real problem arises when the data volume of a particular business scenario is small (the resources actually used are far below the machine's capacity): it is difficult to reuse that physical machine for other business scenarios. We could manually start multiple process instances on the machine, as most memcached deployments currently do, but having every application run its own instances causes great trouble for subsequent operations and maintenance (O&M).
  2. Because the front-end client is tied directly to the back-end physical machines, any change to the back-end servers directly affects the client's partitions and routes. As the number of applications grows, server expansion and restarts also place a heavy burden on the clients. Configuration push can make such changes transparent to the client, but their impact may still be huge. In the extreme case of a poorly designed hash policy, a single server change can affect all business scenarios.
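The simple design discussed above can be sketched as follows. This is only a minimal illustration, assuming one ring point per machine and an MD5-based hash; the class name PhysicalNodeRing, the machine names, and the key format are hypothetical and not taken from any particular system. It shows that retiring a machine remaps only the keys that machine owned, yet the ring held by every client still has to change.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Stable 64-bit hash; MD5 is an arbitrary choice for this sketch."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class PhysicalNodeRing:
    """Consistent hash ring whose entries are physical machines (hypothetical)."""

    def __init__(self, machines):
        # One ring point per machine, sorted by hash position.
        self._ring = sorted((stable_hash(m), m) for m in machines)
        self._hashes = [h for h, _ in self._ring]

    def locate(self, key: str) -> str:
        """Return the machine owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"product:{i}" for i in range(1000)]
before = PhysicalNodeRing(["machine-a", "machine-b", "machine-c"])
after = PhysicalNodeRing(["machine-a", "machine-b"])  # machine-c is retired

# Keys whose owner changed; each of them previously lived on machine-c.
moved = [k for k in keys if before.locate(k) != after.locate(k)]
print(f"{len(moved)} of {len(keys)} keys changed machines")
```

Because the ring is built from machine names, the client-side routing structure itself changes whenever a server is added or removed, which is the coupling described in problem 2.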

How can we solve these problems?

Here we need to introduce the concepts of namespace and chunk. Both are abstractions presented to the application.

Namespace:
To let multiple application scenarios use the processing and storage capabilities of the same massive KV system at the same time, we must be able to isolate them logically for ease of management and maintenance. This is why we introduce the concept of a namespace. A namespace corresponds to one specific KV application. For example, to store product details in the massive KV system, we create a product-detail namespace in the management system; this logically separates product details from the other application scenarios in the system. Namespaces can also follow naming rules, such as com.apache.commons.product, to distinguish different levels of namespace. This makes namespaces easier to query and manage, and namespaces themselves can even be grouped logically.
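As a rough illustration of hierarchical namespace names, the sketch below keeps a registry of dotted names and answers group queries by prefix. The NamespaceRegistry class and every name other than com.apache.commons.product are hypothetical; a real management system would store far more per namespace.

```python
class NamespaceRegistry:
    """Keeps dotted namespace names and supports prefix (group) queries."""

    def __init__(self):
        self._names = set()

    def create(self, name: str) -> None:
        """Register a namespace; each name may be created only once."""
        if name in self._names:
            raise ValueError(f"namespace already exists: {name}")
        self._names.add(name)

    def group(self, prefix: str) -> list:
        """Return, sorted, every namespace at or below the dotted prefix."""
        return sorted(
            n for n in self._names
            if n == prefix or n.startswith(prefix + ".")
        )


registry = NamespaceRegistry()
for name in (
    "com.apache.commons.product",  # example name from the text
    "com.apache.commons.order",    # hypothetical
    "com.apache.io.session",       # hypothetical
):
    registry.create(name)

commons = registry.group("com.apache.commons")
everything = registry.group("com")
```

Matching on the prefix followed by a dot keeps a group such as com.apache.commons from accidentally including a sibling like com.apache.commonsense.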

Chunk:
What is a chunk? A chunk is defined relative to a physical node. As mentioned above, once the client computes a hash from a key, the consistent hash routing algorithm finds a corresponding physical node. The chunk replaces that physical node: the node found by routing is no longer a physical machine but a logical chunk, and the route table between chunks and physical machines then tells us which physical machine hosts the chunk.
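The two-step lookup described here (key to chunk, then chunk to machine) might look like the sketch below. It assumes an MD5-based hash and one ring point per chunk; ChunkRouter, the chunk ids, and the machine names are hypothetical illustrations rather than the design of any specific system.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Stable 64-bit hash; MD5 is an arbitrary choice for this sketch."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ChunkRouter:
    """Routes a key to a logical chunk, then to the machine hosting that chunk."""

    def __init__(self, chunk_to_machine: dict):
        # Route table: chunk id -> physical machine.
        self.route_table = dict(chunk_to_machine)
        # The hash ring contains chunk ids only; machines never appear on it.
        self._ring = sorted((stable_hash(c), c) for c in self.route_table)
        self._hashes = [h for h, _ in self._ring]

    def chunk_for(self, key: str) -> str:
        """Step 1: consistent-hash lookup of the chunk that owns the key."""
        idx = bisect.bisect_left(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def machine_for(self, key: str) -> str:
        """Step 2: resolve the chunk to a physical machine via the route table."""
        return self.route_table[self.chunk_for(key)]


router = ChunkRouter({
    "chunk1": "machine-m",
    "chunk2": "machine-m",  # several chunks can share one machine
    "chunk3": "machine-n",
})
chunk = router.chunk_for("A")
machine = router.machine_for("A")
print(f"key 'A' -> {chunk} -> {machine}")
```

Since the ring is built from chunk ids, the key-to-chunk step depends only on the set of chunks, not on where those chunks currently live.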

How is a chunk implemented on the corresponding physical machine?

  1. Configure namespace
    Logically, the client no longer cares about specific physical machines, only about its own chunks. When configuring a namespace, we therefore only need to specify the number of chunks it requires, the maximum storage capacity of each chunk, and the maximum number of requests each chunk can handle. Per namespace, we can also specify the following parameters for the business scenario: partition policy, hash algorithm, failover policy, availability level, and chunk allocation (including capacity and traffic planning). In this way, even within the same KV cluster, we can give applications of different levels different degrees of availability, consistency, and partition tolerance.
  2. Relationship between chunk and physical server
    How a chunk is implemented on a physical machine varies with the storage type, but the logical concept should be the same: a chunk corresponds to a processing unit on a physical node with a fixed storage capacity limit. Taking BDB as the underlying storage, for example, we might carve the physical machine's storage space into fixed logical blocks (each corresponding to an upper-layer chunk), and at the implementation level each chunk might map to its own BDB storage files. Once the server starts, these storage units can provide service externally. In real deployments, a chunk's processing capacity may of course be able to expand dynamically to some extent, but this should not affect the correspondence between a chunk and its implementation on the physical machine. The available chunks can be allocated to different services according to the application scenario. If a physical machine's chunk layout turns out to be unreasonable, we can recycle all of its chunks and "format" the machine to initialize it with a new set of chunks.
  3. Chunk allocation and recycling
    Once a physical server is up, all the chunks it hosts already exist and can provide service at any time. Until a chunk is added to a specific namespace, however, it does not actually serve any requests. How are chunks allocated? The namespace, mentioned above, is the core abstraction of the entire system. From the chunks currently available in the system (in theory, a chunk should belong to only one namespace), we select suitable chunks according to the capacity and data characteristics configured for the namespace and assign them to it. This completes the chunk allocation process.
    Recycling process: once a batch of chunks is released from a namespace, their data is cleared (to free the storage space), and the released chunks return to the idle chunk pool, where they can be allocated to other namespaces.
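The configuration, allocation, and recycling steps above can be sketched roughly as follows. All names here (NamespaceConfig, Chunk, ChunkPool, the field names, and the capacities in MB) are assumptions made for illustration; the real parameter set would also include items such as the partition policy and failover policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NamespaceConfig:
    """Hypothetical subset of the per-namespace settings."""
    name: str
    chunk_count: int
    chunk_capacity_mb: int


@dataclass
class Chunk:
    chunk_id: str
    machine: str
    capacity_mb: int
    namespace: Optional[str] = None  # None means the chunk is idle
    data: dict = field(default_factory=dict)


class ChunkPool:
    """Tracks every pre-created chunk and which namespace, if any, owns it."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def idle(self):
        return [c for c in self.chunks if c.namespace is None]

    def allocate(self, config: NamespaceConfig):
        """Assign enough idle chunks of sufficient capacity to the namespace."""
        fits = [c for c in self.idle() if c.capacity_mb >= config.chunk_capacity_mb]
        if len(fits) < config.chunk_count:
            raise RuntimeError(f"not enough idle chunks for {config.name}")
        chosen = fits[: config.chunk_count]
        for chunk in chosen:
            chunk.namespace = config.name  # a chunk serves only one namespace
        return chosen

    def release(self, namespace: str):
        """Recycle: clear the data, then return the chunks to the idle pool."""
        released = [c for c in self.chunks if c.namespace == namespace]
        for chunk in released:
            chunk.data.clear()
            chunk.namespace = None
        return released


pool = ChunkPool(Chunk(f"chunk{i}", f"machine-{i % 2}", 1024) for i in range(4))
product = NamespaceConfig("com.apache.commons.product", chunk_count=2, chunk_capacity_mb=512)

owned = pool.allocate(product)
owned[0].data["sku-1"] = "details"
idle_after_allocate = len(pool.idle())

pool.release(product.name)
idle_after_release = len(pool.idle())
```

This sketch picks the first chunks that fit. A real allocator would presumably also weigh traffic planning and spread a namespace's chunks across machines.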

Key scenario demonstration
With the concepts of namespace and chunk in place, the following describes how several common data access scenarios in a massive KV system work with chunks:

  1. Routing

    When key = 'A' hashes to chunk2, the system looks up the route table to find the physical machine M that hosts chunk2 and forwards the request to M, which retrieves the actual value and returns the final result to the requester.

  2. Migration (migration involves the following scenarios)
    A) Migration due to insufficient capacity. (Note: once sharding is based on chunks, in theory only a namespace can run short of chunk storage capacity; a physical machine cannot run short of storage, because its storage capacity is pre-allocated. Of course, a shortage of chunks may indirectly mean a shortage of physical machines.) When a namespace's storage capacity is insufficient, we may need to add a new chunk to the namespace, so some data must be migrated from the old chunks to the new chunk.

    B) Migration due to unbalanced traffic. There are two cases. In the first, the access volume of one chunk is too large; here the client can simply adjust the hash algorithm over the chunks so that access is spread as evenly as possible across all chunks, and physical machines are not affected. In the second, one physical machine receives too much traffic; here several heavily accessed chunks can be migrated to idle machines. Because the migration happens between physical machines, the chunks themselves are unchanged from the client's point of view, so the migration is completely transparent to the client.

    C) Migration for failover. Failover is designed around chunks: if a namespace configures its chunks in master-slave mode, then when the master chunk cannot be accessed, requests switch automatically to the backup chunk. The chunk allocation algorithm must therefore place the master and slave chunks on different physical machines.
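The three migration scenarios can be illustrated with one small sketch. It assumes an MD5-based consistent hash over chunk ids and a route table that stores a (master, backup) machine pair per chunk; RouteTable, its method names, and all chunk and machine names are hypothetical. Actual data copying is omitted.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Stable 64-bit hash; MD5 is an arbitrary choice for this sketch."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def chunk_for(key: str, chunks: list) -> str:
    """Consistent-hash lookup (ring rebuilt per call to keep the sketch short)."""
    ring = sorted((stable_hash(c), c) for c in chunks)
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_left(hashes, stable_hash(key)) % len(ring)][1]


class RouteTable:
    """Maps each chunk to a (master machine, backup machine) pair."""

    def __init__(self):
        self.placement = {}

    def place(self, chunk: str, master: str, backup: str) -> None:
        if master == backup:
            raise ValueError("master and backup must be on different machines")
        self.placement[chunk] = (master, backup)

    def machine_for(self, chunk: str) -> str:
        return self.placement[chunk][0]

    def move_master(self, chunk: str, new_machine: str) -> None:
        """Scenario B: relocate a hot chunk; its id, seen by clients, is unchanged."""
        _, backup = self.placement[chunk]
        self.place(chunk, new_machine, backup)

    def fail_over(self, failed_machine: str) -> None:
        """Scenario C: promote the backup of every chunk whose master failed.

        The failed machine stays recorded as backup here; a real system
        would presumably re-create the replica elsewhere.
        """
        for chunk, (master, backup) in list(self.placement.items()):
            if master == failed_machine:
                self.placement[chunk] = (backup, master)


# Scenario A: adding chunk3 to a namespace moves only the keys that chunk3 takes over.
keys = [f"k{i}" for i in range(1000)]
old_chunks = ["chunk1", "chunk2"]
new_chunks = old_chunks + ["chunk3"]
to_migrate = [k for k in keys if chunk_for(k, old_chunks) != chunk_for(k, new_chunks)]

table = RouteTable()
table.place("chunk1", "machine-m", "machine-n")
table.place("chunk2", "machine-m", "machine-n")
table.place("chunk3", "machine-n", "machine-m")

table.move_master("chunk2", "machine-p")  # scenario B
table.fail_over("machine-m")              # scenario C
```

In scenarios B and C only the route table changes, so the key-to-chunk mapping held by clients stays the same.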

Logical Structure Diagram:

Summary (advantages of chunk)

  1. Chunks standardize and abstract hardware resources to a certain extent, turning them into virtual units that are convenient to control.
  2. Chunks divide a large resource into multiple logically independent units, so accessing data does not require traversing the entire resource pool; we only need to find the relevant smaller logical unit through the mapping relationship and search within it. In a sense this resembles files on a hard disk. With the file as the basic unit of storage, together with the file system built on top of it, we can easily find and locate a part of a large disk, and operations such as searching, deleting, moving, and reclaiming free space become more convenient. Without files as a basic unit of storage management, storing data on a hard disk would be a disaster.
  3. Files also serve as a layer of isolation. As long as the same file system is used, upper-layer reads and writes can completely ignore the characteristics of the underlying storage medium, whether it is a hard disk, flash memory, a CD, or anything else. The chunk design provides a similar isolation to some extent: we can migrate a chunk's data to another physical machine without worrying about the underlying storage structure. Some KV systems use several different storage engines at the bottom layer, but because the chunk is a high-level abstraction, end users see only logical chunks. These chunks no longer correspond to specific physical machines, which lets one physical machine share its resources across multiple chunks and also makes automatic data migration between heterogeneous storage on multiple physical machines possible.
  4. With chunks, dynamic data migration is easy to implement.
