Key issues in Distributed Storage System Design

Source: Internet
Author: User
Tags: openstack, swift, zookeeper

This article introduces some key problems and solutions in distributed storage system design: data distribution, replica storage, replica control, and node status monitoring.

1) Data Distribution

Hash Distribution
This method is used in key-value storage systems and can also be used for table sharding in databases. A hash function maps each object to a machine. Advantages: uniform distribution and simple implementation. Disadvantage: poor scalability, because adding a machine changes the mapping of most existing keys.
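
As a minimal sketch of hash distribution, assuming a plain modulo scheme (the function names and the choice of MD5 are illustrative and not taken from any particular system), the following maps keys to machines and measures how many keys change machines when the cluster grows from 4 to 5 machines:

```python
import hashlib


def machine_for(key: str, num_machines: int) -> int:
    """Map a key to a machine index using a stable hash and a modulo."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose machine changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if machine_for(k, old_count) != machine_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical object names, used only for illustration.
keys = [f"object-{i}" for i in range(10000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys move when growing from 4 to 5 machines")
```

With a uniform hash, roughly N/(N+1) of the keys are expected to move when going from N to N+1 machines, which is the scalability problem this method suffers from.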

Distribution by data range
This method is used in key-value storage systems and can also be used for table sharding in databases. Data is divided into intervals that are distributed to different machines, and intervals can be created dynamically. Advantages: flexible splitting rules and strong scalability. Disadvantage: a large amount of metadata (the mapping between data partitions and machines) must be maintained, and it can easily become a bottleneck.
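
The range lookup can be sketched as a sorted list of split points searched with binary search. This is only an illustration: the `RangeMap` class, its method names, and the machine names are assumptions, not part of any system named in this article.

```python
import bisect


class RangeMap:
    """Maps keys to machines by sorted, non-overlapping key ranges."""

    def __init__(self, first_machine: str):
        # split_points[i] is the lowest key owned by machines[i + 1];
        # machines[0] owns every key below the first split point.
        self.split_points = []
        self.machines = [first_machine]

    def split(self, split_key: str, new_machine: str) -> None:
        """Create a new range starting at split_key, owned by new_machine."""
        idx = bisect.bisect_left(self.split_points, split_key)
        if idx < len(self.split_points) and self.split_points[idx] == split_key:
            raise ValueError(f"a range already starts at {split_key!r}")
        self.split_points.insert(idx, split_key)
        self.machines.insert(idx + 1, new_machine)

    def lookup(self, key: str) -> str:
        """Return the machine that owns the range containing key."""
        return self.machines[bisect.bisect_right(self.split_points, key)]


ranges = RangeMap("machine-A")      # machine-A owns the whole key space
ranges.split("m", "machine-B")      # machine-B takes keys >= "m"
ranges.split("g", "machine-C")      # machine-C takes keys in ["g", "m")
print(ranges.lookup("apple"), ranges.lookup("grape"), ranges.lookup("melon"))
```

The two lists in `RangeMap` are the metadata mentioned above: every split adds an entry, and every lookup has to consult them.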

Distribution by data volume
This method is used in storage systems built on the file model. During distribution, the data volume on each machine is kept roughly equal, which amounts to load balancing. Advantage: strong scalability. Disadvantage: metadata must be maintained (which machine each file is stored on).
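
One simple way to keep volumes roughly equal is to place each new file on the machine that currently holds the least data. The sketch below assumes that policy; the `VolumeBalancer` class and its names are hypothetical.

```python
class VolumeBalancer:
    """Places each file on the machine that currently stores the least data."""

    def __init__(self, machines):
        self.used_bytes = {m: 0 for m in machines}
        self.location = {}  # metadata: file name -> machine

    def place(self, file_name: str, size: int) -> str:
        """Pick the least-loaded machine, record the placement, and return it."""
        # On ties, min() returns the first machine in insertion order.
        target = min(self.used_bytes, key=self.used_bytes.get)
        self.used_bytes[target] += size
        self.location[file_name] = target
        return target

    def locate(self, file_name: str) -> str:
        """Look up which machine holds a file."""
        return self.location[file_name]


balancer = VolumeBalancer(["m1", "m2", "m3"])
for name, size in [("a", 100), ("b", 50), ("c", 10), ("d", 5)]:
    balancer.place(name, size)
print(balancer.used_bytes)
```

The `location` dictionary is the per-file metadata that this method has to maintain.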

Consistent hash
This method is used in key-value storage systems. It overcomes the scalability defect of plain hash distribution and is widely used in product implementations, such as memcached clients and OpenStack Swift.
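
A minimal consistent hash ring with virtual nodes might look like the following. It is an illustrative sketch, not the implementation used by memcached clients or OpenStack Swift; the `HashRing` class, the MD5 hash, and the default of 100 virtual nodes per machine are all assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash used to place both nodes and keys on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point on ring, node name)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Insert the node's virtual points into the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        """Drop all virtual points belonging to the node."""
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        """Return the node owning the first point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


keys = [f"object-{i}" for i in range(10000)]
ring = HashRing(["node-1", "node-2", "node-3", "node-4"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-5")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding node-5")
```

Only keys that now belong to the new node move; every other key keeps its old machine, which is why the fraction moved is far smaller than with modulo hashing.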

2) Storage of Copies

Unit: machine
Replicas are kept in units of machines: several machines are replicas of one another, and the data on them is identical. Advantage: simple to implement. Disadvantages: data recovery is slow because there are few recovery sources, and load distribution is prone to imbalance.

Unit: data segment
Data is split into reasonably sized segments, and replicas are kept in units of segments. Segments have the same size and go by several names: chunk, partition, and segment. Advantages: fast recovery and good scalability. Disadvantages: metadata must be maintained, and the scheme is harder to implement.
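
The sketch below illustrates segment-level replication under simple assumptions: fixed-size chunks (the last one may be shorter) and round-robin placement of three replicas. The function names and the placement rule are hypothetical; real systems use more elaborate policies.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Cut data into chunks of chunk_size bytes (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, machines, replicas: int = 3):
    """Assign each chunk to `replicas` distinct machines, round-robin."""
    if replicas > len(machines):
        raise ValueError("not enough machines for the requested replica count")
    return {
        chunk_id: [machines[(chunk_id + r) % len(machines)] for r in range(replicas)]
        for chunk_id in range(num_chunks)
    }


def recovery_sources(placement, failed: str):
    """Machines holding a surviving replica of any chunk the failed machine had."""
    sources = set()
    for holders in placement.values():
        if failed in holders:
            sources.update(h for h in holders if h != failed)
    return sources


machines = ["m0", "m1", "m2", "m3", "m4"]
chunks = split_into_chunks(bytes(1000), 100)
placement = place_replicas(len(chunks), machines)  # metadata: chunk -> machines
print(sorted(recovery_sources(placement, "m0")))
```

Because the chunks of a failed machine are replicated across many other machines, all of them can serve as recovery sources in parallel, whereas machine-level replication limits recovery to the failed machine's few peers. The `placement` dictionary is the metadata cost of that flexibility.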

3) Copy Control
Copy control manages consistency among multiple replicas and is tied to the system's consistency model.

Centralized
One replica acts as the central node and controls the others, as in a primary-secondary arrangement. All update operations must go through the central node. This approach is relatively simple. Typical examples include GFS and the master-slave mode in MySQL.
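
A toy version of centralized control is sketched below. It assumes the primary applies each update locally and then pushes it synchronously to every secondary; the class names and the version counter are illustrative and do not describe how GFS or MySQL actually replicate.

```python
class Replica:
    """A replica holding a key-value store and the version of its last update."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.version = 0

    def apply(self, version: int, key: str, value: str) -> None:
        """Apply one update and record the version it carries."""
        self.data[key] = value
        self.version = version


class Primary(Replica):
    """The central node: every write enters here and is pushed to secondaries."""

    def __init__(self, name: str, secondaries):
        super().__init__(name)
        self.secondaries = list(secondaries)

    def write(self, key: str, value: str) -> int:
        """Order the update, apply it locally, then replicate it synchronously."""
        version = self.version + 1
        self.apply(version, key, value)
        for secondary in self.secondaries:
            secondary.apply(version, key, value)
        return version


s1, s2 = Replica("secondary-1"), Replica("secondary-2")
primary = Primary("primary", [s1, s2])
primary.write("color", "red")
primary.write("color", "blue")
print(s1.data, s1.version)
```

Because only the primary assigns versions, all replicas see updates in the same order, which is what makes this approach comparatively simple.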

Decentralized
No replica acts as a central controller. This approach is more complicated to implement and is less widely used; a representative product is Dynamo.

4) Node Status Monitoring

Heartbeat
A monitoring node exists, and the other nodes regularly send heartbeat messages to it.
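
The monitoring side of a heartbeat scheme can be sketched as follows. The `HeartbeatMonitor` class, the timeout value, and the injectable clock are assumptions made for illustration.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat from each node and reports silent nodes."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node: str) -> None:
        """Called whenever a heartbeat message arrives from a node."""
        self.last_seen[node] = self.clock()

    def offline_nodes(self):
        """Nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=5.0)
monitor.heartbeat("storage-1")
print(monitor.offline_nodes())  # storage-1 has just reported, so it is not listed
```

In a real deployment the nodes would send heartbeats over the network on a timer; here the calls are made directly to keep the example self-contained.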
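
Lease Mechanism
A lease mechanism can also be used to monitor node status: the monitoring node grants each node a time-limited lease, and a node whose lease is not renewed before it expires is considered offline.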
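
A lease-based check could be sketched as below. The `LeaseManager` class, the lease duration, and the injectable clock are hypothetical, and the sketch ignores clock drift between the grantor and the node, which a real design would need to allow for.

```python
import time


class LeaseManager:
    """Grants time-limited leases; a node without a valid lease counts as offline."""

    def __init__(self, duration_seconds: float, clock=time.monotonic):
        self.duration = duration_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.expiry = {}  # node name -> time at which its lease expires

    def grant(self, node: str) -> float:
        """Grant or renew a lease and return its expiry time to the node."""
        expires_at = self.clock() + self.duration
        self.expiry[node] = expires_at
        return expires_at

    def is_alive(self, node: str) -> bool:
        """A node is considered alive only while it holds an unexpired lease."""
        return node in self.expiry and self.clock() < self.expiry[node]


leases = LeaseManager(duration_seconds=10.0)
leases.grant("storage-1")
print(leases.is_alive("storage-1"), leases.is_alive("storage-2"))
```

Unlike a plain heartbeat, the expiry time is returned to the node, so the node itself also knows when it must stop acting on the lease unless it has renewed it.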

Methods in FastDFS

1) Data Distribution
Data is distributed by data volume, with load balancing performed across groups.

2) Storage of Copies
Copies are stored in units of machines: the machines in a group are replicas of one another.

3) Copy Control
The tracker selects one storage node in the group as the central node (primary). The other nodes are slave nodes, and the central node is responsible for synchronizing data to them.

4) Node Status Monitoring
Storage nodes regularly send heartbeat information to the tracker. If no heartbeat is received from a node within the specified time, that node is taken offline.

Reference: Introduction to Distributed System Principles, Liu Jie
