Research and Implementation of Distributed Storage Based on HDFS

Source: Internet
Author: User
Keywords: distributed storage, HDFS, dynamic replicas

Weishukang of the

This paper analyzes the structure and operating mechanisms of HDFS, identifies several design defects, and improves the HDFS replica strategy. The main contributions are as follows:
  
(1) The default static replica redundancy policy in HDFS does not distinguish hotspot data, so the nodes that hold such data become a bottleneck for the cluster. To solve this problem, this paper proposes a dynamic redundancy strategy based on data heat. The strategy counts and predicts accesses to each file, and the statistics period of each file changes with its access frequency, so that trends in data heat are reflected quickly and the number of replicas is increased or decreased in a timely manner. This strategy can speed up system response, improve cluster throughput, and reduce job running time.
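
The idea in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the prediction method, the thresholds, or the rule for adjusting the statistics period, so the sketch substitutes exponential smoothing for the predictor, halves or doubles the period as the predicted access rate rises or falls, and uses made-up constants. The names `FileHeat` and `end_of_period` are hypothetical. Only the lower bound of 3 replicas corresponds to the HDFS default replication factor.

```python
from dataclasses import dataclass

# Illustrative constants (assumptions, not values from the paper).
MIN_REPLICAS = 3              # HDFS default replication factor
MAX_REPLICAS = 10             # hypothetical upper bound on replicas
RATE_PER_EXTRA_REPLICA = 2.0  # hypothetical: accesses/second that justify one extra replica
MIN_PERIOD = 10.0             # shortest statistics period, in seconds
MAX_PERIOD = 600.0            # longest statistics period, in seconds


@dataclass
class FileHeat:
    period: float = 60.0         # current statistics period, in seconds
    predicted_rate: float = 0.0  # smoothed accesses per second
    replicas: int = MIN_REPLICAS


def end_of_period(heat: FileHeat, access_count: int, alpha: float = 0.5) -> FileHeat:
    """Return the file's new state after one statistics period has elapsed."""
    observed_rate = access_count / heat.period
    # Exponential smoothing stands in for the paper's unspecified predictor.
    predicted = alpha * observed_rate + (1 - alpha) * heat.predicted_rate

    # Sample heating files more often and cooling files less often.
    if predicted > heat.predicted_rate:
        period = max(MIN_PERIOD, heat.period / 2)
    elif predicted < heat.predicted_rate:
        period = min(MAX_PERIOD, heat.period * 2)
    else:
        period = heat.period

    # Map the predicted rate to a replica target, clamped to [MIN, MAX].
    extra = int(predicted // RATE_PER_EXTRA_REPLICA)
    replicas = min(MAX_REPLICAS, MIN_REPLICAS + extra)
    return FileHeat(period=period, predicted_rate=predicted, replicas=replicas)
```

In a real cluster the resulting target would be applied through the HDFS replication setting for the file; that step is left out here because it depends on the deployment.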
  
(2) HDFS does not account for the heterogeneity of DataNodes. If low-performance nodes store more data, they bear more of the load when that data is read and processed, while the capacity of high-performance nodes sits idle, so the load is unevenly distributed. To solve this problem, this paper proposes a replica placement strategy based on node performance evaluation and network distance. First, an interface lets users define custom node state metrics and configure their weights; then an improved TOPSIS algorithm evaluates each node; finally, the evaluation result is combined with network distance to select the node on which to place the replica. This strategy lets users specify the metrics they care about and, on that basis, balances the load across nodes and improves overall system performance.
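
The evaluation step in (2) can be illustrated with the textbook TOPSIS procedure. The abstract does not describe the paper's improvements to TOPSIS or how the score is combined with network distance, so the sketch below uses standard TOPSIS and a simple weighted penalty on distance. The function names, the node dictionary layout, and `distance_weight` are all hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Standard TOPSIS: return a closeness score in [0, 1] per row (higher is better).

    matrix  -- one row of metric values per node
    weights -- one user-configured weight per metric
    benefit -- True if a larger metric value is better, False if smaller is better
    """
    n = len(weights)
    # Vector-normalise each column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]

    # Ideal and anti-ideal points, taken column by column.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        # If all nodes are identical, every distance is zero; treat them as equal.
        scores.append(d_worst / total if total else 0.5)
    return scores


def choose_node(nodes, weights, benefit, distance_weight=0.3):
    """Pick the node name with the best mix of TOPSIS score and network distance.

    The linear combination is an assumption made for this sketch only.
    """
    scores = topsis([n["metrics"] for n in nodes], weights, benefit)
    max_dist = max(n["distance"] for n in nodes) or 1

    def combined(i):
        penalty = nodes[i]["distance"] / max_dist
        return (1 - distance_weight) * scores[i] - distance_weight * penalty

    best = max(range(len(nodes)), key=combined)
    return nodes[best]["name"]
```

A caller would pass one entry per candidate DataNode, for example `{"name": "dn1", "metrics": [500, 0.2], "distance": 2}` with metrics such as free disk space (larger is better) and CPU load (smaller is better). The distance here follows the HDFS convention in which nodes on the same rack are at distance 2 and nodes on different racks are at distance 4.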
  
(3) Extensive simulations and experiments were carried out, and a client/server (C/S) cloud storage system was developed on top of the improved HDFS cluster. Experiments comparing the HDFS default strategies with the improved strategies show that the improved strategies deliver better cluster performance.

