Ramblings on the future of HDFS

Source: Internet
Author: User

In earlier posts we covered the features and architecture of HDFS. HDFS can store terabytes or even petabytes of data, but only under two preconditions: first, the data must consist mainly of large files; second, the NameNode must have enough memory. Readers familiar with HDFS know that the NameNode stores the metadata for the entire cluster, such as the information for every file and directory. As the metadata grows, NameNode startup becomes very slow and garbage collection (GC) pauses become more likely. Once the data reaches a certain scale, metadata management becomes the bottleneck of HDFS, and this is precisely why HDFS is suited to storing large files. If the metadata management problem were solved, HDFS could in fact support large numbers of small files.
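To make the small-file problem concrete, here is a back-of-envelope estimate. It assumes the commonly cited rule of thumb that each file, directory, and block costs roughly 150 bytes of NameNode heap, a 128 MiB block size, and equal-sized files with directories ignored; the function name `namenode_heap_bytes` is purely illustrative.

```python
# Assumption: ~150 bytes of NameNode heap per namespace object (file or block).
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 2**20  # assumed HDFS block size: 128 MiB

PIB = 2**50
GIB = 2**30


def namenode_heap_bytes(total_bytes: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Estimate NameNode heap for total_bytes stored as equal-sized files (directories ignored)."""
    num_files = total_bytes // file_size
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    objects = num_files * (1 + blocks_per_file)    # one file object plus its block objects
    return objects * BYTES_PER_OBJECT


large = namenode_heap_bytes(PIB, 128 * 2**20)  # 1 PiB as 128 MiB files
small = namenode_heap_bytes(PIB, 1 * 2**20)    # 1 PiB as 1 MiB files
print(f"1 PiB as 128 MiB files: {large / GIB:.2f} GiB")  # 2.34 GiB
print(f"1 PiB as   1 MiB files: {small / GIB:.2f} GiB")  # 300.00 GiB
```

The same volume of data needs 128 times more NameNode heap when it is stored as 1 MiB files, because the object count, not the byte count, drives metadata memory.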

Finally, the star of this article: Ozone. Ozone is an object storage service developed mainly by Hortonworks. It is designed to be built on HDFS DataNode storage, to store data objects at a larger scale and in a wide range of sizes, and to offer HDFS-level reliability, consistency, and availability. See Hadoop JIRA HDFS-7240 for details. After a long period of development and heated discussion over the name, the project eventually settled on HDDS (Hadoop Distributed Data Store); see JIRA HDFS-10419.

So how does Ozone solve the existing problems of HDFS?

Ozone's main thrust is scaling HDFS. Scaling HDFS means tackling its current core problem, the NameNode metadata-management bottleneck: on the one hand by reducing the load on the NameNode, and on the other by abstracting an additional mapping layer that keeps data reads and writes fast.

HDFS is currently organized into the following layers:

    1. A namespace layer, implemented in the NameNode service
    2. A block layer, implemented primarily in the DataNode service, with the NameNode also providing block management
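A minimal sketch of how these two layers sit inside a single NameNode, assuming a greatly simplified model: `NameNodeSketch` and its methods are hypothetical, directories are not modeled, and replica placement is simply passed in. It shows that a read first resolves a path in the namespace layer and then resolves each block through block management, and that both maps grow in the same heap.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeSketch:
    """Hypothetical model of the two maps an HDFS NameNode keeps in memory."""
    # Namespace layer: file path -> ordered list of block IDs
    namespace: dict[str, list[int]] = field(default_factory=dict)
    # Block management: block ID -> DataNodes holding a replica
    block_map: dict[int, set[str]] = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, path: str, num_blocks: int, datanodes: list[str]) -> list[int]:
        block_ids = []
        for _ in range(num_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            self.block_map[block_id] = set(datanodes)
            block_ids.append(block_id)
        self.namespace[path] = block_ids
        return block_ids

    def locate(self, path: str) -> list[tuple[int, list[str]]]:
        # Resolve the path in the namespace layer, then each block in the block map
        return [(b, sorted(self.block_map[b])) for b in self.namespace[path]]

    def object_count(self) -> int:
        # Every file and every block is an in-memory object on the same NameNode
        return len(self.namespace) + len(self.block_map)


nn = NameNodeSketch()
nn.add_file("/logs/a.log", 2, ["dn1", "dn2", "dn3"])
nn.add_file("/logs/b.log", 1, ["dn2", "dn3", "dn4"])
print(nn.locate("/logs/a.log"))  # [(0, ['dn1', 'dn2', 'dn3']), (1, ['dn1', 'dn2', 'dn3'])]
print(nn.object_count())         # 5 (2 files + 3 blocks)
```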

Ozone's design scales the relevant functional modules in each of these existing HDFS layers.

Namespace Layer:

    1. Scaling the namespace
    2. Scaling client/RPC load on the NN (the number of requests the NameNode can serve)
    3. NN startup time (shortening NameNode startup)

Block Layer:

    1. Scaling the block namespace
    2. Scaling block reports (the reports DataNodes send to the NN about their blocks)
    3. Scaling the DataNode's block management

To solve the existing problems of HDFS, we need to optimize it along these two dimensions. The design document briefly describes how to scale the namespace and the block layer: for example, by drawing on Ceph's distributed namespace, or by keeping frequently accessed metadata in memory as a working set while persisting the rest. It also introduces an abstraction that groups blocks into units of roughly 2 GB to 16 GB, called containers, which solves the block scaling problem; containers are comparable to Ceph's placement groups (PGs).
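A rough count of why grouping blocks into containers shrinks reports. It assumes a hypothetical DataNode holding 64 TiB of fully packed 128 MiB blocks, and uses the 2 GB and 16 GB ends of the container range from the design (treated here as GiB); `report_entries` is an illustrative helper.

```python
MIB = 2**20
GIB = 2**30
TIB = 2**40


def report_entries(stored_bytes: int, unit_bytes: int) -> int:
    """Entries in a report that lists one entry per storage unit (ceiling division)."""
    return -(-stored_bytes // unit_bytes)


stored = 64 * TIB  # hypothetical amount of replica data on one DataNode
blocks = report_entries(stored, 128 * MIB)
print(f"block report: {blocks} entries")  # 524288 entries
for size_gib in (2, 16):
    containers = report_entries(stored, size_gib * GIB)
    print(f"container report ({size_gib} GiB containers): "
          f"{containers} entries, {blocks // containers}x fewer")
# 2 GiB -> 32768 entries (16x fewer); 16 GiB -> 4096 entries (128x fewer)
```

The same ratio applies to the manager's in-memory map: it tracks thousands of containers per DataNode instead of hundreds of thousands of blocks.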

Ozone ultimately implemented two services to realize this design: KSM (Key Space Manager) and SCM (Storage Container Manager).

KSM: manages the Ozone namespace. All volume, bucket, and key records are stored in KSM. Its role is similar to that of the HDFS NameNode.
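The volume/bucket/key hierarchy can be sketched as nested maps. This is a hypothetical model, not the KSM API: `KeySpaceSketch`, `BlockID`, and the assumption that each key maps to (container ID, local block ID) pairs are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockID:
    container_id: int  # container holding the block (tracked by SCM)
    local_id: int      # block's ID within that container


class KeySpaceSketch:
    """Hypothetical KSM-like namespace: volume -> bucket -> key -> blocks."""

    def __init__(self) -> None:
        self._volumes: dict[str, dict[str, dict[str, list[BlockID]]]] = {}

    def create_volume(self, volume: str) -> None:
        self._volumes.setdefault(volume, {})

    def create_bucket(self, volume: str, bucket: str) -> None:
        self._volumes[volume].setdefault(bucket, {})  # KeyError if the volume is missing

    def put_key(self, volume: str, bucket: str, key: str, blocks: list[BlockID]) -> None:
        self._volumes[volume][bucket][key] = list(blocks)

    def lookup_key(self, volume: str, bucket: str, key: str) -> list[BlockID]:
        return self._volumes[volume][bucket][key]

    def list_keys(self, volume: str, bucket: str, prefix: str = "") -> list[str]:
        return sorted(k for k in self._volumes[volume][bucket] if k.startswith(prefix))


ksm = KeySpaceSketch()
ksm.create_volume("vol1")
ksm.create_bucket("vol1", "logs")
ksm.put_key("vol1", "logs", "2018/01/a.log", [BlockID(7, 1), BlockID(7, 2)])
ksm.put_key("vol1", "logs", "2018/02/b.log", [BlockID(9, 1)])
print(ksm.list_keys("vol1", "logs", prefix="2018/01/"))  # ['2018/01/a.log']
print(ksm.lookup_key("vol1", "logs", "2018/02/b.log"))  # [BlockID(container_id=9, local_id=1)]
```

In this sketch the key space records only which container each block lives in; which DataNodes hold that container is left to the container manager.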

SCM: manages "container" objects, where a container is a logical collection of blocks. DataNodes provide storage in the form of containers, and SCM is only responsible for maintaining the container information. The original block report becomes a container report.
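Container-report handling can be sketched in the same spirit. `ContainerManagerSketch` is hypothetical, and it assumes each report is a full list of the containers a DataNode currently holds; the sketch diffs it against that DataNode's previous report to update the container-to-DataNode map.

```python
from collections import defaultdict


class ContainerManagerSketch:
    """Hypothetical SCM-like view: container ID -> DataNodes holding a replica."""

    def __init__(self) -> None:
        self._replicas: defaultdict[int, set[str]] = defaultdict(set)
        self._last_report: dict[str, set[int]] = {}

    def process_container_report(self, datanode: str, container_ids: list[int]) -> None:
        current = set(container_ids)
        previous = self._last_report.get(datanode, set())
        # Containers missing from this report are no longer on this DataNode
        for cid in previous - current:
            self._replicas[cid].discard(datanode)
        # Newly reported containers gain this DataNode as a replica holder
        for cid in current - previous:
            self._replicas[cid].add(datanode)
        self._last_report[datanode] = current

    def locate(self, container_id: int) -> list[str]:
        return sorted(self._replicas.get(container_id, set()))


scm = ContainerManagerSketch()
scm.process_container_report("dn1", [7, 8])
scm.process_container_report("dn2", [7])
print(scm.locate(7))  # ['dn1', 'dn2']
scm.process_container_report("dn1", [8])  # dn1 no longer reports container 7
print(scm.locate(7))  # ['dn2']
```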

Ozone also implements a file system interface, Ozone FS, which is fully compatible with existing HDFS read/write semantics and supports Spark, Hive, and other programs. This makes it convenient to migrate data from existing HDFS clusters to Ozone.

In the end, the more complete HDFS we are aiming for should look like this.

Resources:
Talk about the convergence of HDFS and Ozone
HDFS Scalability v2 (design document)

Follow me: Three King Data (updated irregularly)
