Ramble about the future of HDFS

Last Update: 2018-08-30 Source: Internet

Author: User


In the earlier article we covered the features and architecture of HDFS. HDFS can store terabytes or even petabytes of data, but only under two preconditions: first, the data must consist mostly of large files, and second, the NameNode must have enough memory. Readers who know HDFS will know that the NameNode is the HDFS service that stores the metadata for the entire cluster, such as the information for every file and directory. As the amount of metadata grows, NameNode startup becomes very slow and GC is triggered more easily. Clearly, once the data reaches a certain scale, metadata management becomes the bottleneck of HDFS, which is in fact why HDFS is suited to storing large files. If the metadata management problem were solved, HDFS could also support a large number of small files.
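To make the small-file problem concrete, here is a rough back-of-the-envelope sketch. It assumes about 150 bytes of NameNode heap per namespace object (file or block) and a 128 MiB block size; both numbers are illustrative assumptions rather than measurements, and the function names are made up for this example.

```python
# Rough NameNode heap estimate. BYTES_PER_OBJECT is an assumed rule of thumb,
# not a measured value; real usage depends on version, path lengths and settings.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024**2  # assumed HDFS block size of 128 MiB


def namenode_objects(total_bytes: int, file_size: int) -> int:
    """Count file objects plus block objects needed to hold total_bytes."""
    files = -(-total_bytes // file_size)           # ceiling division
    blocks_per_file = -(-file_size // BLOCK_SIZE)  # at least one block per file
    return files + files * blocks_per_file


def heap_gib(objects: int) -> float:
    """Convert an object count into an estimated heap size in GiB."""
    return objects * BYTES_PER_OBJECT / 1024**3


PIB = 1024**5
large = namenode_objects(PIB, 1024**3)  # 1 PiB stored as 1 GiB files
small = namenode_objects(PIB, 1024**2)  # 1 PiB stored as 1 MiB files
print(f"1 GiB files: {large:,} objects, ~{heap_gib(large):.2f} GiB heap")
print(f"1 MiB files: {small:,} objects, ~{heap_gib(small):.2f} GiB heap")
```

Under these assumptions the same petabyte needs roughly 1.3 GiB of heap when stored as 1 GiB files and roughly 300 GiB when stored as 1 MiB files, which is the bottleneck described above.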

Now for the main subject of this article: Ozone. Ozone is an object storage service proposed by Hortonworks. It is built on the DataNode storage of HDFS, is designed to support object storage at larger data volumes, supports a variety of object sizes, and offers the reliability, consistency and availability of HDFS. See Hadoop JIRA HDFS-7240 for details. After a long period of development and an intense discussion about naming, it was eventually named HDDS (Hadoop Distributed Data Store); see JIRA HDFS-10419.

So how does Ozone solve the existing problems of HDFS?

The main goal of Ozone is scaling HDFS. Scaling HDFS means addressing the current problem of HDFS, the NameNode metadata management bottleneck: on the one hand it reduces the pressure on the NameNode, and on the other hand it abstracts another layer of mapping to keep reads and writes fast.

The current layers of HDFS are as follows:

A namespace layer, implemented in the NameNode service.
A block layer, implemented mainly in the DataNode service; the NameNode also provides a block management service.
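The two layers can be pictured with a toy model. This is only a sketch to illustrate the layering; `ToyNameNode` and its methods are hypothetical names and do not correspond to real Hadoop classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model: one process holds both the namespace and the block map."""

    namespace: dict[str, list[int]] = field(default_factory=dict)  # path -> block IDs
    block_map: dict[int, set[str]] = field(default_factory=dict)   # block ID -> DataNodes

    def add_file(self, path: str, block_ids: list[int]) -> None:
        # Namespace layer: remember which blocks make up the file.
        self.namespace[path] = block_ids
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set())

    def block_report(self, datanode: str, block_ids: list[int]) -> None:
        # Block layer: every DataNode reports every block it stores.
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)

    def locate(self, path: str) -> list[tuple[int, list[str]]]:
        # A read resolves the path to blocks, then blocks to DataNodes.
        return [(b, sorted(self.block_map[b])) for b in self.namespace[path]]


nn = ToyNameNode()
nn.add_file("/logs/a.log", [1, 2])
nn.block_report("dn1", [1, 2])
nn.block_report("dn2", [2])
locations = nn.locate("/logs/a.log")
print(locations)
```

Both dictionaries grow with the number of files and blocks, and both live in the same process, which is why the number of small files matters so much.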

Ozone's design scales the relevant functional modules of each of these HDFS layers.

Namespace layer:

Scaling the namespace
Scaling the client/RPC load on the NameNode (the requests the NameNode can serve)
NameNode startup time (shortening it)

Block layer:

Scaling the block namespace
Scaling block reports (the block reports DataNodes send to the NameNode)
Scaling the DataNode's block management

To solve the existing problems of HDFS, it needs to be optimized along the two dimensions above. The design document briefly describes how to scale the namespace and the block layer, for example by borrowing from Ceph's distributed namespace, or by keeping frequently accessed data in memory as a working set while the rest is persisted. In addition, it abstracts a block-group layer called a container, of about 2 GB to 16 GB, which solves the block scaling problem; this can be compared to a PG (placement group) in Ceph.
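A quick calculation shows why grouping blocks into containers helps. The sketch below assumes a 128 MiB block size, a 5 GiB container size (inside the 2 GB to 16 GB range mentioned above), and a hypothetical DataNode holding 40 TiB of completely full blocks; all three values are illustrative assumptions.

```python
BLOCK_SIZE = 128 * 1024**2    # assumed block size: 128 MiB
CONTAINER_SIZE = 5 * 1024**3  # assumed container size: 5 GiB


def report_entries(datanode_bytes: int, unit_size: int) -> int:
    """Number of entries a DataNode lists when it reports in units of unit_size."""
    return -(-datanode_bytes // unit_size)  # ceiling division


datanode_bytes = 40 * 1024**4  # hypothetical DataNode storing 40 TiB
per_block = report_entries(datanode_bytes, BLOCK_SIZE)
per_container = report_entries(datanode_bytes, CONTAINER_SIZE)
print(f"block report entries:     {per_block:,}")
print(f"container report entries: {per_container:,}")
print(f"reduction factor:         {per_block // per_container}x")
```

With these numbers, a report shrinks from 327,680 entries to 8,192, a factor of 40. Smaller blocks or larger containers would make the gap wider.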

Ozone ultimately implements two services to deliver this design: KSM (Key Space Manager) and SCM (Storage Container Manager).

KSM: responsible for managing the Ozone namespace. All volume, bucket and key records are stored in KSM. Its role is similar to that of the NameNode in HDFS.

SCM: responsible for managing container objects. A container is logically a collection of blocks. DataNodes provide their storage in the form of containers, and SCM is only responsible for maintaining information about these containers. The original block report becomes a container report.
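The division of labor between the two services can be sketched as a toy model. Everything here (`ToyKSM`, `ToySCM`, `BlockLocation`, `read_plan`) is a hypothetical name invented for illustration, and the assumption that a key resolves to a list of (container ID, local block ID) pairs is a simplification, not Ozone's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockLocation:
    container_id: int  # which container holds the block
    local_id: int      # the block's ID inside that container


@dataclass
class ToyKSM:
    """Namespace service: (volume, bucket, key) -> block locations."""

    keys: dict[tuple[str, str, str], list[BlockLocation]] = field(default_factory=dict)

    def put_key(self, volume: str, bucket: str, key: str, blocks: list[BlockLocation]) -> None:
        self.keys[(volume, bucket, key)] = list(blocks)

    def lookup(self, volume: str, bucket: str, key: str) -> list[BlockLocation]:
        return self.keys[(volume, bucket, key)]


@dataclass
class ToySCM:
    """Container service: container ID -> DataNodes that reported it."""

    containers: dict[int, set[str]] = field(default_factory=dict)

    def container_report(self, datanode: str, container_ids: list[int]) -> None:
        # DataNodes report containers, not individual blocks.
        for container_id in container_ids:
            self.containers.setdefault(container_id, set()).add(datanode)

    def datanodes_for(self, container_id: int) -> list[str]:
        return sorted(self.containers[container_id])


def read_plan(ksm: ToyKSM, scm: ToySCM, volume: str, bucket: str, key: str):
    """Resolve a key to block locations (KSM), then to DataNodes (SCM)."""
    return [(loc, scm.datanodes_for(loc.container_id)) for loc in ksm.lookup(volume, bucket, key)]


ksm, scm = ToyKSM(), ToySCM()
scm.container_report("dn1", [7])
scm.container_report("dn2", [7, 8])
ksm.put_key("vol1", "bucket1", "report.csv", [BlockLocation(7, 0), BlockLocation(8, 3)])
plan = read_plan(ksm, scm, "vol1", "bucket1", "report.csv")
print(plan)
```

In this sketch the SCM map grows with the number of containers rather than the number of blocks, and the namespace lives in a separate service, so the two can be scaled independently.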

Ozone also implements a file system interface, Ozone FS, which is fully compatible with the existing HDFS read and write model and supports Spark, Hive and other programs. It also makes it convenient to migrate data from an existing HDFS cluster to Ozone.

In the end, the more complete HDFS we are looking for should look like this.

Resources:
Talk about the convergence of HDFS and Ozone.
Hdfs+scalability-v2

