The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It provides massive data storage with high fault tolerance and high throughput. HDFS is widely used in large online services and large storage systems, and has become a de facto standard for mass storage among online service companies such as major web sites, providing reliable and efficient service to their customers for years.
As information systems grow rapidly, large volumes of data must be stored reliably while remaining quickly accessible to many concurrent users. Traditional storage schemes have found it increasingly difficult to keep pace with this growth, and they have become a bottleneck and an obstacle to business development.
Through an efficient distributed algorithm, HDFS spreads both the storage of data and access to it across a large number of servers: data is stored reliably in multiple replicas, while reads and writes are distributed across the servers in the cluster. This is a radical departure from traditional storage architectures. HDFS provides the following features (a minimal usage sketch follows the list):
• Self-healing distributed file storage
• High scalability, with dynamic expansion and no downtime
• High reliability, with data verification and replication
• High-throughput access, eliminating access bottlenecks
• Built from low-cost storage and servers
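As a concrete illustration, the sketch below shows basic use of the HDFS Java client: writing a small file and reading it back. It is a minimal sketch, assuming a reachable cluster and the hadoop-client dependency on the classpath; the NameNode address hdfs://namenode:9000 and the path /demo/hello.txt are placeholders rather than values taken from this text.

```java
// Minimal sketch: write a small file to HDFS and read it back.
// The NameNode address and path below are hypothetical placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsHelloWorld {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; in practice this usually comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/demo/hello.txt");

            // Write: the client streams data to DataNodes chosen by the NameNode;
            // each block is replicated automatically (3 copies by default).
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("Hello, HDFS\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read: the client fetches block locations from the NameNode and
            // reads each block from a nearby replica.
            try (FSDataInputStream in = fs.open(path);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
        }
    }
}
```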
Characteristics of the HDFS distributed file system
High throughput access
Each data block in HDFS is replicated across a set of servers in different racks. When a user reads data, HDFS directs the request to the nearest and least-loaded server holding a replica. Because every replica of a block can serve readers, rather than all reads going to a single data source, access to a single block in HDFS can be several times faster than in a traditional storage setup.
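This block placement can be observed from a client. The following sketch, assuming the standard Hadoop Java client and a hypothetical file /demo/large.bin, lists each block of a file together with the DataNodes that hold its replicas.

```java
// Minimal sketch: print where the blocks of an HDFS file are placed.
// The path /demo/large.bin is a hypothetical example.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationReport {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path path = new Path("/demo/large.bin");
            FileStatus status = fs.getFileStatus(path);

            // One BlockLocation per block; each lists the DataNodes holding a replica.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }
}
```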
For larger files, HDFS stores different parts of the file on different servers. When such a file is accessed, the system can read from multiple servers in the array in parallel, increasing the effective bandwidth of large-file reads.
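One way an application can exploit this layout is to issue positioned reads against different regions of the file concurrently, so that each region is served from whichever replica holds the corresponding block. The sketch below is illustrative only: it assumes a hypothetical file /demo/large.bin with a 128 MB block size, and the offsets and thread pool size are arbitrary.

```java
// Minimal sketch: read two regions of a large HDFS file in parallel using
// positioned reads, which do not move the shared stream position.
// Path, offsets, and sizes below are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRead {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration());
             FSDataInputStream in = fs.open(new Path("/demo/large.bin"))) {

            long[] offsets = {0L, 128L * 1024 * 1024};  // e.g. the start of two 128 MB blocks
            int chunk = 4 * 1024 * 1024;                // read 4 MB from each region
            ExecutorService pool = Executors.newFixedThreadPool(offsets.length);

            Future<?>[] tasks = new Future<?>[offsets.length];
            for (int i = 0; i < offsets.length; i++) {
                final long offset = offsets[i];
                tasks[i] = pool.submit(() -> {
                    byte[] buf = new byte[chunk];
                    // Positioned read: independent of the stream's current position.
                    int n = in.read(offset, buf, 0, buf.length);
                    System.out.println("read " + n + " bytes at offset " + offset);
                    return null;
                });
            }
            for (Future<?> t : tasks) {
                t.get();  // wait for both reads to finish
            }
            pool.shutdown();
        }
    }
}
```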
Through these mechanisms, HDFS uses a distributed algorithm to spread data access across the many replicas held by the servers in the array, breaking through the throughput limit of a single hard disk or server by several times or even hundreds of times and providing very high aggregate data throughput.
Seamless capacity expansion
HDFS keeps each file's block allocation information on the NameNode server, while the blocks themselves are stored across the DataNode servers. When the capacity of the whole system needs to grow, it is only necessary to add DataNodes; the system automatically integrates the new servers into the array, and the block placement and balancing mechanism then moves data blocks onto the new DataNodes, with no downtime for maintenance and no manual intervention. As a result, HDFS can bring new servers online in real time as a capacity upgrade of the distributed file system, without stopping the service and without manually redistributing files.
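From the client side, the effect of adding DataNodes can be seen by querying the cluster report. The sketch below is an assumption-laden illustration: it casts the FileSystem to HDFS's DistributedFileSystem (so it only works against an HDFS-backed cluster) and prints per-DataNode usage; newly registered nodes appear in this listing automatically.

```java
// Minimal sketch: list live DataNodes and their capacity from a client.
// Assumes the default FileSystem is HDFS, hence the cast below.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ClusterReport {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;

            // One entry per live DataNode; newly added nodes show up here
            // once they register with the NameNode.
            for (DatanodeInfo node : dfs.getDataNodeStats()) {
                System.out.printf("%s used=%d GB remaining=%d GB%n",
                        node.getHostName(),
                        node.getDfsUsed() / (1024 * 1024 * 1024),
                        node.getRemaining() / (1024 * 1024 * 1024));
            }
        }
    }
}
```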
Highly fault tolerant
The HDFS file system assumes that failures (of servers, networks, storage, and so on) are the norm rather than the exception, so data reliability is ensured in several ways. Data is written in multiple copies, which can be placed on servers in different physical locations through user-defined replication policies; data is automatically checksummed when it is read and written; and the system continuously verifies data consistency in the background, keeping the number of replicas of each block at the configured replication level.
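The replication level and checksum facilities mentioned above are exposed through the standard FileSystem API. The sketch below, using a hypothetical path, raises a file's replication factor and retrieves its checksum; the replication target of 5 is only an example.

```java
// Minimal sketch: adjust a file's replication factor and fetch its checksum.
// The path is hypothetical; the target replication of 5 is an example value.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationAndChecksum {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path path = new Path("/demo/important.dat");

            // Ask the NameNode to keep 5 replicas of every block of this file;
            // the cluster re-replicates in the background until the target is met.
            boolean accepted = fs.setReplication(path, (short) 5);
            System.out.println("replication change accepted: " + accepted);

            // Block-level checksums are verified on every read; the aggregate
            // file checksum can also be fetched explicitly, e.g. to compare copies.
            FileChecksum checksum = fs.getFileChecksum(path);
            System.out.println(checksum.getAlgorithmName() + " " + checksum);
        }
    }
}
```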