Hadoop HDFS (1)

HDFS is the Hadoop Distributed File System.
When data grows too large to be stored on a single machine, it must be spread across multiple machines. A file system that manages storage on multiple computers over a network is called a distributed file system. The complexity of network programming makes distributed file systems far more complex than ordinary disk file systems; one of the biggest challenges, for example, is fault tolerance: data integrity must be preserved even after one or more nodes fail.
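HDFS achieves this by keeping several copies of each block on different machines. The following is a minimal sketch, assuming a Hadoop client on the classpath; the namenode address and file path are hypothetical. It shows how a client can control the replication factor through the FileSystem API, which is what lets the cluster survive node failures without losing data.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Default replication for files created by this client
            // (3 copies is the usual HDFS default).
            conf.set("dfs.replication", "3");

            // Hypothetical namenode address.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

            // Replication can also be raised for an existing, important file
            // so that more node failures can be tolerated before data is lost.
            fs.setReplication(new Path("/data/important.txt"), (short) 5);
        }
    }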
HDFS is Hadoop's flagship file system, but Hadoop also provides an abstract file system interface for integrating other file systems, such as the local file system.
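As a rough illustration of that abstraction, the sketch below (the hostname and paths are hypothetical) obtains an HDFS instance and a local file system instance through the same FileSystem API, then uses the same operation on both.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsAbstraction {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Same abstract FileSystem API, two different implementations:
            // the distributed one (HDFS) and the local disk.
            FileSystem hdfs  = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            FileSystem local = FileSystem.get(URI.create("file:///"), conf);

            // Both support the same operations, e.g. checking whether a path exists.
            System.out.println(hdfs.exists(new Path("/data/input.txt")));
            System.out.println(local.exists(new Path("/tmp/input.txt")));
        }
    }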
HDFS is a file system designed for storing very large files. It is suited to streaming data access and runs on clusters of commodity hardware.
"Very large file": refers to several hundred MB, several GB or even several TB of data. Yahoo used to store Pb of data on 4000 nodes. "Streaming data access": The index data is written once and then read and analyzed and computed multiple times. In addition, most of the data in the file is used for each calculation, rather than the first row of data. "Common Commercial computer cluster": hadoop is designed to run on common computers instead of running on very expensive and highly reliable servers. This means that the node is unreliable and it is normal for the node to exit the cluster due to a fault. HDFS is designed to prevent users from being clearly aware of node faults.
HDFS also has scenarios it is not well suited to.
"Low-latency data access": HDFS is optimized for high throughput and excels at processing large volumes of data, but it is not suited to systems that must read a small amount of data and respond quickly.
"Large numbers of small files": file metadata is held in the namenode's memory; as a rule of thumb, each file, directory, and block occupies roughly 150 bytes of it, so millions of files are manageable, but billions of files exceed what common hardware configurations can hold.
"Multiple writers and modifications at arbitrary offsets": HDFS currently supports only a single writer per file, and data can only be appended at the end of a file; content at arbitrary positions cannot be modified. (This may be supported in later versions, but even then it is likely to be inefficient.)
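To make the single-writer, append-only model concrete, here is a sketch using the FileSystem create and append calls (the namenode address and path are hypothetical, and append must be enabled on the cluster). There is no call for overwriting bytes in the middle of a file.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendOnly {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            Path p = new Path("/data/events.log");

            // Writing a new file: a single writer streams bytes to the end.
            try (FSDataOutputStream out = fs.create(p)) {
                out.writeBytes("first record\n");
            }

            // The only way to add data later is to append at the end of the file.
            try (FSDataOutputStream out = fs.append(p)) {
                out.writeBytes("second record\n");
            }
        }
    }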
