HDFS is the foundation of the entire Hadoop framework. It provides storage for massive amounts of unstructured data and exposes APIs for creating, deleting, reading, and writing files, so developers only need to work with a tree of directories and files.
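To make the API concrete, here is a minimal sketch of basic file operations using the Hadoop `FileSystem` Java API. The NameNode address `hdfs://namenode:9000` and the path `/demo/hello.txt` are hypothetical placeholders; adjust them to your cluster.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBasicOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/hello.txt");

            // Create the file (overwriting if it exists) and write some bytes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Open the file and read the first line back.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader =
                         new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }

            // Delete the file (non-recursive).
            fs.delete(file, false);
        }
    }
}
```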
From the outset, HDFS was designed with the following considerations in mind:
1. HDFS uses a large number of low-cost, unreliable commodity machines as storage devices, so machine crashes and disk failures are extremely common and must be treated as the norm. HDFS therefore keeps multiple replicas of each block, automatically detects whether nodes are alive, and automatically recovers data from failed machines (see the sketch after this list).
2. HDFS mostly stores large files, so read and write operations are optimized for large files.
3. For writes, the workload consists mostly of appends; random writes are rare.
4. For reads, most operations are sequential; random reads are rare.
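The sketch below illustrates two of these points from client code: setting an explicit replication factor when creating a file, and adding data via append rather than random writes. The address and path are hypothetical, and appends must be supported and enabled on the cluster for `append` to succeed.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationAndAppend {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path log = new Path("/demo/app.log");

            // Create the file with a replication factor of 3 so the data
            // survives individual machine or disk failures.
            if (!fs.exists(log)) {
                try (FSDataOutputStream out = fs.create(log, (short) 3)) {
                    out.write("first record\n".getBytes(StandardCharsets.UTF_8));
                }
            }

            // Subsequent writes are appends; HDFS favors this pattern
            // over random in-place updates.
            try (FSDataOutputStream out = fs.append(log)) {
                out.write("appended record\n".getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```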