Chapter 2 Hadoop Distributed File System
Hadoop Distributed Filesystem (HDFS)
HDFS is designed to store very large files with a streaming data access pattern.
The design idea behind HDFS: write once, read many times is the most efficient access pattern. The time to read the entire dataset matters more than the latency of reading the first record.
Currently, write operations always append data to the end of a file. HDFS does not support multiple writers or modifications at arbitrary positions in a file. These operations may be supported in the future, but they are likely to be relatively inefficient.
The disk block size is generally 512 bytes, and the default HDFS block size is 64 MB.
HDFS blocks are much larger than disk blocks in order to minimize seek overhead: with a large block, the time spent seeking to the start of the block is small compared with the time spent transferring its data. As disk transfer rates increase, block sizes will tend to grow. However, the block size cannot be set too large: a map task normally processes one block at a time, so oversized blocks yield too few tasks and the job runs more slowly.
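The trade-off above can be checked with a little arithmetic. The following is a minimal sketch, not a benchmark: the 10 ms seek time and 100 MB/s transfer rate are assumed illustrative values, and the function names are made up for this note.

```python
import math


def seek_overhead(block_size_mb, seek_time_ms=10.0, transfer_rate_mb_s=100.0):
    """Seek time as a fraction of the time needed to transfer one block.

    seek_time_ms and transfer_rate_mb_s are assumed example values,
    not measurements of any particular disk.
    """
    transfer_time_ms = block_size_mb / transfer_rate_mb_s * 1000.0
    return seek_time_ms / transfer_time_ms


def block_count(file_size_mb, block_size_mb):
    """Number of blocks a file occupies, which is roughly the number of
    map tasks if each task processes one block."""
    return math.ceil(file_size_mb / block_size_mb)


file_size_mb = 10 * 1024  # a hypothetical 10 GB input file
for block_size_mb in (64, 128, 1024):
    overhead = seek_overhead(block_size_mb)
    tasks = block_count(file_size_mb, block_size_mb)
    print(f"{block_size_mb:>5} MB block: seek overhead {overhead:.2%}, "
          f"about {tasks} map tasks")
```

Under these assumptions, larger blocks shrink the seek overhead but also shrink the number of map tasks available to run in parallel, which is the tension the paragraph describes.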
The fsck command in HDFS can display block information:
% hadoop fsck / -files -blocks
An HDFS cluster has two types of nodes operating in a manager-worker pattern: one namenode (the manager) and multiple datanodes (the workers).
The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently in the form of two files: the namespace image and the edit log. The namenode also knows which datanodes hold the blocks of each file, but it does not store block locations persistently; this information is rebuilt from the datanodes when the system starts.
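The split between persistent and in-memory state can be illustrated with a toy model. This is only a sketch under simplifying assumptions: ToyNamenode, restart, and the tuple-based edit log are hypothetical names and formats invented for this note, not Hadoop's real classes or on-disk layout.

```python
class ToyNamenode:
    """Toy model of namenode metadata (hypothetical, not Hadoop's API)."""

    def __init__(self, image=None):
        # Persistent state 1: the namespace image, a snapshot mapping
        # each path to the list of block IDs that make up the file.
        self.namespace = {p: list(b) for p, b in (image or {}).items()}
        # Persistent state 2: the edit log of changes since the image.
        self.edit_log = []
        # In-memory only: block ID -> set of datanodes holding a replica.
        self.block_locations = {}

    def create(self, path, block_ids):
        self.edit_log.append(("create", path, list(block_ids)))
        self.namespace[path] = list(block_ids)

    def delete(self, path):
        self.edit_log.append(("delete", path, None))
        self.namespace.pop(path, None)


def restart(image, edit_log):
    """Rebuild the namespace by loading the image and replaying the log.

    Block locations are deliberately not restored here: they come back
    only when datanodes report in.
    """
    nn = ToyNamenode(image)
    for op, path, block_ids in edit_log:
        if op == "create":
            nn.namespace[path] = list(block_ids)
        elif op == "delete":
            nn.namespace.pop(path, None)
    return nn


image = {"/logs/a.txt": ["blk_1", "blk_2"]}
nn = ToyNamenode(image)
nn.create("/logs/b.txt", ["blk_3"])
nn.delete("/logs/a.txt")
nn.block_locations["blk_3"] = {"datanode-1"}

recovered = restart(image, nn.edit_log)
print(recovered.namespace)
print(recovered.block_locations)
```

After the simulated restart the namespace matches the state before the restart, while the block location map is empty until it is repopulated.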
The client accesses the filesystem on the user's behalf by communicating with the namenode and the datanodes, and it presents a filesystem interface to the user.
Datanodes are the worker nodes of the filesystem. They store and retrieve blocks as requested, and they periodically report the list of blocks they store to the namenode.
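As a rough illustration of how block reports let the namenode learn where blocks live, the sketch below inverts a set of reports into a block-to-datanodes map. The function name, the datanode names, and the block IDs are all hypothetical; real block reports carry more information than a plain list of IDs.

```python
from collections import defaultdict


def build_block_map(block_reports):
    """Invert datanode block reports into a block -> datanodes map.

    block_reports maps a datanode name to the block IDs it says it
    stores (a simplified stand-in for a real block report).
    """
    locations = defaultdict(set)
    for datanode, block_ids in block_reports.items():
        for block_id in block_ids:
            locations[block_id].add(datanode)
    return dict(locations)


reports = {
    "datanode-1": ["blk_1", "blk_2"],
    "datanode-2": ["blk_2", "blk_3"],
    "datanode-3": ["blk_1", "blk_3"],
}
block_map = build_block_map(reports)
for block_id in sorted(block_map):
    print(block_id, sorted(block_map[block_id]))
```

Because the map is derived entirely from what the datanodes report, it does not need to be saved to disk on the namenode.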
Namenode fault tolerance mechanisms:
1. Back up the files that make up the persistent state of the filesystem metadata.
2. Run a secondary namenode.