The block is the basic storage unit in HDFS, with a default size of 64 MB. An HDFS block is much larger than a disk block in order to reduce seek (addressing) overhead. For example, if the block size is 100 MB, the seek time is 10 ms, and the transfer rate is 100 MB/s, then the seek time is only 1% of the transfer time.
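The figures above can be checked with a small calculation. This is a minimal sketch; the function name and parameters are illustrative, not part of any HDFS API:

```python
# Seek-time overhead as a fraction of block transfer time.
# Figures from the text: 100 MB block, 10 ms seek, 100 MB/s transfer rate.
def seek_overhead(block_mb: float, seek_ms: float, rate_mb_per_s: float) -> float:
    """Return seek time as a fraction of the time to transfer one block."""
    transfer_ms = block_mb / rate_mb_per_s * 1000  # time to stream the block, in ms
    return seek_ms / transfer_ms

print(seek_overhead(100, 10, 100))  # 0.01 -> seek is 1% of transfer time
```

Note that a larger block amortizes the fixed seek cost over more data, which is exactly why HDFS blocks are much bigger than disk blocks.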
HDFS has three important roles: the Client, the Datanode, and the Namenode.
The Namenode is the manager of HDFS: it manages the filesystem namespace, maintaining the file system tree and the index information for all files and directories in the tree. It keeps the file system's metadata in memory.
The Datanode is a worker in HDFS and the basic unit of file storage. It periodically reports the list of blocks it stores to the Namenode.
The client is the application that accesses HDFS files. It accesses the entire file system by interacting with the Namenode and Datanodes, and it presents a POSIX-like (Portable Operating System Interface) file system interface, so users do not need to know about the Namenode, the Datanodes, or their functions when programming.
(1) File write
- The client sends a write-file request to the Namenode
- The Namenode, based on the file size and the block configuration, returns to the client information about some of the Datanodes it manages
- The client divides the file into blocks and writes them in order to each Datanode, using the Datanode address information returned by the Namenode
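The write steps above can be sketched as a toy simulation. This is an illustration of the protocol, not the real HDFS client API; the class and method names, the round-robin placement, and the tiny block size are all assumptions made for the demo:

```python
BLOCK_SIZE = 4  # tiny block size for the demo; HDFS defaults to 64 MB

class DataNode:
    """Toy datanode: stores blocks in a dict keyed by (filename, index)."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}
    def write_block(self, block_id, data):
        self.blocks[block_id] = data

class NameNode:
    """Toy namenode: hands out datanode targets for each block."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
    def allocate(self, n_blocks):
        # Round-robin placement for the demo; real HDFS placement is rack-aware.
        return [self.datanodes[i % len(self.datanodes)] for i in range(n_blocks)]

def client_write(namenode, filename, data):
    """Split the file into blocks and write each block to its assigned datanode."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    targets = namenode.allocate(len(blocks))
    for i, (block, dn) in enumerate(zip(blocks, targets)):
        dn.write_block((filename, i), block)
    return targets

nn = NameNode([DataNode("dn1"), DataNode("dn2")])
client_write(nn, "f.txt", b"abcdefghij")  # 10 bytes -> 3 blocks across 2 datanodes
```

The key point the sketch shows is the division of labor: the Namenode only decides *where* blocks go; the block data itself flows from the client directly to the Datanodes.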
(2) File read
- The client sends a read-file request to the Namenode
- The Namenode returns information about the Datanodes that store the file's blocks
- The client reads the blocks from those Datanodes and reassembles the file
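The read path mirrors the write path. A minimal self-contained sketch, again with illustrative names rather than the real HDFS API: the `block_map` dict stands in for the metadata the Namenode returns, and the `datanodes` dict stands in for the block stores:

```python
def client_read(block_map, datanodes, filename):
    """Fetch a file's blocks in order from their datanodes and concatenate them.

    block_map: filename -> ordered list of (datanode_name, block_index),
               i.e. the metadata the namenode hands back to the client.
    datanodes: datanode_name -> {block_index: bytes}, the block stores.
    """
    data = b""
    for dn_name, block_index in block_map[filename]:
        data += datanodes[dn_name][block_index]
    return data

# Toy cluster state: two datanodes each holding one block of "f.txt".
datanodes = {
    "dn1": {0: b"hello "},
    "dn2": {1: b"world"},
}
block_map = {"f.txt": [("dn1", 0), ("dn2", 1)]}
print(client_read(block_map, datanodes, "f.txt"))  # b'hello world'
```

As with writes, the Namenode is consulted only for metadata; the actual block bytes travel directly between the Datanodes and the client.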
(3) Block replication
- The Namenode discovers that some file blocks do not meet the minimum replica count, or that some Datanodes have failed
- It notifies the Datanodes to replicate the affected blocks
- The Datanodes copy the blocks among themselves
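The re-replication loop above can be sketched as follows. This is a simplified illustration, not the real HDFS replication pipeline; the function name, the data layout, and the replica target of 2 (HDFS's actual default replication factor is 3) are assumptions for the demo:

```python
MIN_REPLICAS = 2  # assumed target for the demo; HDFS defaults to 3 replicas

def rereplicate(datanodes, min_replicas=MIN_REPLICAS):
    """Copy under-replicated blocks to more datanodes until each meets the minimum.

    datanodes: datanode_name -> {block_id: bytes}
    """
    # Step 1 (namenode's view): count which datanodes hold each block.
    holders = {}
    for name, store in datanodes.items():
        for block_id in store:
            holders.setdefault(block_id, []).append(name)
    # Step 2: for each under-replicated block, copy datanode-to-datanode.
    for block_id, names in holders.items():
        for target in datanodes:
            if len(names) >= min_replicas:
                break
            if target not in names:
                datanodes[target][block_id] = datanodes[names[0]][block_id]
                names.append(target)

# One block exists on only one datanode; after the call it has two replicas.
datanodes = {"dn1": {"blk_1": b"data"}, "dn2": {}, "dn3": {}}
rereplicate(datanodes)
```

The sketch reflects the division of roles in the list above: the replica count check is the Namenode's job, while the actual copying happens between Datanodes.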
Hadoop Learning---HDFS