Hadoop implements many types of file systems, and the most widely used is, of course, its distributed file system, HDFS. This article does not discuss the master-slave architecture of HDFS, because that topic is already covered extensively online and in books. Instead, drawing on my own study, I want to describe some interesting internals of HDFS and lay a foundation for studying its modules in more depth later.
The two main relationship modules of HDFS
The first module relates to the NameNode and covers file system metadata operations. It includes the directory tree, the list of data blocks belonging to each file, the FSImage image file, and the EditLog, which records every operation; together they maintain the metadata of the entire cluster.
The second module relates to the DataNode and covers the mapping between data blocks and data nodes, that is, which DataNode stores each block. Many operations are involved, including block replication, deletion of corrupt blocks, the lease mechanism, and so on.
Classes involved in the first relationship module (NameNode related)
Like Linux file systems, Hadoop uses the concept of an inode (index node). INode is an abstract class, and INodeDirectory and INodeFile are its subclasses, so attributes common to files and directories can be maintained in one place.
- Common methods:
- INodeDirectory.removeChild() – removes a child inode from the directory
- Namespace image class FSImage; common methods:
- FSImage.saveFSImage() – saves the namespace image at the current moment to a file
- FSImage.loadFSImage() – reads the image file and restores the metadata
- Edit log class FSEditLog; common methods:
- FSEditLog.logEdit() – writes a log record for an operation
- FSEditLog.logSync() – syncs the logged operations to persistent storage
- FSEditLog.rollEditLog() – rolls the edit log; used when the secondary NameNode uploads a new namespace image
- HDFS introduces FSDirectory as a façade that accepts the various operations and delegates them to the individual objects of the subsystem. Common methods:
- getFileInfo() – gets file status information
- setOwner() – changes the file's owner and group
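The interplay of FSImage, FSEditLog and FSDirectory can be illustrated with a small Python sketch. It is not Hadoop code: the namespace is a flat dict of path to owner/group instead of an INode tree, the file names fsimage.json and edits.log and all method names are hypothetical, and rolling the log simply truncates it, whereas the real checkpoint with the secondary NameNode is more involved. What it shows is the core idea: every mutation goes through the façade, is logged and synced, and on restart the last image is loaded and the edits written after it are replayed.

```python
import json
import os
import tempfile


class FSEditLog:
    """Append-only log of namespace operations (toy stand-in, not Hadoop's class)."""

    def __init__(self, path):
        self.path = path
        self.buffer = []

    def log_edit(self, op, **args):
        # Buffer one record; it is not durable until log_sync() runs.
        self.buffer.append(json.dumps({"op": op, **args}))

    def log_sync(self):
        # Append buffered records to the log file and force them to disk.
        with open(self.path, "a") as f:
            f.writelines(record + "\n" for record in self.buffer)
            f.flush()
            os.fsync(f.fileno())
        self.buffer.clear()

    def roll(self):
        # Simplification: start an empty log once a new image has been saved.
        open(self.path, "w").close()

    def replay(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]


def apply_edit(namespace, record):
    """Apply one logged operation to the in-memory namespace (path -> attrs)."""
    op, path = record["op"], record["path"]
    if op in ("mkdir", "set_owner"):
        namespace[path] = {"owner": record["owner"], "group": record["group"]}
    elif op == "delete":
        namespace.pop(path, None)


class FSDirectory:
    """Façade: every mutation updates memory and is written to the edit log."""

    def __init__(self, workdir):
        self.image_path = os.path.join(workdir, "fsimage.json")
        self.edits = FSEditLog(os.path.join(workdir, "edits.log"))
        self.namespace = {}

    def _mutate(self, op, **args):
        apply_edit(self.namespace, {"op": op, **args})
        self.edits.log_edit(op, **args)
        self.edits.log_sync()

    def mkdir(self, path, owner="hdfs", group="supergroup"):
        self._mutate("mkdir", path=path, owner=owner, group=group)

    def set_owner(self, path, owner, group):
        self._mutate("set_owner", path=path, owner=owner, group=group)

    def delete(self, path):
        self._mutate("delete", path=path)

    def get_file_info(self, path):
        return self.namespace.get(path)

    def save_fs_image(self):
        # Checkpoint: persist the whole namespace, then roll the edit log.
        with open(self.image_path, "w") as f:
            json.dump(self.namespace, f)
        self.edits.roll()

    def load_fs_image(self):
        # Restart: load the last image, then replay edits logged after it.
        self.namespace = {}
        if os.path.exists(self.image_path):
            with open(self.image_path) as f:
                self.namespace = json.load(f)
        for record in self.edits.replay():
            apply_edit(self.namespace, record)


with tempfile.TemporaryDirectory() as workdir:
    fs = FSDirectory(workdir)
    fs.mkdir("/user")
    fs.mkdir("/tmp")
    fs.save_fs_image()                        # image now holds /user and /tmp
    fs.set_owner("/user", "alice", "staff")   # logged after the checkpoint
    fs.delete("/tmp")
    restarted = FSDirectory(workdir)
    restarted.load_fs_image()
    print(restarted.get_file_info("/user"))   # {'owner': 'alice', 'group': 'staff'}
```

After the checkpoint only two records remain in the log, so the restarted instance rebuilds the latest state from the image plus a short replay; this is why checkpoints keep NameNode restarts fast.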
Classes related to data blocks in the second relationship module (DataNode related)
- BlocksMap – data block mapping; holds the metadata of data blocks on the NameNode
- DatanodeDescriptor – data node descriptor; the NameNode's abstraction of a DataNode
- BlockInfo – an entry in BlocksMap; records information about the DataNodes that hold the block
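A minimal Python sketch of how these three classes relate, assuming that BlocksMap is essentially a dictionary from block ID to BlockInfo, and that BlockInfo keeps the owning file plus the set of DatanodeDescriptors holding replicas. The method names (add_block, add_node, remove_node, nodes_for) are hypothetical; the real classes are tuned for memory use and differ in detail.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatanodeDescriptor:
    """NameNode-side view of one DataNode (toy model)."""
    storage_id: str
    host: str


@dataclass
class BlockInfo:
    """One BlocksMap entry: the owning file plus the DataNodes holding replicas."""
    block_id: int
    file_path: str
    nodes: set = field(default_factory=set)


class BlocksMap:
    def __init__(self):
        self._map = {}  # block_id -> BlockInfo

    def add_block(self, block_id, file_path):
        return self._map.setdefault(block_id, BlockInfo(block_id, file_path))

    def add_node(self, block_id, node):
        # Record that `node` stores a replica; returns False if already known.
        info = self._map[block_id]
        if node in info.nodes:
            return False
        info.nodes.add(node)
        return True

    def remove_node(self, block_id, node):
        self._map[block_id].nodes.discard(node)

    def nodes_for(self, block_id):
        # Unknown blocks have no locations.
        info = self._map.get(block_id)
        return set() if info is None else set(info.nodes)

    def num_nodes(self, block_id):
        return len(self.nodes_for(block_id))
```

Keeping DatanodeDescriptor immutable (frozen) makes it hashable, so the same descriptor object can be referenced from many BlockInfo entries.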
DataNode management
- refreshNodes() – rereads the dfs.hosts (include) and dfs.hosts.exclude configuration
- registerDatanode() – registers a DataNode with the NameNode
- DataNode.offerService() – the DataNode runs this loop to keep sending information, such as heartbeats and block reports, to the NameNode
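The DataNode management flow can be sketched as follows. This is a hypothetical Python model, not Hadoop's API: HostsFilter, NameNodeRegistry and their methods are illustrative names, the host lists are passed in as whitespace-separated text instead of being read from the files configured by dfs.hosts and dfs.hosts.exclude, and time is passed explicitly so the logic is easy to test. It assumes the usual semantics that an empty include list admits every host and that excluded hosts are decommissioned.

```python
class HostsFilter:
    """Hypothetical model of the include/exclude host checks."""

    def __init__(self, include_text="", exclude_text=""):
        self.refresh_nodes(include_text, exclude_text)

    @staticmethod
    def _parse(text):
        # One host per whitespace-separated token; blank text -> empty set.
        return set(text.split())

    def refresh_nodes(self, include_text, exclude_text):
        # Re-read both lists, as refreshNodes() does with the configured files.
        self.include = self._parse(include_text)
        self.exclude = self._parse(exclude_text)

    def may_register(self, host):
        # An empty include list admits every host.
        return not self.include or host in self.include

    def should_decommission(self, host):
        return host in self.exclude


class NameNodeRegistry:
    def __init__(self, hosts):
        self.hosts = hosts
        self.last_heartbeat = {}  # host -> time of the last heartbeat

    def register_datanode(self, host, now):
        if not self.hosts.may_register(host):
            raise PermissionError(f"{host} is not in the include list")
        self.last_heartbeat[host] = now

    def heartbeat(self, host, now):
        # What each iteration of the offerService() loop would send.
        if host not in self.last_heartbeat:
            raise KeyError(f"{host} is not registered")
        self.last_heartbeat[host] = now

    def dead_nodes(self, now, timeout):
        # Hosts whose last heartbeat is older than `timeout` seconds.
        return {h for h, t in self.last_heartbeat.items() if now - t > timeout}
```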
Related class NetworkTopology; common methods:
- getDistance() – calculates the network distance between two nodes
- isOnSameRack() – determines whether two nodes are on the same rack
DNSToSwitchMapping – an interface that converts host names to network locations
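Network distance is commonly described as the number of hops from each node up to their closest common ancestor in the topology tree: 0 for the same node, 2 for two nodes on the same rack, 4 for different racks in the same data center, and 6 across data centers. The Python sketch below computes it from locations of the form /datacenter/rack. StaticMapping is a hypothetical stand-in for a DNSToSwitchMapping implementation, and unknown hosts fall back to /default-rack.

```python
DEFAULT_RACK = "/default-rack"


class StaticMapping:
    """Hypothetical DNSToSwitchMapping stand-in backed by a fixed table."""

    def __init__(self, table):
        self.table = dict(table)

    def resolve(self, hosts):
        # Translate host names into network locations such as "/dc1/rack1".
        return [self.table.get(host, DEFAULT_RACK) for host in hosts]


def _path(location, host):
    # "/dc1/rack1" and "h1" -> ["dc1", "rack1", "h1"]
    return [part for part in location.split("/") if part] + [host]


def get_distance(loc_a, host_a, loc_b, host_b):
    """Sum of hops from each node up to their closest common ancestor."""
    a, b = _path(loc_a, host_a), _path(loc_b, host_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def is_on_same_rack(loc_a, loc_b):
    # Two nodes share a rack when their network locations are identical.
    return loc_a == loc_b


mapping = StaticMapping({"h1": "/dc1/rack1", "h2": "/dc1/rack1",
                         "h3": "/dc1/rack2", "h4": "/dc2/rack1"})
r1, r2, r3, r4 = mapping.resolve(["h1", "h2", "h3", "h4"])
print(get_distance(r1, "h1", r2, "h2"))  # 2: same rack
print(get_distance(r1, "h1", r3, "h3"))  # 4: same data center, different rack
print(get_distance(r1, "h1", r4, "h4"))  # 6: different data centers
```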
Data block management
- The main related class is FSNamesystem; common methods:
- FSNamesystem.addStoredBlock() – records a new replica of a data block
- blockReceived() – block commit method: after a DataNode successfully receives a block, it must call this method to report the block to the NameNode.
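The bookkeeping behind addStoredBlock() and blockReceived() can be sketched as a replica counter. ReplicaTracker, its replication parameter and the needed set are hypothetical names; the sketch only tracks under-replicated blocks and leaves out over-replication, corrupt replicas and the actual scheduling of new copies.

```python
class ReplicaTracker:
    """Hypothetical sketch of the bookkeeping behind blockReceived()/addStoredBlock()."""

    def __init__(self, replication=3):
        self.replication = replication  # target number of replicas per block
        self.locations = {}             # block_id -> set of DataNode names
        self.needed = set()             # blocks still below the target

    def add_stored_block(self, block_id, datanode):
        # Record one replica; a repeated report from the same node changes nothing.
        self.locations.setdefault(block_id, set()).add(datanode)
        if len(self.locations[block_id]) < self.replication:
            self.needed.add(block_id)
        else:
            self.needed.discard(block_id)

    def block_received(self, datanode, block_id):
        # A DataNode reports a block it has finished receiving.
        self.add_stored_block(block_id, datanode)


tracker = ReplicaTracker(replication=2)
tracker.block_received("dn1", 7)
print(7 in tracker.needed)   # True: one replica, two wanted
tracker.block_received("dn2", 7)
print(7 in tracker.needed)   # False: target reached
```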
Data read methods
- getBlockLocations() – before reading, the client must locate the data; this method returns a LocatedBlocks object holding LocatedBlock instances
- reportBadBlocks() – reports corrupt blocks to the NameNode
Lease management
- In short, a lease is a permission granted by the NameNode that allows the lease holder to write to a file for a specified period of time.
- LeaseManager.Lease – the lease itself; the lease holder is the client.
- LeaseManager.addLease() – adds information about an opened file to the lease manager.
- FSNamesystem.checkLease() – lease check: operations on an open file, such as appending data, adding or abandoning a data block, or closing the file, must first check the lease on that file.
- LeaseManager.renewLease() – maintains the lease by updating Lease.lastUpdate.
- LeaseManager.Monitor – an inner class that implements the Runnable interface and periodically checks for expired leases.
Safe mode
- SafeMode is a read-only mode of HDFS; every update operation first checks whether the NameNode is in safe mode. Related classes and methods:
- SafeModeMonitor – the thread class that performs the safe-mode checks
- canLeave() – decides whether safe mode can be left by checking the internal reached variable
- setSafeMode() – enters or leaves safe mode on request
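Leases and safe mode both act as guards on the write path, so one Python sketch can show them together. Everything here is hypothetical and simplified: SafeModeInfo uses a block-count threshold and an extension period with illustrative values, reached stays None until the threshold is met, LeaseManager.check_leases() performs one pass of what the Monitor thread would do periodically, and expired leases are simply dropped, whereas the real NameNode recovers the files they covered. append() stands in for any update operation.

```python
class SafeModeException(Exception):
    """Local exception for updates attempted while in safe mode."""


class SafeModeInfo:
    """Leave safe mode once enough blocks are reported and an extension has passed."""

    def __init__(self, total_blocks, threshold, extension):
        self.total = total_blocks
        self.safe = 0                # blocks with at least one reported replica
        self.threshold = threshold   # fraction of blocks that must be safe
        self.extension = extension   # seconds to wait after the threshold is met
        self.reached = None          # time the threshold was first met, else None

    def increment_safe_block_count(self, now):
        self.safe += 1
        if self.reached is None and self.safe >= self.threshold * self.total:
            self.reached = now

    def can_leave(self, now):
        # Like canLeave(): decide from `reached` and the extension period.
        return self.reached is not None and now - self.reached >= self.extension


class LeaseManager:
    """Tracks which client (lease holder) may write each open file."""

    def __init__(self, limit):
        self.limit = limit          # hypothetical expiry period in seconds
        self.last_update = {}       # holder -> time of last renewal
        self.owner = {}             # path -> holder

    def add_lease(self, holder, path, now):
        if self.owner.get(path, holder) != holder:
            raise PermissionError(f"{path} is already leased by {self.owner[path]}")
        self.owner[path] = holder
        self.last_update[holder] = now

    def renew_lease(self, holder, now):
        self.last_update[holder] = now

    def check_lease(self, holder, path):
        if self.owner.get(path) != holder:
            raise PermissionError(f"{holder} does not hold the lease on {path}")

    def check_leases(self, now):
        # One pass of the periodic monitor: release holders that stopped renewing.
        expired = [h for h, t in self.last_update.items() if now - t > self.limit]
        for holder in expired:
            del self.last_update[holder]
            for path in [p for p, h in self.owner.items() if h == holder]:
                del self.owner[path]
        return expired


def append(path, holder, safemode_on, leases):
    # An update checks safe mode first, then the caller's lease.
    if safemode_on:
        raise SafeModeException(f"Cannot append to {path}: NameNode is in safe mode")
    leases.check_lease(holder, path)
    return f"{holder} may append to {path}"
```

Checking safe mode before the lease mirrors the order described above: no update is allowed while the NameNode is read-only, regardless of who holds the lease.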
Reference: Cai Bin et al., "Hadoop Technology Internals: HDFS Architecture Design and Implementation Principles".
Copyright notice: This is an original article by the blogger and may not be reproduced without permission.