An HDFS cluster operates in master-slave mode, with two main types of nodes: a single NameNode (the master) and multiple DataNodes (the slaves). The NameNode manages the namespace of the file system, maintaining the file system tree and the metadata for all the files and directories in it.
HDFS Framework Composition:
[Figure: HDFS framework composition - http://s3.51cto.com/wyfs02/M02/6E/94/wKiom1V_-OPSgEV_AATTZm5yVSc993.jpg]
NameNode:
The NameNode manages the namespace of the file system. It maintains the file system tree and the metadata for all the files and directories in the tree. This information is managed through two files: the namespace image file (namespace image) and the edit log file (edit log). The namespace is held in RAM, and these two files are also persisted on the local disk. The NameNode also records, for every file, which DataNodes hold each of its blocks, but it does not persist this block location information, because it is rebuilt from the DataNodes' reports when the system restarts.
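To make this concrete, here is a minimal sketch of a metadata query through the Hadoop FileSystem API; a call like this is answered from the NameNode's in-memory namespace without touching any DataNode. It assumes a running cluster whose address is configured in core-site.xml, and the path /user/demo/sample.txt is a hypothetical example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeMetadataDemo {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path used only for illustration.
        Path file = new Path("/user/demo/sample.txt");

        // getFileStatus is a pure metadata operation: the answer comes from
        // the NameNode's namespace; no block data is read from DataNodes.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("length      = " + status.getLen());
        System.out.println("replication = " + status.getReplication());
        System.out.println("block size  = " + status.getBlockSize());
        System.out.println("modified at = " + status.getModificationTime());

        fs.close();
    }
}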
Abstract Diagram of the NameNode Structure:
[Figure: abstract diagram of the NameNode structure - http://s3.51cto.com/wyfs02/M00/6E/94/wKiom1V_-PnxnGs8AAPoPz9H2zo240.jpg]
The client interacts with the NameNode and the DataNodes on behalf of the user to access the entire file system. It exposes a set of file system interfaces, so when programming we can do what we need with little knowledge of the NameNode and the DataNodes, as the sketch below illustrates.
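A minimal client sketch along these lines, assuming an HDFS cluster reachable at the hypothetical address hdfs://namenode-host:8020 and using an illustrative path /tmp/hello.txt: the program writes and then reads a file purely through the FileSystem interface, while the interactions with the NameNode (block allocation, block locations) and the DataNodes (data transfer) happen behind it.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed cluster address; normally taken from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/hello.txt");   // hypothetical path

        // Write: behind this call the client asks the NameNode to allocate
        // blocks and then streams the bytes to the chosen DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read: the client obtains block locations from the NameNode and
        // pulls the data from the DataNodes, again behind this interface.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}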
DataNode:
DataNodes are the worker nodes of the file system. They store and retrieve data blocks as directed by the client or the NameNode, and they periodically send the NameNode a report listing the blocks they are storing.
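The block-to-DataNode mapping built from these reports is what the NameNode consults when a client asks where a file's blocks live. A hedged sketch of such a query, assuming a pre-existing file at the hypothetical path /user/demo/big.dat:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical file; replace with one that exists on your cluster.
        Path file = new Path("/user/demo/big.dat");
        FileStatus status = fs.getFileStatus(file);

        // The NameNode answers from the block map it builds out of the
        // DataNodes' periodic block reports.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts " + String.join(",", block.getHosts()));
        }

        fs.close();
    }
}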
NameNode Fault Tolerance Mechanisms:
HDFS cannot work without the NameNode. In fact, if the machine running the NameNode were destroyed, the files in the system would be completely lost, because there would be no other way to reconstruct the files from the blocks scattered across the DataNodes. The fault tolerance of the NameNode is therefore very important, and Hadoop provides two mechanisms for it.
The first way is to back up the files that make up the persistent state of the file system metadata. Hadoop can be configured so that the NameNode writes its persistent state to multiple file systems. These writes are synchronous and atomic. The most common configuration is to write the persistent state to the local disk and, at the same time, to a remotely mounted network file system.
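A sketch of that configuration, with assumed local and NFS paths. The relevant property is dfs.namenode.name.dir (Hadoop 2.x naming), which accepts a comma-separated list of directories that all receive the fsimage and edit log writes; it is normally set in hdfs-site.xml on the NameNode, and is shown programmatically here only to keep the example self-contained.

import org.apache.hadoop.conf.Configuration;

public class NameDirConfigDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Equivalent to an hdfs-site.xml entry on the NameNode:
        //   <property>
        //     <name>dfs.namenode.name.dir</name>
        //     <value>file:///data/hdfs/name,file:///mnt/nfs/hdfs/name</value>
        //   </property>
        // One local directory and one on a remotely mounted NFS share
        // (both paths are assumptions for illustration); the NameNode
        // writes its persistent state to each of them.
        conf.set("dfs.namenode.name.dir",
                 "file:///data/hdfs/name,file:///mnt/nfs/hdfs/name");

        System.out.println("dfs.namenode.name.dir = "
                + conf.get("dfs.namenode.name.dir"));
    }
}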
The second way is to run a secondary NameNode (secondary NameNode). The secondary NameNode cannot take over as the NameNode in real time; its primary role is to periodically merge the namespace image with the edit log, so that the edit log does not grow too large. The secondary NameNode typically runs on a separate physical machine and keeps a backup of the merged namespace image, which can be used if the NameNode goes down. However, the secondary NameNode always lags behind the NameNode, so if it alone were promoted after a failure, some data loss would be unavoidable. In that situation, the usual practice is to combine this with the first mechanism: copy the NameNode metadata files from the remotely mounted network file system (NFS) to the secondary NameNode, and then run the secondary NameNode as the new NameNode.
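The checkpointing cadence of the secondary NameNode is governed by a few configuration properties (Hadoop 2.x names shown; the values and the checkpoint directory below are illustrative assumptions, and in practice they would be set in hdfs-site.xml rather than in code).

import org.apache.hadoop.conf.Configuration;

public class CheckpointConfigDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // How often (in seconds) the secondary NameNode merges the edit log
        // into a new namespace image; 3600 is the usual default.
        conf.setLong("dfs.namenode.checkpoint.period", 3600);

        // Also trigger a checkpoint once this many uncheckpointed
        // transactions have accumulated in the edit log.
        conf.setLong("dfs.namenode.checkpoint.txns", 1000000);

        // Where the secondary NameNode keeps its copy of the merged image
        // (path assumed for illustration).
        conf.set("dfs.namenode.checkpoint.dir",
                 "file:///data/hdfs/namesecondary");

        System.out.println("checkpoint period = "
                + conf.getLong("dfs.namenode.checkpoint.period", -1) + "s");
    }
}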
This article is from the "David" blog; please keep this source: http://davidbj.blog.51cto.com/4159484/1662449