The NameNode calls FSNamesystem, which reads dfs.namenode.name.dir and dfs.namenode.edits.dir and builds the FSDirectory. In the FSImage class, recoverTransitionRead handles metadata checking, loading and in-memory merging, and saveNamespace handles persistent storage of the metadata.

saveNamespace writes the metadata to disk: it first renames the current directory to lastcheckpoint.tmp, then creates a new current directory and saves the files into it, and finally renames lastcheckpoint.tmp to previous.checkpoint.

Checkpoint: the Secondary NameNode notifies the NameNode to start a new edit log file, edits.new, after which all logged operations are written to edits.new. The Secondary NameNode then downloads the fsimage and edits files from the NameNode, merges them to generate a new fsimage.ckpt, and uploads fsimage.ckpt to the NameNode. Finally, the NameNode renames fsimage.ckpt to fsimage and edits.new to edits.
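The directory rotation performed when the namespace is saved can be illustrated with a minimal sketch. This is not the FSImage implementation: the class and method names are hypothetical, the image content is a plain string, and only the three directory names (current, lastcheckpoint.tmp, previous.checkpoint) come from the description above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the rename sequence; not Hadoop code.
class SaveNamespaceSketch {

    /** Rotates storageDir/current and writes a new (toy) image file. */
    static void saveNamespace(Path storageDir, String image) throws IOException {
        Path current = storageDir.resolve("current");
        Path tmp = storageDir.resolve("lastcheckpoint.tmp");
        Path previous = storageDir.resolve("previous.checkpoint");

        // Step 1: move the existing state aside. This throws if
        // lastcheckpoint.tmp already exists, i.e. an earlier save was interrupted.
        boolean hadCurrent = Files.exists(current);
        if (hadCurrent) {
            Files.move(current, tmp);
        }

        // Step 2: create a fresh current directory and save the image into it.
        Files.createDirectories(current);
        Files.write(current.resolve("fsimage"), image.getBytes(StandardCharsets.UTF_8));

        // Step 3: the old state becomes previous.checkpoint.
        if (hadCurrent) {
            deleteRecursively(previous);
            Files.move(tmp, previous);
        }
    }

    /** Deletes a directory tree, children before parents; does nothing if absent. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("name-dir");
        saveNamespace(dir, "image v1");
        saveNamespace(dir, "image v2");
        System.out.println("saved under " + dir);
    }
}
```

Because the old state is only renamed, never overwritten in place, a crash at any step leaves a directory whose name shows how far the save got.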
2, Metadata update and log write scenario analysis
Take mkdir as an example. Analysis of the logSync code:

  public void logSync() throws IOException {
    ArrayList<EditLogOutputStream> errorStreams = null;
    long syncStart = 0;

    // Fetch the transactionId of this thread.
    long mytxid = myTransactionId.get().txid;
    EditLogOutputStream streams[] = null;
    boolean sync = false;
    try {
      synchronized (this) {
        assert editStreams.size() > 0 : "no editlog streams";
        printStatistics(false);

        // If somebody is already syncing, then wait.
        while (mytxid > synctxid && isSyncRunning) {
          try {
            wait(1000);
          } catch (InterruptedException ie) {
          }
        }

        // If this transaction was already flushed, then there is nothing to do.
        if (mytxid <= synctxid) {
          numTransactionsBatchedInSync++;
          if (metrics != null) // metrics is non-null only when used inside the name node
            metrics.transactionsBatchedInSync.inc();
          return;
        }

        // Now, this thread will do the sync.
        syncStart = txid;
        isSyncRunning = true;
        sync = true;

        // Swap buffers.
        for (EditLogOutputStream eStream : editStreams) {
          eStream.setReadyToFlush();
        }
        streams = editStreams.toArray(new EditLogOutputStream[editStreams.size()]);
      }

      // Do the sync outside the lock.
      long start = FSNamesystem.now();
      for (int idx = 0; idx < streams.length; idx++) {
        EditLogOutputStream eStream = streams[idx];
        try {
          eStream.flush();
        } catch (IOException ie) {
          FSNamesystem.LOG.error("Unable to sync edit log.", ie);
          // Remember the streams that encountered an error.
          if (errorStreams == null) {
            errorStreams = new ArrayList<EditLogOutputStream>(1);
          }
          errorStreams.add(eStream);
        }
      }
      long elapsed = FSNamesystem.now() - start;
      processIOError(errorStreams, true);

      if (metrics != null) // metrics is non-null only when used inside the name node
        metrics.syncs.inc(elapsed);
    } finally {
      synchronized (this) {
        if (sync) {
          // Only the thread that performed the sync advances synctxid;
          // a thread that returned early must not reset it to 0.
          synctxid = syncStart;
          isSyncRunning = false;
        }
        this.notifyAll();
      }
    }
  }
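The batching idea behind logSync can be isolated in a small model: many threads append edits, but only one thread at a time flushes, and a thread whose transaction was covered by someone else's flush returns without doing I/O. The following is a simplified sketch under stated assumptions, not the FSEditLog code. The class name, the two in-memory lists standing in for double buffers, and the `durable` list standing in for the disk are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of batched edit-log syncing; not Hadoop code.
class BatchedSyncSketch {
    private long txid = 0;                 // last transaction id handed out
    private long synctxid = 0;             // highest transaction id known to be flushed
    private boolean isSyncRunning = false;
    private List<String> bufCurrent = new ArrayList<>(); // receives new edits
    private List<String> bufReady = new ArrayList<>();   // owned by the flushing thread
    final List<String> durable = new ArrayList<>();      // stands in for the disk
    int flushCount = 0;

    /** Buffers one edit and returns its transaction id. */
    synchronized long logEdit(String op) {
        bufCurrent.add(op);
        return ++txid;
    }

    /** Returns once the edit with id mytxid has been flushed. */
    void logSync(long mytxid) throws InterruptedException {
        long syncStart;
        List<String> toFlush;
        synchronized (this) {
            // Wait while another thread is flushing and has not covered us yet.
            while (mytxid > synctxid && isSyncRunning) {
                wait(1000);
            }
            if (mytxid <= synctxid) {
                return; // batched into an earlier flush
            }
            syncStart = txid;
            isSyncRunning = true;
            // Swap buffers so writers can continue while we flush.
            toFlush = bufCurrent;
            bufCurrent = bufReady;
            bufReady = toFlush;
        }
        boolean ok = false;
        try {
            // The slow I/O happens outside the lock.
            durable.addAll(toFlush);
            toFlush.clear();
            flushCount++;
            ok = true;
        } finally {
            synchronized (this) {
                if (ok) {
                    synctxid = syncStart;
                }
                isSyncRunning = false;
                notifyAll();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BatchedSyncSketch log = new BatchedSyncSketch();
        long t1 = log.logEdit("mkdir /a");
        long t2 = log.logEdit("mkdir /b");
        log.logSync(t2); // flushes both edits in one batch
        log.logSync(t1); // already covered, returns immediately
        System.out.println(log.durable + " in " + log.flushCount + " flush(es)");
    }
}
```

Holding the lock only for the buffer swap, not for the flush itself, is what lets later mkdir calls keep appending edits while an earlier batch is being written.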
3, Backup Node checkpoint process analysis

  /**
   * Create a new checkpoint
   */
  void doCheckpoint() throws IOException {
    long startTime = FSNamesystem.now();
    NamenodeCommand cmd =
      getNamenode().startCheckpoint(backupNode.getRegistration());
    CheckpointCommand cpCmd = null;
    switch (cmd.getAction()) {
      case NamenodeProtocol.ACT_SHUTDOWN:
        shutdown();
        throw new IOException("Name-node " + backupNode.nnRpcAddress
                              + " requested shutdown.");
      case NamenodeProtocol.ACT_CHECKPOINT:
        cpCmd = (CheckpointCommand) cmd;
        break;
      default:
        throw new IOException("Unsupported NamenodeCommand: " + cmd.getAction());
    }

    CheckpointSignature sig = cpCmd.getSignature();
    assert FSConstants.LAYOUT_VERSION == sig.getLayoutVersion() :
      "Signature should have current layout version. Expected: "
      + FSConstants.LAYOUT_VERSION + " actual " + sig.getLayoutVersion();
    assert !backupNode.isRole(NamenodeRole.CHECKPOINT) ||
      cpCmd.isImageObsolete() : "checkpoint node should always download image.";
    backupNode.setCheckpointState(CheckpointStates.UPLOAD_START);
    if (cpCmd.isImageObsolete()) {
      // First reset storage on disk and memory state.
      backupNode.resetNamespace();
      downloadCheckpoint(sig);
    }

    BackupStorage bnImage = getFSImage();
    bnImage.loadCheckpoint(sig);
    sig.validateStorageInfo(bnImage);
    bnImage.saveCheckpoint();

    if (cpCmd.needToReturnImage())
      uploadCheckpoint(sig);

    getNamenode().endCheckpoint(backupNode.getRegistration(), sig);

    bnImage.convergeJournalSpool();
    backupNode.setRegistration(); // keep registration up to date
    if (backupNode.isRole(NamenodeRole.CHECKPOINT))
      getFSImage().getEditLog().close();
    LOG.info("Checkpoint completed in "
        + (FSNamesystem.now() - startTime) / 1000 + " seconds."
        + " New Image Size: " + bnImage.getFsImageName().length());
  }
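At its core, a checkpoint loads an image, replays the logged edits over it, and saves the result as the new image. The sketch below illustrates only that merge step with a toy model: the namespace is a sorted set of paths and each edit is a string such as "MKDIR /b". These formats and the class name are assumptions made for illustration and bear no relation to the real fsimage or edits file layouts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy model of merging an image with an edit log; not Hadoop code.
class CheckpointMergeSketch {

    /** Replays the edits over a copy of the image and returns the merged image. */
    static SortedSet<String> merge(SortedSet<String> image, List<String> edits) {
        SortedSet<String> merged = new TreeSet<>(image);
        for (String edit : edits) {
            String[] parts = edit.split(" ", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed edit: " + edit);
            }
            switch (parts[0]) {
                case "MKDIR":
                    merged.add(parts[1]);
                    break;
                case "DELETE":
                    merged.remove(parts[1]);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown operation: " + parts[0]);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        SortedSet<String> image = new TreeSet<>(Arrays.asList("/", "/a"));
        List<String> edits = Arrays.asList("MKDIR /b", "DELETE /a");
        System.out.println(merge(image, edits));
    }
}
```

The input image is left untouched, which mirrors the way the old fsimage stays valid until the merged one has been completely written and renamed into place.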
4, Metadata reliability mechanism

Configure multiple backup paths. The NameNode writes the metadata to multiple directories whenever the log is updated or a checkpoint is performed.

For each metadata file that needs to be saved, an output stream is created. An output stream that throws an exception during access is handled and removed. Later, the NameNode checks whether the removed storage has returned to normal. This effectively handles failures of the backup output streams.

Several mechanisms are used to ensure the reliability of the metadata. For example, the checkpoint process passes through several stages, and each stage identifies the current state by a different file name. This makes recovery possible after a storage failure.

5, Metadata consistency mechanism

When the NameNode starts, it checks that each backup directory is formatted, that the metadata file names in the directory are correct, and so on, to ensure state consistency between the metadata files. It then selects the most recent copy and loads it into memory, which keeps the current HDFS state consistent with the state at the last shutdown.

In addition, handling the abnormal output streams ensures that the data in the normal output streams stays consistent.

A synchronization mechanism is used to keep the output streams consistent.
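The output-stream handling described in section 4 can be sketched as a writer that sends every record to all configured streams and drops any stream that fails. This is a minimal illustration with hypothetical names, not the NameNode's implementation. The later check of whether removed storage has recovered is left out, because a restored stream would first need to be brought up to date.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of writing metadata to several streams; not Hadoop code.
class RedundantEditWriter {

    /** Test double that simulates a storage directory that has gone bad. */
    static class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated disk failure");
        }
    }

    private final List<OutputStream> active = new ArrayList<>();
    private final List<OutputStream> removed = new ArrayList<>();

    RedundantEditWriter(List<? extends OutputStream> streams) {
        active.addAll(streams);
    }

    /** Writes the record to every active stream; failing streams are removed. */
    void write(byte[] record) throws IOException {
        Iterator<OutputStream> it = active.iterator();
        while (it.hasNext()) {
            OutputStream stream = it.next();
            try {
                stream.write(record);
                stream.flush();
            } catch (IOException e) {
                // Stop using this copy so the healthy copies stay consistent.
                it.remove();
                removed.add(stream);
            }
        }
        if (active.isEmpty()) {
            throw new IOException("No usable metadata streams left");
        }
    }

    int activeCount() {
        return active.size();
    }

    int removedCount() {
        return removed.size();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream good = new ByteArrayOutputStream();
        RedundantEditWriter writer =
            new RedundantEditWriter(Arrays.asList(good, new FailingStream()));
        writer.write("mkdir /a\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(writer.activeCount() + " active, "
            + writer.removedCount() + " removed");
    }
}
```

Removing a failed stream immediately, instead of retrying it on every write, means every stream still in the active list has received exactly the same sequence of records.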