Hadoop Analysis II: Metadata Backup Mechanism

Last Update:2018-12-04 Source: Internet

Author: User


1. Scenario analysis: metadata loading at NameNode startup

On startup, the NameNode calls FSNamesystem, which reads dfs.namenode.name.dir and dfs.namenode.edits.dir to build the FSDirectory.
In FSImage, recoverTransitionRead and saveNamespace implement the checking and loading of metadata, the in-memory merge of image and edits, and the persistent storage of the result.
saveNamespace writes the metadata to disk in three steps: first, rename the current directory to lastcheckpoint.tmp; then create a new current directory and save the image file into it; finally, rename lastcheckpoint.tmp to previous.checkpoint.
Checkpoint process: the secondary NameNode notifies the NameNode to start a new edit log file, edits.new, and all subsequent log operations are written to edits.new. Next, the secondary NameNode downloads the fsimage and edits files from the NameNode and merges them into a new image, fsimage.ckpt. The secondary NameNode then uploads fsimage.ckpt to the NameNode. Finally, the NameNode renames fsimage.ckpt to fsimage and edits.new to edits.
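The two rename sequences described above can be sketched with plain file operations. This is a minimal illustration, not Hadoop's code: the class NameDirRotation and its methods are hypothetical, only the directory and file names (current, lastcheckpoint.tmp, previous.checkpoint, fsimage.ckpt, edits.new) come from the text, and the image contents are placeholder bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

/** Illustrative only: replays the rename sequences on a scratch directory. */
final class NameDirRotation {

    /** saveNamespace-style rotation of one storage directory (hypothetical helper). */
    static void saveNamespace(Path storageDir, byte[] newImage) throws IOException {
        Path current = storageDir.resolve("current");
        Path tmp = storageDir.resolve("lastcheckpoint.tmp");
        Path previous = storageDir.resolve("previous.checkpoint");

        // Step 1: move the existing image directory out of the way.
        Files.move(current, tmp);
        // Step 2: write the new image into a fresh current directory.
        Files.createDirectory(current);
        Files.write(current.resolve("fsimage"), newImage);
        // Step 3: the new image is on disk, so the old one becomes the previous checkpoint.
        deleteRecursively(previous);
        Files.move(tmp, previous);
    }

    /** End of a checkpoint: promote the uploaded image and the new edit log. */
    static void rollCheckpoint(Path current) throws IOException {
        Files.move(current.resolve("fsimage.ckpt"), current.resolve("fsimage"),
                StandardCopyOption.REPLACE_EXISTING);
        Files.move(current.resolve("edits.new"), current.resolve("edits"),
                StandardCopyOption.REPLACE_EXISTING);
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Runs both sequences in a temporary directory and returns its path. */
    static Path demo() {
        try {
            Path dir = Files.createTempDirectory("name-dir");
            Path current = Files.createDirectory(dir.resolve("current"));
            Files.write(current.resolve("fsimage"), "v1".getBytes(StandardCharsets.UTF_8));
            Files.write(current.resolve("edits"), new byte[0]);

            saveNamespace(dir, "v2".getBytes(StandardCharsets.UTF_8));

            // Pretend the secondary NameNode uploaded a merged image.
            Files.write(current.resolve("fsimage.ckpt"), "v3".getBytes(StandardCharsets.UTF_8));
            Files.write(current.resolve("edits.new"), new byte[0]);
            rollCheckpoint(current);
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Rotated storage directory: " + demo());
    }
}
```

Because the old image is only moved, never overwritten, a crash at any step leaves at least one complete image on disk, and the name of the directory that holds it indicates how far the save got.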

2. Scenario analysis: metadata update and log writing, taking mkdir as an example

Code analysis of logSync:

    public void logSync() throws IOException {
      ArrayList<EditLogOutputStream> errorStreams = null;
      long syncStart = 0;

      // Fetch the transactionId of this thread.
      long mytxid = myTransactionId.get().txid;
      EditLogOutputStream streams[] = null;
      boolean sync = false;
      try {
        synchronized (this) {
          assert editStreams.size() > 0 : "no editlog streams";
          printStatistics(false);

          // if somebody is already syncing, then wait
          while (mytxid > synctxid && isSyncRunning) {
            try {
              wait(1000);
            } catch (InterruptedException ie) {
            }
          }

          //
          // If this transaction was already flushed, then nothing to do
          //
          if (mytxid <= synctxid) {
            numTransactionsBatchedInSync++;
            if (metrics != null) // Metrics is non-null only when used inside name node
              metrics.transactionsBatchedInSync.inc();
            return;
          }

          // now, this thread will do the sync
          syncStart = txid;
          isSyncRunning = true;
          sync = true;

          // swap buffers
          for (EditLogOutputStream eStream : editStreams) {
            eStream.setReadyToFlush();
          }
          streams =
            editStreams.toArray(new EditLogOutputStream[editStreams.size()]);
        }

        // do the sync
        long start = FSNamesystem.now();
        for (int idx = 0; idx < streams.length; idx++) {
          EditLogOutputStream eStream = streams[idx];
          try {
            eStream.flush();
          } catch (IOException ie) {
            FSNamesystem.LOG.error("Unable to sync edit log.", ie);
            //
            // remember the streams that encountered an error.
            //
            if (errorStreams == null) {
              errorStreams = new ArrayList<EditLogOutputStream>(1);
            }
            errorStreams.add(eStream);
          }
        }
        long elapsed = FSNamesystem.now() - start;
        processIOError(errorStreams, true);

        if (metrics != null) // Metrics non-null only when used inside name node
          metrics.syncs.inc(elapsed);
      } finally {
        synchronized (this) {
          synctxid = syncStart;
          if (sync) {
            isSyncRunning = false;
          }
          this.notifyAll();
        }
      }
    }
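The batching idea in logSync can be isolated in a much smaller, self-contained class. The sketch below is an assumption-laden simplification, not the Hadoop implementation: GroupCommitLog, logEdit, durableEdits, and flushCount are hypothetical names, an in-memory list stands in for the disk, the transaction id is passed explicitly instead of being read from a thread-local, and error handling is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for the edit log: shows only the batching logic of logSync. */
final class GroupCommitLog {
    private List<String> current = new ArrayList<>();  // buffer receiving new edits
    private List<String> ready = new ArrayList<>();    // buffer being flushed
    private final List<String> durable =
            Collections.synchronizedList(new ArrayList<>()); // stands in for the disk
    private long txid = 0;       // id of the last edit logged
    private long synctxid = 0;   // id of the last edit known to be flushed
    private boolean isSyncRunning = false;
    private int flushCount = 0;

    /** Buffers one edit and returns its transaction id. */
    synchronized long logEdit(String op) {
        current.add(op);
        return ++txid;
    }

    /** Returns once the edit with id mytxid has been flushed. */
    void logSync(long mytxid) {
        long syncStart;
        List<String> toFlush;
        synchronized (this) {
            // If another thread is flushing, wait for it to finish.
            while (mytxid > synctxid && isSyncRunning) {
                try {
                    wait(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for sync", e);
                }
            }
            // That flush may already have covered this edit.
            if (mytxid <= synctxid) {
                return;
            }
            // Otherwise this thread flushes everything logged so far.
            syncStart = txid;
            isSyncRunning = true;
            toFlush = current;   // swap buffers so writers are not blocked by I/O
            current = ready;
            ready = toFlush;
        }

        durable.addAll(toFlush); // the slow part, done outside the lock

        synchronized (this) {
            toFlush.clear();
            synctxid = syncStart;
            isSyncRunning = false;
            flushCount++;
            notifyAll();
        }
    }

    List<String> durableEdits() {
        return new ArrayList<>(durable);
    }

    synchronized int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        GroupCommitLog log = new GroupCommitLog();
        long first = log.logEdit("mkdir /a");
        long second = log.logEdit("mkdir /b");
        log.logSync(second); // flushes both edits in one batch
        log.logSync(first);  // already covered, returns immediately
        System.out.println(log.durableEdits() + " in " + log.flushCount() + " flush(es)");
    }
}
```

The design point is that the lock is held only while ids are compared and buffers are swapped. The flush itself runs unlocked, so other threads can keep appending edits, and any thread whose edit was included in a completed flush returns without doing I/O of its own.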

3. Checkpoint process analysis for the backup node:

    /**
     * Create a new checkpoint
     */
    void doCheckpoint() throws IOException {
      long startTime = FSNamesystem.now();
      NamenodeCommand cmd =
        getNamenode().startCheckpoint(backupNode.getRegistration());
      CheckpointCommand cpCmd = null;
      switch (cmd.getAction()) {
        case NamenodeProtocol.ACT_SHUTDOWN:
          shutdown();
          throw new IOException("Name-node " + backupNode.nnRpcAddress
                                + " requested shutdown.");
        case NamenodeProtocol.ACT_CHECKPOINT:
          cpCmd = (CheckpointCommand) cmd;
          break;
        default:
          throw new IOException("Unsupported NamenodeCommand: " + cmd.getAction());
      }

      CheckpointSignature sig = cpCmd.getSignature();
      assert FSConstants.LAYOUT_VERSION == sig.getLayoutVersion() :
        "Signature should have current layout version. Expected: "
        + FSConstants.LAYOUT_VERSION + " actual " + sig.getLayoutVersion();
      assert !backupNode.isRole(NamenodeRole.CHECKPOINT) ||
        cpCmd.isImageObsolete() : "checkpoint node should always download image.";
      backupNode.setCheckpointState(CheckpointStates.UPLOAD_START);
      if (cpCmd.isImageObsolete()) {
        // First reset storage on disk and memory state
        backupNode.resetNamespace();
        downloadCheckpoint(sig);
      }

      BackupStorage bnImage = getFSImage();
      bnImage.loadCheckpoint(sig);
      sig.validateStorageInfo(bnImage);
      bnImage.saveCheckpoint();

      if (cpCmd.needToReturnImage())
        uploadCheckpoint(sig);

      getNamenode().endCheckpoint(backupNode.getRegistration(), sig);

      bnImage.convergeJournalSpool();
      backupNode.setRegistration(); // keep registration up to date
      if (backupNode.isRole(NamenodeRole.CHECKPOINT))
        getFSImage().getEditLog().close();
      LOG.info("Checkpoint completed in "
          + (FSNamesystem.now() - startTime) / 1000 + " seconds."
          + " New Image Size: " + bnImage.getFsImageName().length());
    }

4. Metadata reliability mechanism.

Multiple backup paths can be configured. Whenever the NameNode writes to the edit log or performs a checkpoint, it stores the metadata in every configured directory.
An output stream is created for each metadata file that has to be saved. A stream that raises an error during access is recorded and removed, and at a suitable later time the NameNode checks again whether the removed storage volume has recovered. This keeps a failure on one backup output stream from affecting the others.
Several mechanisms work together to keep the metadata reliable. For example, the checkpoint process passes through several stages, and a different file name identifies each stage. This makes it possible to recover from a storage failure that occurs partway through.
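The first two points, writing to every configured directory and setting failed ones aside until they recover, can be sketched as follows. This is a minimal illustration under stated assumptions: RedundantMetadataWriter and its methods are hypothetical names rather than Hadoop classes, whole files are rewritten instead of streams being appended to, and a restored directory is simply re-admitted without being brought up to date, which real code would have to do first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: mirrors one metadata file into several directories. */
final class RedundantMetadataWriter {
    private final List<Path> active = new ArrayList<>();   // directories still in use
    private final List<Path> removed = new ArrayList<>();  // directories that failed

    RedundantMetadataWriter(List<Path> dirs) {
        active.addAll(dirs);
    }

    /** Writes the same bytes to every active directory; failing ones are set aside. */
    void write(String fileName, byte[] data) {
        Iterator<Path> it = active.iterator();
        while (it.hasNext()) {
            Path dir = it.next();
            try {
                Files.write(dir.resolve(fileName), data);
            } catch (IOException e) {
                // Remember the failure, but keep serving from the healthy copies.
                it.remove();
                removed.add(dir);
            }
        }
        if (active.isEmpty()) {
            throw new IllegalStateException("no usable metadata directory left");
        }
    }

    /** Re-checks removed directories and re-admits those that are writable again. */
    void restoreRecovered() {
        Iterator<Path> it = removed.iterator();
        while (it.hasNext()) {
            Path dir = it.next();
            if (Files.isDirectory(dir) && Files.isWritable(dir)) {
                it.remove();
                active.add(dir);
            }
        }
    }

    int activeCount() {
        return active.size();
    }

    int removedCount() {
        return removed.size();
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempDirectory("name1");
        Path bad = good.resolve("not-created-yet"); // writes here fail until it exists
        RedundantMetadataWriter writer =
                new RedundantMetadataWriter(Arrays.asList(good, bad));

        writer.write("edits", new byte[] {1});
        System.out.println("active after failure: " + writer.activeCount());

        Files.createDirectory(bad);                 // the volume "recovers"
        writer.restoreRecovered();
        writer.write("edits", new byte[] {1, 2});
        System.out.println("active after restore: " + writer.activeCount());
    }
}
```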

5. Metadata consistency mechanism.

First, when the NameNode starts, it checks whether each backup directory has been formatted and whether the metadata file names in each directory are correct, which ensures that the metadata files are in a consistent state. It then selects the most recent copy and loads it into memory, so that the state of HDFS after startup matches its state at the last shutdown.
Second, the handling of abnormal output streams ensures that the data in the healthy output streams stays consistent.
Third, a synchronization mechanism ensures the consistency of the output streams.
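The startup selection in the first point can be illustrated with a small sketch. Everything here is an assumption made for illustration: DirState is a hypothetical summary of one storage directory, and the checkpointTime field stands in for whatever record the NameNode keeps of when a directory was last checkpointed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Illustrative only: picks the newest usable metadata copy at startup. */
final class LatestImageSelector {

    /** Hypothetical summary of one storage directory, not a Hadoop type. */
    static final class DirState {
        final String path;
        final boolean formatted;     // the directory has been formatted
        final boolean namesValid;    // the expected metadata file names are present
        final long checkpointTime;   // time of the last checkpoint stored here

        DirState(String path, boolean formatted, boolean namesValid, long checkpointTime) {
            this.path = path;
            this.formatted = formatted;
            this.namesValid = namesValid;
            this.checkpointTime = checkpointTime;
        }
    }

    /** Skips directories that fail the checks, then returns the most recent one. */
    static Optional<DirState> selectLatest(List<DirState> dirs) {
        DirState best = null;
        for (DirState d : dirs) {
            if (!d.formatted || !d.namesValid) {
                continue; // an inconsistent copy must never be loaded
            }
            if (best == null || d.checkpointTime > best.checkpointTime) {
                best = d;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<DirState> dirs = Arrays.asList(
                new DirState("/data1/name", true, true, 100L),
                new DirState("/data2/name", true, true, 250L),
                new DirState("/data3/name", false, true, 900L)); // unformatted, ignored
        System.out.println(selectLatest(dirs).map(d -> d.path).orElse("none"));
    }
}
```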

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
