EditLog Source Code Analysis: Getting the Edit Log Input Stream

Source: Internet
Author: User

In the article "HDFs Source code Analysis Editlogtailer", we have detailed understanding of the implementation of the editing log tracker Editlogtailer, introduced its internal editing log tracking thread Editlogtailerthread implementation, The Dotailedits () method that executes the log trace is the most important method that the thread is relying on to complete the edit log trace. In the process of this method, we first need to get the edit log input stream collection streams from the edit log editlog, and get the data of the input stream after the latest transaction ID plus 1. So how does this edit log input stream collection streams get? In this article we will conduct a detailed study.

In the doTailEdits() method, the code that obtains the edit log input streams is as follows:

  // From the edit log editLog, get the collection streams of edit log input
  // streams, containing the data starting at the latest transaction ID plus 1
  streams = editLog.selectInputStreams(lastTxnId + 1, 0, null, false);
This editLog is a file system edit log FSEditLog instance. Let's look at its selectInputStreams() method; the code is as follows:

  /**
   * Select a list of input streams.
   *
   * @param fromTxId first transaction in the selected streams
   * @param toAtLeastTxId the selected streams must contain this transaction
   * @param recovery recovery context
   * @param inProgressOk set to true if in-progress streams are OK
   */
  public Collection<EditLogInputStream> selectInputStreams(
      long fromTxId, long toAtLeastTxId, MetaRecoveryContext recovery,
      boolean inProgressOk) throws IOException {
    // Create the list streams of EditLogInputStream instances
    List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
    // Synchronize on the object journalSetLock and check the JournalSet state
    synchronized(journalSetLock) {
      Preconditions.checkState(journalSet.isOpen(), "Cannot " +
          "call selectInputStreams() on closed FSEditLog");
      // Call the three-argument selectInputStreams(), passing the empty
      // streams list, the starting transaction ID fromTxId and the
      // in-progress flag inProgressOk
      selectInputStreams(streams, fromTxId, inProgressOk);
    }

    try {
      // Check that the selected streams cover the requested transaction range
      checkForGaps(streams, fromTxId, toAtLeastTxId, inProgressOk);
    } catch (IOException e) {
      if (recovery != null) {
        // If recovery mode is enabled, continue loading even if we know we
        // can't load up to toAtLeastTxId.
        LOG.error(e);
      } else {
        closeAllStreams(streams);
        throw e;
      }
    }
    return streams;
  }
It first creates the EditLogInputStream list streams, synchronizes on the object journalSetLock and checks the JournalSet state, then calls the three-argument selectInputStreams() method, passing in the empty streams list, the starting transaction ID fromTxId, and the in-progress flag inProgressOk set to false. Finally, the checkForGaps() method is called to verify the selected streams.
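The checkForGaps() call is what guarantees that the selected streams actually cover the requested transaction range from fromTxId up to toAtLeastTxId. As a rough illustration of that idea only (my own sketch, not the actual FSEditLog.checkForGaps() implementation), a gap check over (firstTxId, lastTxId) segments could look like this:

import java.io.IOException;
import java.util.List;

// Illustrative sketch: verify that segments [firstTxId, lastTxId], sorted by
// firstTxId, cover fromTxId .. toAtLeastTxId without any hole. The real
// checkForGaps() works on EditLogInputStream objects, but the idea is the same.
class GapCheckSketch {
  static void checkCoverage(List<long[]> segments, long fromTxId,
      long toAtLeastTxId) throws IOException {
    long nextExpected = fromTxId;
    for (long[] seg : segments) {       // seg[0] = firstTxId, seg[1] = lastTxId
      if (seg[0] > nextExpected) {
        throw new IOException("Gap in transactions: expected txid " +
            nextExpected + ", but the next segment starts at " + seg[0]);
      }
      nextExpected = Math.max(nextExpected, seg[1] + 1);
    }
    if (nextExpected <= toAtLeastTxId) {
      throw new IOException("Streams end at txid " + (nextExpected - 1) +
          ", but txid " + toAtLeastTxId + " is required");
    }
  }
}

If such a gap is found and no recovery context was passed in, the streams are closed and the exception is rethrown, exactly as in the catch block above.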

Let's continue with the three-argument selectInputStreams() method; the code is as follows:

  @Override
  public void selectInputStreams(Collection<EditLogInputStream> streams,
      long fromTxId, boolean inProgressOk) throws IOException {
    // Delegate to JournalSet's method of the same name
    journalSet.selectInputStreams(streams, fromTxId, inProgressOk);
  }
It simply delegates to JournalSet's method of the same name. So what is JournalSet? It is the manager of the collection of journals; a journal is a log, and this is how the edit log is organized on the JournalNodes in Hadoop HA. Let's take a look at JournalSet's selectInputStreams() method; the code is as follows:

  /**
   * In this function, we get a bunch of streams from all of our JournalManager
   * objects. Then we add these to the collection one by one.
   *
   * @param streams The collection to add the streams to. It may or
   *        may not be sorted -- this is up to the caller.
   * @param fromTxId The transaction ID to start looking for streams at
   * @param inProgressOk Should we consider unfinalized streams?
   */
  @Override
  public void selectInputStreams(Collection<EditLogInputStream> streams,
      long fromTxId, boolean inProgressOk) throws IOException {
    final PriorityQueue<EditLogInputStream> allStreams =
        new PriorityQueue<EditLogInputStream>(EDIT_LOG_INPUT_STREAM_COMPARATOR);
    // Traverse journals and process each JournalAndStream instance jas
    for (JournalAndStream jas : journals) {
      // If jas is disabled, log it and skip
      if (jas.isDisabled()) {
        LOG.info("Skipping jas " + jas + " since it's disabled");
        continue;
      }
      // Get the JournalManager from jas, call its selectInputStreams()
      // method and collect the resulting streams into allStreams
      try {
        jas.getManager().selectInputStreams(allStreams, fromTxId, inProgressOk);
      } catch (IOException ioe) {
        LOG.warn("Unable to determine input streams from " + jas.getManager() +
            ". Skipping.", ioe);
      }
    }
    // Chain the streams and filter out the redundant edit log input streams
    chainAndMakeRedundantStreams(streams, allStreams, fromTxId);
  }
The general process of this method is as follows:

1. Traverse the journals in JournalSet and get each JournalAndStream instance jas:

1.1. If jas is disabled, log it and skip it;

1.2. Use jas to get the JournalManager instance, call its selectInputStreams() method to get the input streams, and put them into the allStreams collection (see the ordering sketch below);

2. Call the chainAndMakeRedundantStreams() method to chain the streams and filter out the redundant edit log input streams.
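The allStreams collection is a PriorityQueue ordered by EDIT_LOG_INPUT_STREAM_COMPARATOR, so streams reported by different journals come out sorted by their starting transaction ID and chainAndMakeRedundantStreams() can walk them in txid order. Below is a minimal standalone sketch of that ordering idea (my own illustration with a hypothetical StreamDesc type; the real comparator in JournalSet also applies its own tie-breaking between overlapping streams):

import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch: order candidate streams by their starting txid,
// which is the essential property the real comparator provides.
class StreamOrderSketch {
  static class StreamDesc {
    final long firstTxId;
    final boolean inProgress;
    StreamDesc(long firstTxId, boolean inProgress) {
      this.firstTxId = firstTxId;
      this.inProgress = inProgress;
    }
  }

  public static void main(String[] args) {
    PriorityQueue<StreamDesc> allStreams = new PriorityQueue<>(
        Comparator.<StreamDesc>comparingLong(s -> s.firstTxId));
    // Streams reported by different journals, added in arbitrary order
    allStreams.add(new StreamDesc(1833048, false));
    allStreams.add(new StreamDesc(1, false));
    allStreams.add(new StreamDesc(1853186, true));
    while (!allStreams.isEmpty()) {
      StreamDesc s = allStreams.poll();
      System.out.println("firstTxId=" + s.firstTxId + " inProgress=" + s.inProgress);
    }
  }
}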
First, let's take a look at the selectInputStreams() method of the JournalManager instance, using FileJournalManager as an example, with the following code:

  @Override
  synchronized public void selectInputStreams(
      Collection<EditLogInputStream> streams, long fromTxId,
      boolean inProgressOk) throws IOException {
    // First call StorageDirectory's getCurrentDir() method to get the current
    // directory, then call matchEditLogs() to get the list elfs of
    // EditLogFile instances
    List<EditLogFile> elfs = matchEditLogs(sd.getCurrentDir());
    LOG.debug(this + ": selecting input streams starting at " + fromTxId +
        (inProgressOk ? " (inProgress ok) " : " (excluding inProgress) ") +
        "from among " + elfs.size() + " candidate file(s)");
    // Call addStreamsToCollectionFromFiles() to build input streams from the
    // edit log file list elfs and add them to the streams collection
    addStreamsToCollectionFromFiles(elfs, streams, fromTxId, inProgressOk);
  }
The process for this method is as follows:

1. First call StorageDirectory's getCurrentDir() method to obtain the current directory, then call the matchEditLogs() method to get the list elfs of EditLogFile instances;

2. Call the addStreamsToCollectionFromFiles() method to build input streams from the edit log file list elfs and add them to the input stream collection streams.

Let's take a look at the matchEditLogs() method, with the following code:

  /**
   * Returns matching edit logs via the log directory. Simple helper function
   * that lists the files in the logDir and calls matchEditLogs(File[]).
   *
   * @param logDir directory to match edit logs in
   * @return matched edit logs
   * @throws IOException IOException thrown for invalid logDir
   */
  public static List<EditLogFile> matchEditLogs(File logDir) throws IOException {
    return matchEditLogs(FileUtil.listFiles(logDir));
  }

  static List<EditLogFile> matchEditLogs(File[] filesInStorage) {
    return matchEditLogs(filesInStorage, false);
  }

  private static List<EditLogFile> matchEditLogs(File[] filesInStorage,
      boolean forPurging) {
    // Create the EditLogFile list ret
    List<EditLogFile> ret = Lists.newArrayList();
    // Traverse filesInStorage and process each file
    for (File f : filesInStorage) {
      // Get the file name
      String name = f.getName();
      // Check for edits: use a regular expression on the file name to detect
      // whether it is a finalized edits log.
      // The regular expression is edits_(\d+)-(\d+); in the source it appears
      // as edits_(\\d+)-(\\d+), where the first backslash of each pair is just
      // the Java escape character.
      // A matching file name looks like
      // edits_0000000000001833048-0000000000001833081: the first number is the
      // starting transaction ID, the second is the ending transaction ID.
      Matcher editsMatch = EDITS_REGEX.matcher(name);
      if (editsMatch.matches()) {
        // The file name matches, so this is an edit log file we are looking for
        try {
          // The first capture group is the starting transaction ID
          long startTxId = Long.parseLong(editsMatch.group(1));
          // The second capture group is the ending transaction ID
          long endTxId = Long.parseLong(editsMatch.group(2));
          // Construct an EditLogFile from the file f, startTxId and endTxId,
          // and add it to the ret list
          ret.add(new EditLogFile(f, startTxId, endTxId));
          continue;
        } catch (NumberFormatException nfe) {
          LOG.error("Edits file " + f + " has improperly formatted " +
              "transaction ID");
          // skip
        }
      }

      // Check for in-progress edits.
      // The regular expression is edits_inprogress_(\d+); again, the doubled
      // backslash in the source edits_inprogress_(\\d+) is only the Java
      // escape character.
      Matcher inProgressEditsMatch = EDITS_INPROGRESS_REGEX.matcher(name);
      if (inProgressEditsMatch.matches()) {
        try {
          long startTxId = Long.parseLong(inProgressEditsMatch.group(1));
          ret.add(new EditLogFile(f, startTxId,
              HdfsConstants.INVALID_TXID, true));
          continue;
        } catch (NumberFormatException nfe) {
          LOG.error("In-progress edits file " + f + " has improperly " +
              "formatted transaction ID");
          // skip
        }
      }

      if (forPurging) {
        // Check for in-progress stale edits
        Matcher staleInProgressEditsMatch =
            EDITS_INPROGRESS_STALE_REGEX.matcher(name);
        if (staleInProgressEditsMatch.matches()) {
          try {
            long startTxId = Long.valueOf(staleInProgressEditsMatch.group(1));
            ret.add(new EditLogFile(f, startTxId,
                HdfsConstants.INVALID_TXID, true));
            continue;
          } catch (NumberFormatException nfe) {
            LOG.error("In-progress stale edits file " + f + " has improperly " +
                "formatted transaction ID");
            // skip
          }
        }
      }
    }
    return ret;
  }
Let's look at the core two-parameter matchEditLogs() method; its processing flow is as follows:

1. Create the EditLogFile list ret;

2. Traverse filesInStorage and process each file:

2.1. Get the file name name;

2.2. Using a regular expression on the file name, detect whether it is a finalized edits log:

The regular expression used is edits_(\d+)-(\d+); in the source it appears as edits_(\\d+)-(\\d+), where the first backslash of each pair is just the Java escape character. A matching file name looks like edits_0000000000001833048-0000000000001833081: the first number string is the starting transaction ID and the second is the ending transaction ID (see the parsing sketch after this list);

2.3. If the file name matches the regular expression, it is a finalized edit log file we are looking for:

2.3.1. Get the starting transaction ID startTxId: the first capture group of the regular expression is the starting transaction ID;

2.3.2. Get the ending transaction ID endTxId: the second capture group of the regular expression is the ending transaction ID;

2.3.3. Use the file f, the starting transaction ID startTxId and the ending transaction ID endTxId to construct an EditLogFile instance and add it to the ret list;

2.3.4. continue, moving on to the next file;

2.4. Using a regular expression on the file name, detect whether it is an in-progress edits log:

The regular expression used is edits_inprogress_(\d+); again, the doubled backslash in the source edits_inprogress_(\\d+) is just the Java escape character. A matching file name looks like edits_inprogress_0000000000001853186, where the trailing number string is the starting transaction ID;

2.5. If the file name matches the regular expression, it is an in-progress edit log file we are looking for:

2.5.1. Get the starting transaction ID startTxId: the first capture group of the regular expression is the starting transaction ID;

2.5.2. Use the file f, the starting transaction ID startTxId and the ending transaction ID HdfsConstants.INVALID_TXID (-12345), with the in-progress flag set to true, to construct an EditLogFile instance and add it to the ret list;

2.5.3. continue, moving on to the next file;

3. Return the EditLogFile list ret.
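To make the file-name matching concrete, here is a small standalone example that parses the transaction IDs out of typical edit log file names. It is illustrative only; the patterns are written out inline to mirror FileJournalManager's EDITS_REGEX and EDITS_INPROGRESS_REGEX:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the file-name matching done in matchEditLogs().
class EditLogNameSketch {
  // Same patterns as FileJournalManager's EDITS_REGEX / EDITS_INPROGRESS_REGEX
  static final Pattern EDITS = Pattern.compile("edits_(\\d+)-(\\d+)");
  static final Pattern EDITS_INPROGRESS = Pattern.compile("edits_inprogress_(\\d+)");

  public static void main(String[] args) {
    for (String name : new String[] {
        "edits_0000000000001833048-0000000000001833081",
        "edits_inprogress_0000000000001853186",
        "fsimage_0000000000001833047" }) {
      Matcher m = EDITS.matcher(name);
      if (m.matches()) {
        System.out.println(name + " -> finalized segment, startTxId=" +
            Long.parseLong(m.group(1)) + ", endTxId=" + Long.parseLong(m.group(2)));
        continue;
      }
      m = EDITS_INPROGRESS.matcher(name);
      if (m.matches()) {
        System.out.println(name + " -> in-progress segment, startTxId=" +
            Long.parseLong(m.group(1)));
        continue;
      }
      System.out.println(name + " -> not an edit log file");
    }
  }
}

Running it prints the start and end transaction IDs for the finalized segment, only the start transaction ID for the in-progress segment, and rejects the fsimage file, which lives in the same directory but is not an edit log.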

Now back to FileJournalManager's selectInputStreams() method. Let's look at its second step: calling the addStreamsToCollectionFromFiles() method, which builds input streams from the edit log file list elfs and adds them to the streams collection, as follows:

  static void addStreamsToCollectionFromFiles(Collection<EditLogFile> elfs,
      Collection<EditLogInputStream> streams, long fromTxId,
      boolean inProgressOk) {
    // Traverse the EditLogFile collection elfs and process each instance elf
    for (EditLogFile elf : elfs) {
      // If elf is in progress:
      if (elf.isInProgress()) {
        // If we don't want in-progress edit logs, skip it
        if (!inProgressOk) {
          LOG.debug("passing over " + elf + " because it is in progress " +
              "and we are ignoring in-progress logs.");
          continue;
        }
        // Validate the edit log; if validation fails, skip it
        try {
          elf.validateLog();
        } catch (IOException e) {
          LOG.error("got IOException while trying to validate header of " +
              elf + ".  Skipping.", e);
          continue;
        }
      }
      // If elf's last transaction ID lastTxId is smaller than the starting
      // transaction ID fromTxId we were asked for, skip it
      if (elf.lastTxId < fromTxId) {
        assert elf.lastTxId != HdfsConstants.INVALID_TXID;
        LOG.debug("passing over " + elf + " because it ends at " +
            elf.lastTxId + ", but we only care about transactions " +
            "as new as " + fromTxId);
        continue;
      }
      // Construct an EditLogFileInputStream elfis from the file in elf, its
      // first transaction ID firstTxId, last transaction ID lastTxId and the
      // in-progress flag isInProgress()
      EditLogFileInputStream elfis = new EditLogFileInputStream(elf.getFile(),
          elf.getFirstTxId(), elf.getLastTxId(), elf.isInProgress());
      LOG.debug("selecting edit log stream " + elf);
      // Add elfis to the input stream collection streams
      streams.add(elfis);
    }
  }
Its processing flow is as follows:

Traverse the EditLogFile collection elfs and, for each EditLogFile instance elf, do the following (a worked example follows this list):

1. If elf is in progress: if in-progress edit logs are not wanted, skip it directly; otherwise validate the edit log, skipping it if validation fails and continuing if it succeeds;

2. If elf's last transaction ID lastTxId is smaller than the starting transaction ID fromTxId we were asked for, skip it directly; otherwise continue;

3. Use elf's file, its first transaction ID firstTxId, its last transaction ID lastTxId and its in-progress flag isInProgress() to construct an EditLogFileInputStream instance elfis;

4. Add elfis to the input stream collection streams.
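As a quick worked example of this filtering: with fromTxId = 1833050, a finalized segment covering transactions 1833000-1833047 ends before 1833050 and is skipped, while edits_0000000000001833048-0000000000001833081 is turned into an EditLogFileInputStream; an in-progress segment is only considered when inProgressOk is true and its header validates. A minimal sketch of that selection rule (my own simplification with a hypothetical Segment type, not the Hadoop classes) might look like this:

import java.util.ArrayList;
import java.util.List;

// Illustrative simplification of addStreamsToCollectionFromFiles():
// keep only segments that are still useful from fromTxId onwards.
class SegmentSelectionSketch {
  record Segment(long firstTxId, long lastTxId, boolean inProgress) {}

  static List<Segment> select(List<Segment> elfs, long fromTxId,
      boolean inProgressOk) {
    List<Segment> selected = new ArrayList<>();
    for (Segment elf : elfs) {
      if (elf.inProgress() && !inProgressOk) {
        continue;                       // ignoring in-progress logs
      }
      if (!elf.inProgress() && elf.lastTxId() < fromTxId) {
        continue;                       // segment ends before what we need
      }
      selected.add(elf);                // would become an EditLogFileInputStream
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Segment> elfs = List.of(
        new Segment(1833000, 1833047, false),
        new Segment(1833048, 1833081, false),
        new Segment(1853186, -12345, true));   // -12345 = INVALID_TXID
    System.out.println(select(elfs, 1833050, false));
    // -> only the 1833048-1833081 segment; the in-progress segment would
    //    require inProgressOk = true
  }
}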






