Flume Spool Source: Source Code Process Analysis (not running)


The main classes involved are:

SpoolDirectorySource reads the user configuration and, following batchSize, reads that many events at a time from the user-specified spooling directory. SpoolDirectorySource does not read any file directly; it reads through an internal reader, and file switching and related operations are also implemented by that reader. The inner class SpoolDirectoryRunnable is a thread whose run() method does the work of reading events from the spooling directory (using the reader):
@Override
public void run() {
  int backoffInterval = 250;
  try {
    while (!Thread.interrupted()) {
      List<Event> events = reader.readEvents(batchSize);
      if (events.isEmpty()) {
        break;
      }
      sourceCounter.addToEventReceivedCount(events.size());
      sourceCounter.incrementAppendBatchReceivedCount();

      try {
        getChannelProcessor().processEventBatch(events);
        reader.commit();
      } catch (ChannelException ex) {
        logger.warn("The channel is full, and cannot write data now. The " +
            "source will try again after " + String.valueOf(backoffInterval) +
            " milliseconds");
        hitChannelException = true;
        if (backoff) {
          TimeUnit.MILLISECONDS.sleep(backoffInterval);
          backoffInterval = backoffInterval << 1;
          backoffInterval = backoffInterval >= maxBackoff ? maxBackoff :
              backoffInterval;
        }
        continue;
      }
      backoffInterval = 250;
      sourceCounter.addToEventAcceptedCount(events.size());
      sourceCounter.incrementAppendBatchAcceptedCount();
    }
    logger.info("Spooling Directory Source runner has shutdown.");
  } catch (Throwable t) {
    logger.error("FATAL: " + SpoolDirectorySource.this.toString() + ": " +
        "Uncaught exception in SpoolDirectorySource thread. " +
        "Restart or reconfigure Flume to continue processing.", t);
    hasFatalError = true;
    Throwables.propagate(t);
  }
}
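The channel-full branch above doubles the wait on each retry, capped at maxBackoff. A standalone sketch of just that doubling-with-cap arithmetic (the maxBackoff value of 4000 ms here is an assumed example; in Flume it is configurable):

```java
public class BackoffDemo {
    // Mirrors the backoffInterval logic from run(): shift left to double,
    // then clamp at maxBackoff.
    static int nextBackoff(int current, int maxBackoff) {
        int next = current << 1;
        return next >= maxBackoff ? maxBackoff : next;
    }

    public static void main(String[] args) {
        int backoff = 250;       // initial value, as in run()
        int maxBackoff = 4000;   // assumed cap for illustration
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            sb.append(backoff).append(' ');
            backoff = nextBackoff(backoff, maxBackoff);
        }
        System.out.println(sb.toString().trim());
    }
}
```

On a successful batch, run() resets the interval back to 250 ms, so the backoff only grows across consecutive failures.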

ReliableSpoolingFileEventReader is the reader defined in SpoolDirectorySource. The name alone tells you the point: reliable. How is reliability achieved? The reader's readEvents method reads up to batchSize events, and it works roughly as follows: if the last batch was not committed, the current file must still be present (otherwise it is an error) and its deserializer is reset to the last mark; otherwise, if the current file is absent, the next file is fetched, and an empty event list is returned if there is still no file. After that, the EventDeserializer's readEvents is called.
public List<Event> readEvents(int numEvents) throws IOException {
  if (!committed) {
    if (!currentFile.isPresent()) {
      throw new IllegalStateException("File should not roll when " +
          "commit is outstanding.");
    }
    logger.info("Last read was never committed - resetting mark position.");
    currentFile.get().getDeserializer().reset();
  } else {
    // Check if new files have arrived since last call
    if (!currentFile.isPresent()) {
      currentFile = getNextFile();
    }
    // Return empty list if no new files
    if (!currentFile.isPresent()) {
      return Collections.emptyList();
    }
  }

  EventDeserializer des = currentFile.get().getDeserializer();
  List<Event> events = des.readEvents(numEvents);

  /* It's possible the last read took us just up to a file boundary.
   * If so, try to roll to the next file, if there is one. */
  if (events.isEmpty()) {
    retireCurrentFile();
    currentFile = getNextFile();
    if (!currentFile.isPresent()) {
      return Collections.emptyList();
    }
    events = currentFile.get().getDeserializer().readEvents(numEvents);
  }

  // write header values, omitted

  committed = false;
  lastFileRead = currentFile;
  return events;
}

In this method, note the following:

currentFile: this field is wrapped in Google's (Guava's) Optional, which makes null-pointer checks and the like much cleaner. The FileInfo inside the Optional<FileInfo> encapsulates the ordinary File object and the EventDeserializer (event deserializer) for that file.

currentFile is set mainly in the Optional<FileInfo> openFile(File file) method of ReliableSpoolingFileEventReader, which is called from the Optional<FileInfo> getNextFile() method.
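The same present/absent pattern can be shown with the JDK's java.util.Optional (Flume itself uses Guava's Optional, and this FileInfo stand-in is a simplified assumption, not Flume's real class):

```java
import java.util.Optional;

public class OptionalDemo {
    // Toy stand-in for Flume's FileInfo (the real one wraps a File plus
    // its EventDeserializer); this simplified version is an assumption.
    static class FileInfo {
        final String path;
        FileInfo(String path) { this.path = path; }
    }

    // Mimics getNextFile(): absent when the spool directory has no new file.
    static Optional<FileInfo> getNextFile(boolean hasFile) {
        return hasFile ? Optional.of(new FileInfo("/spool/a.log"))
                       : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<FileInfo> currentFile = getNextFile(false);
        // isPresent() replaces an explicit null check, as in readEvents()
        System.out.println(currentFile.isPresent());
        currentFile = getNextFile(true);
        System.out.println(currentFile.get().path);
    }
}
```

This is exactly why readEvents() can branch on currentFile.isPresent() instead of comparing against null.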

EventDeserializer: the main purpose of the event deserializer is to define some basic read operations.

Among them, mark() records the position up to which the data has been read.

EventDeserializer has many implementing subclasses; here we only look at LineDeserializer which, as the name implies, reads by line: one line is one event.

Although EventDeserializer is the layer concerned with reading lines, it is not what actually reads the records.
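The one-line-per-event idea can be sketched in a self-contained way; this uses a plain BufferedReader in place of Flume's ResettableInputStream, and all names here are illustrative, not Flume's API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineEventsDemo {
    // Reads up to numEvents lines, one line per "event" body, mirroring
    // the shape of LineDeserializer.readEvents(numEvents).
    static List<String> readEvents(BufferedReader in, int numEvents)
            throws IOException {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < numEvents; i++) {
            String line = in.readLine();
            if (line == null) break;  // end of file: caller may roll files
            events.add(line);
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("e1\ne2\ne3\n"));
        System.out.println(readEvents(in, 2));  // first batch
        System.out.println(readEvents(in, 2));  // remainder
    }
}
```

An empty batch is what triggers the file-roll branch in ReliableSpoolingFileEventReader.readEvents() above.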

Let's look at the openFile function:

String nextPath = file.getPath();
PositionTracker tracker =
    DurablePositionTracker.getInstance(metaFile, nextPath);
if (!tracker.getTarget().equals(nextPath)) {
  tracker.close();
  deleteMetaFile();
  tracker = DurablePositionTracker.getInstance(metaFile, nextPath);
}
ResettableInputStream in =
    new ResettableFileInputStream(file, tracker,
        ResettableFileInputStream.DEFAULT_BUF_SIZE, inputCharset,
        decodeErrorPolicy);
EventDeserializer deserializer = EventDeserializerFactory.getInstance(
    deserializerType, deserializerContext, in);

So we can see that EventDeserializer reads records through a ResettableFileInputStream (the in object). Initializing a ResettableFileInputStream requires a File and a DurablePositionTracker.

As a result, ResettableFileInputStream reads the file content and uses the DurablePositionTracker to record position information.

DurablePositionTracker uses Apache Avro for persistence:

private final DataFileWriter<TransferStateFileMeta> writer;
private final DataFileReader<TransferStateFileMeta> reader;

This way, when we use EventDeserializer to read an event, we get the data from the current file's stream, and the position that has been read can be recorded at the same time.

When a batch of batchSize events has been processed successfully, ReliableSpoolingFileEventReader's commit() is called, persisting the position information:

public void commit() throws IOException {
  if (!committed && currentFile.isPresent()) {
    currentFile.get().getDeserializer().mark();
    committed = true;
  }
}

The mark() method called here is LineDeserializer's:

@Override
public void mark() throws IOException {
  ensureOpen();
  in.mark();
}

which in turn calls the mark() method of ResettableFileInputStream (the in object):

@Override
public void mark() throws IOException {
  tracker.storePosition(tell());
}

which then calls the storePosition() method of the position tracker (DurablePositionTracker):

@Override
public synchronized void storePosition(long position) throws IOException {
  metaCache.setOffset(position);
  writer.append(metaCache);
  writer.sync();
  writer.flush();
}

After that, Avro's DataFileWriter completes the write operation.

Finally, as for how the persisted position is used: one can basically guess that after a crash, reading resumes from the first position that was never committed, and so on. In other words, ResettableFileInputStream is an input stream that can both read the data and persist how far it has read.
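That crash-recovery guess can be demonstrated end to end with a toy tracker. This is a hedged sketch: the real DurablePositionTracker appends Avro records to a meta file, while here a plain long offset written with DataOutputStream stands in, and all class and method names are illustrative:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class ResumeDemo {
    // Persist the committed offset (the real tracker appends an Avro
    // record and syncs; a single long in a meta file stands in here).
    static void storePosition(File meta, long offset) throws IOException {
        try (DataOutputStream out =
                 new DataOutputStream(new FileOutputStream(meta))) {
            out.writeLong(offset);
        }
    }

    static long loadPosition(File meta) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new FileInputStream(meta))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File data = File.createTempFile("spool", ".log");
        File meta = File.createTempFile("spool", ".meta");
        try (FileOutputStream out = new FileOutputStream(data)) {
            out.write("line1\nline2\n".getBytes(StandardCharsets.UTF_8));
        }
        // First run: "line1\n" (6 bytes) was processed and committed.
        storePosition(meta, 6);

        // Simulated restart: seek to the committed offset and continue,
        // so the already-processed line is not delivered again.
        try (RandomAccessFile raf = new RandomAccessFile(data, "r")) {
            raf.seek(loadPosition(meta));
            System.out.println(raf.readLine());
        }
        data.delete();
        meta.delete();
    }
}
```

Everything read after the last committed offset is re-read on restart, which is why the spooling source delivers at-least-once rather than exactly-once.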

