The main classes involved are:
SpoolDirectorySource reads the user configuration and, according to batchSize, fetches that many events at a time from the user-specified spooling directory. SpoolDirectorySource does not read any file directly; it reads through an internal reader, which also implements file switching and related operations. The inner class SpoolDirectoryRunnable is the thread whose run() method does the work of reading events from the spooling directory (using the reader):
```java
@Override
public void run() {
  int backoffInterval = 250;
  try {
    while (!Thread.interrupted()) {
      List<Event> events = reader.readEvents(batchSize);
      if (events.isEmpty()) {
        break;
      }
      sourceCounter.addToEventReceivedCount(events.size());
      sourceCounter.incrementAppendBatchReceivedCount();

      try {
        getChannelProcessor().processEventBatch(events);
        reader.commit();
      } catch (ChannelException ex) {
        logger.warn("The channel is full, and cannot write data now. The " +
            "source will try again after " + String.valueOf(backoffInterval) +
            " milliseconds");
        hitChannelException = true;
        if (backoff) {
          TimeUnit.MILLISECONDS.sleep(backoffInterval);
          backoffInterval = backoffInterval << 1;
          backoffInterval = backoffInterval >= maxBackoff ? maxBackoff :
              backoffInterval;
        }
        continue;
      }
      backoffInterval = 250;
      sourceCounter.addToEventAcceptedCount(events.size());
      sourceCounter.incrementAppendBatchAcceptedCount();
    }
    logger.info("Spooling Directory Source runner has shutdown.");
  } catch (Throwable t) {
    logger.error("FATAL: " + SpoolDirectorySource.this.toString() + ": " +
        "Uncaught exception in SpoolDirectorySource thread. " +
        "Restart or reconfigure Flume to continue processing.", t);
    hasFatalError = true;
    Throwables.propagate(t);
  }
}
```
ReliableSpoolingFileEventReader is the reader defined in SpoolDirectorySource. The name itself advertises reliability, so how is reliability achieved? The reader's readEvents method reads up to batchSize events. Roughly, the method works like this: if the previous batch was not committed, the current file must still be present (otherwise it is an illegal state) and the deserializer is reset to the last mark; otherwise, if there is no current file, the next file is fetched, and if there is still no file an empty event list is returned. After that, the EventDeserializer's readEvents is called.
```java
public List<Event> readEvents(int numEvents) throws IOException {
  if (!committed) {
    if (!currentFile.isPresent()) {
      throw new IllegalStateException("File should not roll when " +
          "commit is outstanding.");
    }
    logger.info("Last read was never committed - resetting mark position.");
    currentFile.get().getDeserializer().reset();
  } else {
    // Check if new files have arrived since last call
    if (!currentFile.isPresent()) {
      currentFile = getNextFile();
    }
    // Return empty list if no new files
    if (!currentFile.isPresent()) {
      return Collections.emptyList();
    }
  }

  EventDeserializer des = currentFile.get().getDeserializer();
  List<Event> events = des.readEvents(numEvents);

  /* It's possible the last read took us just up to a file boundary.
   * If so, try to roll to the next file, if there is one. */
  if (events.isEmpty()) {
    retireCurrentFile();
    currentFile = getNextFile();
    if (!currentFile.isPresent()) {
      return Collections.emptyList();
    }
    events = currentFile.get().getDeserializer().readEvents(numEvents);
  }

  // ... set header values on the events (omitted here) ...

  committed = false;
  lastFileRead = currentFile;
  return events;
}
```
In this method we see:

currentFile: this object is wrapped in Google Guava's Optional, which makes absent values and null checks easier to handle. It is an Optional&lt;FileInfo&gt;, where FileInfo encapsulates the plain File object together with the EventDeserializer for that file.
currentFile is assigned mainly in the Optional&lt;FileInfo&gt; openFile(File file) and Optional&lt;FileInfo&gt; getNextFile() methods of the ReliableSpoolingFileEventReader class.
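As a hedged sketch of this pattern (using java.util.Optional in place of Guava's com.google.common.base.Optional that Flume actually uses, and a simplified stand-in for FileInfo):

```java
import java.io.File;
import java.util.Optional;

public class OptionalDemo {
    // Hypothetical stand-in for Flume's FileInfo: it pairs the file being
    // consumed with per-file state (the real one also holds the deserializer).
    static class FileInfo {
        private final File file;
        FileInfo(File file) { this.file = file; }
        File getFile() { return file; }
    }

    // Wrapping the result in Optional lets callers test isPresent()
    // instead of comparing against null, as getNextFile() does in Flume.
    static Optional<FileInfo> nextFile(File[] candidates) {
        if (candidates == null || candidates.length == 0) {
            return Optional.empty();   // nothing left to consume
        }
        return Optional.of(new FileInfo(candidates[0]));
    }

    public static void main(String[] args) {
        Optional<FileInfo> current = nextFile(new File[] { new File("a.log") });
        if (current.isPresent()) {
            System.out.println("reading " + current.get().getFile().getName());
        }
        System.out.println(nextFile(new File[0]).isPresent());  // false
    }
}
```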
EventDeserializer: the main purpose of the event deserializer is to define some basic read operations, where mark() records the position up to which the file has been read.

EventDeserializer has many implementing subclasses; here we only look at LineDeserializer which, as the name implies, reads by line: one line is one event.

Although LineDeserializer is the class concerned with reading lines, it is not what actually reads the records.
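To make the "one line is one event" idea concrete, here is a minimal sketch (not Flume's actual LineDeserializer, just the same shape) that turns each line of input into one event body:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineDeserializerSketch {
    // Read up to numEvents lines; each line becomes one event body,
    // mirroring LineDeserializer's line-per-event contract.
    static List<String> readEvents(BufferedReader in, int numEvents) throws IOException {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < numEvents; i++) {
            String line = in.readLine();
            if (line == null) {
                break;  // end of file: the caller may roll to the next file
            }
            events.add(line);
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc\n"));
        System.out.println(readEvents(in, 2));  // [a, b]
        System.out.println(readEvents(in, 2));  // [c]
    }
}
```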
Let's look at the openFile function:
```java
String nextPath = file.getPath();
PositionTracker tracker =
    DurablePositionTracker.getInstance(metaFile, nextPath);
if (!tracker.getTarget().equals(nextPath)) {
  tracker.close();
  deleteMetaFile();
  tracker = DurablePositionTracker.getInstance(metaFile, nextPath);
}

ResettableInputStream in =
    new ResettableFileInputStream(file, tracker,
        ResettableFileInputStream.DEFAULT_BUF_SIZE, inputCharset,
        decodeErrorPolicy);
EventDeserializer deserializer = EventDeserializerFactory.getInstance(
    deserializerType, deserializerContext, in);
```
From this we can see that EventDeserializer reads records through a ResettableFileInputStream (the in object). Constructing a ResettableFileInputStream requires a File and a DurablePositionTracker; as a result, ResettableFileInputStream reads the file content while DurablePositionTracker records the position information. DurablePositionTracker uses Apache Avro for persistence:
```java
private final DataFileWriter<TransferStateFileMeta> writer;
private final DataFileReader<TransferStateFileMeta> reader;
```
This way, when we use EventDeserializer to read an event, the data comes from the current file's stream, and the position read so far can be recorded at the same time.
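A hedged sketch of that division of labor, using toy classes rather than the real ResettableFileInputStream / DurablePositionTracker API: the stream advances a cursor as it reads, mark() hands the cursor to the tracker, and reset() rewinds to the last tracked position.

```java
public class TrackerSketch {
    // Toy position tracker: remembers the last committed offset, standing in
    // for DurablePositionTracker (which persists the offset via Avro).
    static class PositionTracker {
        private long position = 0;
        void storePosition(long p) { position = p; }
        long getPosition() { return position; }
    }

    // Toy resettable stream over a byte array: read() advances a cursor,
    // mark() persists it through the tracker, reset() rewinds to it.
    static class ResettableStream {
        private final byte[] data;
        private final PositionTracker tracker;
        private long cursor;
        ResettableStream(byte[] data, PositionTracker tracker) {
            this.data = data;
            this.tracker = tracker;
            this.cursor = tracker.getPosition(); // resume from the last mark
        }
        int read() { return cursor < data.length ? data[(int) cursor++] : -1; }
        long tell() { return cursor; }
        void mark() { tracker.storePosition(tell()); }
        void reset() { cursor = tracker.getPosition(); }
    }

    public static void main(String[] args) {
        PositionTracker tracker = new PositionTracker();
        ResettableStream in = new ResettableStream("abcd".getBytes(), tracker);
        in.read(); in.read();   // consume 'a' and 'b'
        in.mark();              // commit: position 2 is now remembered
        in.read();              // consume 'c' but never commit it
        in.reset();             // e.g. after a channel failure
        System.out.println((char) in.read());  // prints 'c' again
    }
}
```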
When a batch of batchSize events has been processed correctly, ReliableSpoolingFileEventReader's commit() persists the position information:
```java
@Override
public void commit() throws IOException {
  if (!committed && currentFile.isPresent()) {
    currentFile.get().getDeserializer().mark();
    committed = true;
  }
}
```
The mark method called here is LineDeserializer's:

```java
@Override
public void mark() throws IOException {
  ensureOpen();
  in.mark();
}
```
which in turn calls the mark method of ResettableFileInputStream (the in object):

```java
@Override
public void mark() throws IOException {
  tracker.storePosition(tell());
}
```
which then calls the storePosition method of the position tracker, DurablePositionTracker:

```java
@Override
public synchronized void storePosition(long position) throws IOException {
  metaCache.setOffset(position);
  writer.append(metaCache);
  writer.sync();
  writer.flush();
}
```
After that, Avro's DataFileWriter completes the actual write.
Finally, the recovery logic based on the persisted position is easy to guess: after a crash, reading resumes from the earliest position that was never committed. In other words, ResettableFileInputStream is the input stream that can both read the data and persist how far it has read.
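The recovery behavior can be sketched without Avro: persist the last committed offset to a small meta file, and on restart resume from whatever was committed (a toy model of what DurablePositionTracker's meta file provides; the file format here is a plain number, not Flume's actual Avro records):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecoverySketch {
    // Persist the committed offset, as DurablePositionTracker does via Avro.
    static void storePosition(Path metaFile, long offset) throws IOException {
        Files.write(metaFile, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }

    // On restart, read back the last committed offset (0 if no meta file yet).
    static long loadPosition(Path metaFile) throws IOException {
        if (!Files.exists(metaFile)) return 0L;
        return Long.parseLong(new String(Files.readAllBytes(metaFile), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path meta = Files.createTempFile("tracker", ".meta");
        storePosition(meta, 42L);   // commit after processing 42 bytes
        // ... the process crashes here; on restart:
        long resumeAt = loadPosition(meta);
        System.out.println("resume reading at offset " + resumeAt);  // 42
    }
}
```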
Flume Spooling Directory Source source code flow analysis (code reading only, not run)