The HDFSSink component is mainly composed of three classes: HDFSEventSink, BucketWriter, and HDFSWriter.
HDFSEventSink's main responsibilities are to check that the sink configuration is valid and to take events from the channel; by parsing each event's header information it determines which BucketWriter the event belongs to.
BucketWriter is responsible for rolling (generating) files on the HDFS side according to conditions such as rollCount and rollSize, using the data format and serialization configured in the configuration file; the handling is the same inside every BucketWriter.
HDFSWriter is an interface; its concrete implementations are HDFSSequenceFile, HDFSDataStream, and HDFSCompressedDataStream.
Key class diagram of the HDFSSink
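To make the relationships above concrete, here is a minimal sketch (not the actual Flume source; the interface methods and the factory class are simplified, and it assumes the Flume classes org.apache.flume.Event and the three writer implementations are on the classpath) of how a concrete HDFSWriter is chosen from the hdfs.fileType value:

  import java.io.IOException;
  import org.apache.flume.Event;

  // Simplified view of the writer interface used by BucketWriter.
  public interface HDFSWriter {
    void open(String filePath) throws IOException;
    void append(Event e) throws IOException;
    void sync() throws IOException;
    void close() throws IOException;
  }

  // Factory-style selection of the concrete writer by the hdfs.fileType parameter (simplified).
  public class HDFSWriterFactory {
    static final String SEQUENCE_FILE_TYPE = "SequenceFile";
    static final String DATA_STREAM_TYPE = "DataStream";
    static final String COMPRESSED_STREAM_TYPE = "CompressedStream";

    public HDFSWriter getWriter(String fileType) throws IOException {
      if (SEQUENCE_FILE_TYPE.equalsIgnoreCase(fileType)) {
        return new HDFSSequenceFile();          // writes Hadoop SequenceFiles
      } else if (DATA_STREAM_TYPE.equalsIgnoreCase(fileType)) {
        return new HDFSDataStream();            // writes raw, uncompressed streams
      } else if (COMPRESSED_STREAM_TYPE.equalsIgnoreCase(fileType)) {
        return new HDFSCompressedDataStream();  // writes compressed streams
      } else {
        throw new IOException("File type " + fileType + " not supported");
      }
    }
  }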
HDFSEventSink class
Before walking through HDFSEventSink, make sure you understand the configuration parameters (see the Flume HDFSSink configuration parameter description).
1. configure() method: reads information such as filePath and fileName from the configuration file; for the meaning of each parameter, refer to the Flume HDFSSink configuration parameter description.
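As a rough illustration of step 1 (a hedged sketch, not the complete Flume implementation; the default values shown here are illustrative assumptions), configure() reads the hdfs.* keys from the sink's Context roughly like this:

  @Override
  public void configure(Context context) {
    // hdfs.path is mandatory; most other parameters fall back to defaults
    filePath = Preconditions.checkNotNull(
        context.getString("hdfs.path"), "hdfs.path is required");
    fileName = context.getString("hdfs.filePrefix", "FlumeData");
    rollInterval = context.getLong("hdfs.rollInterval", 30L);   // seconds
    rollSize = context.getLong("hdfs.rollSize", 1024L);         // bytes
    rollCount = context.getLong("hdfs.rollCount", 10L);         // events
    batchSize = context.getLong("hdfs.batchSize", 100L);
    idleTimeout = context.getInteger("hdfs.idleTimeout", 0);
    maxOpenFiles = context.getInteger("hdfs.maxOpenFiles", 5000);
    fileType = context.getString("hdfs.fileType", "SequenceFile");
    callTimeout = context.getLong("hdfs.callTimeout", 10000L);  // milliseconds
  }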
2. start() method: initializes the fixed-size thread pool callTimeoutPool, the scheduled thread pool timedRollerPool, and sfWriters, and starts sinkCounter.
callTimeoutPool: thread pool used to run HDFS calls under a callTimeout limit (see callWithTimeout() in BucketWriter.append()).
timedRollerPool: scheduled thread pool whose threads mainly handle renaming HDFS files (retried according to retryInterval), rolling files on a time basis (rollInterval), and closing idle files (idleTimeout).
sfWriters: sfWriters is in fact a LinkedHashMap subclass that evicts the least recently used writer by overriding the removeEldestEntry() method, guaranteeing that at most a fixed number of files (maxOpenFiles) stay open in sfWriters; a sketch follows this list.
sinkCounter: counter for the sink component's monitoring metrics.
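A minimal sketch of the sfWriters cache described above (the class name WriterLinkedHashMap and the eviction handling are simplified from the description; the error handling is illustrative only). It assumes java.util.LinkedHashMap, java.util.Map, and java.io.IOException are imported:

  private static class WriterLinkedHashMap extends LinkedHashMap<String, BucketWriter> {
    private final int maxOpenFiles;

    WriterLinkedHashMap(int maxOpenFiles) {
      super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
      this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, BucketWriter> eldest) {
      if (size() > maxOpenFiles) {
        // evict (and close) the least recently used writer so that at most
        // maxOpenFiles files are open at the same time
        try {
          eldest.getValue().close();
        } catch (IOException | InterruptedException e) {
          // illustrative: the real sink logs a warning here
        }
        return true;
      }
      return false;
    }
  }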
3. process() method: the most important logic in HDFSEventSink (key points are annotated as comments in the code below).
In process(), the channel is obtained, and events are taken from it in a loop of up to batchSize iterations. By parsing each event's headers, the event's HDFS destination path and destination file name are determined.
Each event may map to a different BucketWriter and HDFSWriter; each event is appended to its corresponding writer.
When the number of events reaches batchSize, the writers are flushed and the transaction is committed.
BucketWriter is responsible for rolling (generating) files and for the file-format and serialization logic.
HDFSWriter has three concrete implementations, "SequenceFile", "DataStream", and "CompressedStream"; the hdfs.fileType parameter determines which implementation of HDFSWriter is used.
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel(); // calls the parent class getChannel() to get the channel attached to this sink
    Transaction transaction = channel.getTransaction(); // each batch is committed inside one transaction
    transaction.begin();
    try {
      Set<BucketWriter> writers = new LinkedHashSet<>();
      int txnEventCount = 0;
      for (txnEventCount = 0; txnEventCount < batchSize; txnEventCount++) {
        Event event = channel.take(); // take an event from the channel
        if (event == null) { // no new events, so there is no need to keep looping up to batchSize
          break;
        }

        // reconstruct the path name by substituting place holders
        // the configuration may contain escape sequences such as
        // "a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S";
        // those variables are resolved here to build realPath and realName
        String realPath = BucketPath.escapeString(filePath, event.getHeaders(),
            timeZone, needRounding, roundUnit, roundValue, useLocalTime);
        String realName = BucketPath.escapeString(fileName, event.getHeaders(),
            timeZone, needRounding, roundUnit, roundValue, useLocalTime);

        String lookupPath = realPath + DIRECTORY_DELIMITER + realName;
        BucketWriter bucketWriter;
        HDFSWriter hdfsWriter = null;
        WriterCallback closeCallback = new WriterCallback() {
          @Override
          public void run(String bucketPath) {
            LOG.info("Writer callback called.");
            synchronized (sfWritersLock) {
              sfWriters.remove(bucketPath); // sfWriters is an LRU map of at most maxOpenFiles entries,
                                            // so the number of open files always stays bounded
            }
          }
        };
        synchronized (sfWritersLock) {
          bucketWriter = sfWriters.get(lookupPath);
          // we haven't seen this file yet, so open it and cache the handle
          if (bucketWriter == null) {
            hdfsWriter = writerFactory.getWriter(fileType); // the writer is obtained from the factory by file type:
                                                            // "SequenceFile", "DataStream" or "CompressedStream"
            bucketWriter = initializeBucketWriter(realPath, realName,
                lookupPath, hdfsWriter, closeCallback);
            sfWriters.put(lookupPath, bucketWriter);
          }
        }

        // Write the data to HDFS
        try {
          bucketWriter.append(event);
        } catch (BucketClosedException ex) {
          LOG.info("Bucket was closed while trying to append, " +
                   "reinitializing bucket and writing event.");
          hdfsWriter = writerFactory.getWriter(fileType);
          bucketWriter = initializeBucketWriter(realPath, realName,
              lookupPath, hdfsWriter, closeCallback);
          synchronized (sfWritersLock) {
            sfWriters.put(lookupPath, bucketWriter);
          }
          bucketWriter.append(event);
        }

        // track the buckets getting written in this transaction
        if (!writers.contains(bucketWriter)) {
          writers.add(bucketWriter);
        }
      }

      if (txnEventCount == 0) {
        sinkCounter.incrementBatchEmptyCount();
      } else if (txnEventCount == batchSize) {
        sinkCounter.incrementBatchCompleteCount();
      } else {
        sinkCounter.incrementBatchUnderflowCount();
      }

      // flush all pending buckets before committing the transaction
      for (BucketWriter bucketWriter : writers) {
        bucketWriter.flush();
      }

      transaction.commit();

      if (txnEventCount < 1) {
        return Status.BACKOFF;
      } else {
        sinkCounter.addToEventDrainSuccessCount(txnEventCount);
        return Status.READY;
      }
    } catch (IOException eIO) {
      transaction.rollback();
      LOG.warn("HDFS IO error", eIO);
      return Status.BACKOFF;
    } catch (Throwable th) {
      transaction.rollback();
      LOG.error("process failed", th);
      if (th instanceof Error) {
        throw (Error) th;
      } else {
        throw new EventDeliveryException(th);
      }
    } finally {
      transaction.close();
    }
  }
BucketWriter class
flush() method:
BucketWriter maintains a batchCounter; whenever batchCounter is non-zero, flush() performs doFlush(). doFlush() mainly serializes the events of the current batch and flushes the output stream, with the end result that the events are written into HDFS.
If the idleTimeout parameter is set to a value greater than 0, then after the doFlush() operation a task is added to the scheduled thread pool that closes hdfsWriter, the output object for the current HDFS connection; the delay is idleTimeout, and the scheduled task is assigned to the idleFuture variable.
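A minimal sketch of that idle-close scheduling (field and method names follow the description above and may differ from the actual BucketWriter code), assuming timedRollerPool is the ScheduledExecutorService created in HDFSEventSink.start() and that java.util.concurrent.Callable and TimeUnit are imported:

  private void scheduleIdleClose() {
    if (idleTimeout > 0) {
      Callable<Void> idleAction = new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          // close the hdfsWriter behind this bucket once it has been idle for idleTimeout seconds
          close(true);
          return null;
        }
      };
      // remember the pending task so append() can cancel it or wait for it later
      idleFuture = timedRollerPool.schedule(idleAction, idleTimeout, TimeUnit.SECONDS);
    }
  }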
append() method:
As mentioned in the flush() discussion, the idleFuture variable holds the scheduled task that closes the writer. append() first checks whether that idleFuture task has completed; if it has not, it waits up to callTimeout for it to finish before continuing with the append. This is done mainly to prevent data from being appended while hdfsWriter is being closed, which would cut the append off halfway.
After that, before appending, append() checks whether an open hdfsWriter is currently available for the append operation; if not, open() is called first.
Every time an event is appended to HDFS, the rollCount and rollSize parameters are checked; when either condition is met, the current temporary file is renamed to (rolled into) the final HDFS file. After that, a new hdfsWriter is opened and subsequent events are appended to it; when the number of appended events reaches batchSize, a flush is performed.
  public synchronized void append(final Event event)
      throws IOException, InterruptedException {
    checkAndThrowInterruptedException();
    // idleFuture is a ScheduledFuture whose job is to close the current hdfsWriter.
    // Before appending an event we must make sure the idleFuture is not executing,
    // otherwise hdfsWriter could be closed while the append is only half done.
    if (idleFuture != null) {
      idleFuture.cancel(false);
      // There is still a small race condition - if the idleFuture is already
      // running, interrupting it can cause HDFS close operation to throw -
      // so we cannot interrupt it while running. If the future could not be
      // cancelled, it is already running - wait for it to finish before
      // attempting to write.
      if (!idleFuture.isDone()) {
        try {
          idleFuture.get(callTimeout, TimeUnit.MILLISECONDS);
        } catch (TimeoutException ex) {
          LOG.warn("Timeout while trying to cancel closing of idle file. Idle " +
                   "file close may have failed", ex);
        } catch (Exception ex) {
          LOG.warn("Error while trying to cancel closing of idle file. ", ex);
        }
      }
      idleFuture = null;
    }

    // If the bucket writer was closed due to roll timeout or idle timeout,
    // force a new bucket writer to be created. Roll count and roll size will
    // just reuse this one
    if (!isOpen) {
      if (closed) {
        throw new BucketClosedException("This bucket writer was closed and " +
            "this handle is thus no longer valid");
      }
      open();
    }

    // check the rollCount and rollSize parameters to decide whether to roll a new file
    if (shouldRotate()) {
      boolean doRotate = true;
      if (isUnderReplicated) {
        if (maxConsecUnderReplRotations > 0 &&
            consecutiveUnderReplRotateCount >= maxConsecUnderReplRotations) {
          doRotate = false;
          if (consecutiveUnderReplRotateCount == maxConsecUnderReplRotations) {
            LOG.error("Hit max consecutive under-replication rotations ({}); " +
                "will not continue rolling files under this path due to " +
                "under-replication", maxConsecUnderReplRotations);
          }
        } else {
          LOG.warn("Block Under-replication detected. Rotating file.");
          consecutiveUnderReplRotateCount++;
        }
      } else {
        consecutiveUnderReplRotateCount = 0;
      }
      if (doRotate) {
        close();
        open();
      }
    }

    // write the event
    try {
      sinkCounter.incrementEventDrainAttemptCount(); // sinkCounter monitoring metrics
      callWithTimeout(new CallRunner<Void>() {
        @Override
        public Void call() throws Exception {
          writer.append(event); // writer is the HDFSWriter implementation selected by hdfs.fileType
          return null;
        }
      });
    } catch (IOException e) {
      LOG.warn("Caught IOException writing to HDFSWriter ({}). Closing file (" +
          bucketPath + ") and rethrowing exception.", e.getMessage());
      try {
        close(true);
      } catch (IOException e2) {
        LOG.warn("Caught IOException while closing file (" +
            bucketPath + "). Exception follows.", e2);
      }
      throw e;
    }

    // update statistics
    processSize += event.getBody().length;
    eventCounter++;
    batchCounter++;

    if (batchCounter == batchSize) {
      flush();
    }
  }
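The body of the shouldRotate() call near the top of append() is not shown above. A hedged sketch of its rollCount/rollSize check (the block under-replication handling present in the real code is omitted here) looks like this:

  private boolean shouldRotate() {
    boolean doRotate = false;
    // eventCounter counts the events appended to the current file,
    // processSize counts the bytes appended to it
    if ((rollCount > 0) && (rollCount <= eventCounter)) {
      doRotate = true; // enough events written to this file, roll it
    }
    if ((rollSize > 0) && (rollSize <= processSize)) {
      doRotate = true; // the file reached the configured size, roll it
    }
    return doRotate;
  }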
"Flume" Hdfssink Source understanding