http://boylook.itpub.net/post/43144/531408
The main processing flow of the HDFS sink is in the process() method:
// Loop up to batchSize times, or until the Channel is empty
for (txnEventCount = 0; txnEventCount < batchSize; txnEventCount++) {
  // This call goes through the concrete implementation of BasicTransactionSemantics
  Event event = channel.take();
  if (event == null) {
    break;
  }
  ......
  // sfWriters is an LRU cache of open bucket files; the maximum number of
  // open files is controlled by the hdfs.maxOpenFiles parameter
  BucketWriter bucketWriter = sfWriters.get(lookupPath);
  // Construct a BucketWriter and cache it if one does not exist yet
  if (bucketWriter == null) {
    // HDFSWriterFactory creates an HDFSWriter according to fileType, which is
    // controlled by the hdfs.fileType parameter; e.g. HDFSDataStream
    HDFSWriter hdfsWriter = writerFactory.getWriter(fileType);
    // idleCallback removes the bucketWriter from the LRU cache after it is flushed
    bucketWriter = new BucketWriter(rollInterval, rollSize, rollCount,
        batchSize, context, realPath, realName, inUsePrefix, inUseSuffix,
        suffix, codeC, compType, hdfsWriter, timedRollerPool,
        proxyTicket, sinkCounter, idleTimeout, idleCallback,
        lookupPath, callTimeout, callTimeoutPool);
    sfWriters.put(lookupPath, bucketWriter);
  }
  ......
  // Track the buckets touched in this transaction
  if (!writers.contains(bucketWriter)) {
    writers.add(bucketWriter);
  }
  // Write the event to HDFS
  bucketWriter.append(event); ->
    open();  // if the underlying file system supports append, the file is opened
             // through the append interface; otherwise through the create interface
    // Decide whether to roll to a new file:
    // compare the current replica count with the target replica count;
    // if they do not match, doRotate = false
    if (doRotate) {
      close();
      open();
    }
    HDFSWriter.append(event);
    if (batchCounter == batchSize) {  // once batchSize events have accumulated, flush
      flush(); ->
        doFlush() ->
          HDFSWriter.sync() ->
            FSDataOutputStream.flush()/sync()
    }
}
// Flush every bucket before committing the transaction
for (BucketWriter bucketWriter : writers) {
  bucketWriter.flush();
}
transaction.commit();
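The sfWriters LRU cache mentioned above can be sketched with a `LinkedHashMap` in access order. This is an illustrative stand-in, not Flume's actual implementation: the `Writer` class and `maxOpenFiles` limit here only mirror the roles of `BucketWriter` and the `hdfs.maxOpenFiles` parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WriterCache {
    // Stand-in for BucketWriter: only records whether it has been closed.
    static class Writer {
        boolean closed = false;
        void close() { closed = true; }
    }

    final LinkedHashMap<String, Writer> cache;

    WriterCache(final int maxOpenFiles) {
        // accessOrder = true makes iteration order least-recently-used first
        cache = new LinkedHashMap<String, Writer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
                if (size() > maxOpenFiles) {
                    eldest.getValue().close(); // close the file before evicting it
                    return true;
                }
                return false;
            }
        };
    }

    Writer get(String path) {
        Writer w = cache.get(path);  // a hit refreshes the LRU order
        if (w == null) {
            w = new Writer();
            cache.put(path, w);      // may evict the eldest writer
        }
        return w;
    }

    public static void main(String[] args) {
        WriterCache writers = new WriterCache(2);
        writers.get("/logs/a");
        Writer b = writers.get("/logs/b");
        writers.get("/logs/a");      // touch a, so b is now the eldest entry
        writers.get("/logs/c");      // exceeds the limit: b is closed and evicted
        System.out.println(writers.cache.keySet() + " closed=" + b.closed);
    }
}
```

Evicting through removeEldestEntry guarantees the file handle is closed at the moment the cache drops it, which is why a too-small hdfs.maxOpenFiles causes frequent close/reopen churn.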
Note that operations such as append, sync, and rename executed by BucketWriter are submitted through callWithTimeout to a backend thread pool for asynchronous processing; the pool size is set by the hdfs.threadsPoolSize parameter.
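The callWithTimeout pattern can be sketched as a `Future` wait with a deadline. The names and the timeout-handling details below are illustrative, not Flume's exact code; the idea is that a hung HDFS call (e.g. a stuck DataNode) cannot block the sink thread for longer than callTimeout.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CallWithTimeoutDemo {

    // Submit an operation to the pool and wait at most timeoutMs for its result
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> op,
                                 long timeoutMs) throws IOException {
        Future<T> future = pool.submit(op);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);  // interrupt the hung operation
            throw new IOException("operation timed out after " + timeoutMs + " ms");
        } catch (Exception e) {
            throw new IOException("operation failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // The pool size plays the role of hdfs.threadsPoolSize
        ExecutorService pool = Executors.newFixedThreadPool(10);

        // A fast operation completes normally
        System.out.println(callWithTimeout(pool, () -> "appended", 1000));

        // An operation that hangs past the deadline is cancelled
        try {
            callWithTimeout(pool, () -> { Thread.sleep(5000); return "never"; }, 100);
        } catch (IOException e) {
            System.out.println("timed out");
        }
        pool.shutdownNow();
    }
}
```

Cancelling the future only interrupts the worker thread; if the underlying HDFS client call does not respond to interruption, the thread stays occupied until the call returns, which is one reason the pool is sized larger than one.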
This article is from the blog "MIKE's old blog"; please keep this source: http://boylook.blog.51cto.com/7934327/1298627