I don't know whether readers who have used Flume are already familiar with these two concepts.
At first I was a little confused: aren't these two redundant?
What effect does transactionCapacity actually have?
What on earth does batchSize do?
......
With these questions in mind, let's take a closer look at the source code.
batchSize
Where does the concept of batchSize first appear?
It first appears in three places: the process() method of KafkaSink, in HDFS Sink, and in Exec Source.
From these three places, you should be able to see where batchSize comes from.
batchSize is a concept defined for sources and sinks: it limits how many events a source or sink processes in one batch.
Each batch of up to batchSize events is handled within a single transaction.
Once batchSize events have been processed, the transaction is committed.
Note one easily-missed point: batchSize must not be greater than transactionCapacity.
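To make this concrete, here is a minimal sketch of the pattern these sinks follow, using only the public Flume API (Channel, Transaction, Event); the class name SketchSink and the field wiring are illustrative, not the actual KafkaSink or HDFS Sink code. Up to batchSize take operations happen inside one transaction, which is exactly why batchSize cannot exceed transactionCapacity: every taken event must fit into that transaction's takeList.

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.Transaction;

    public class SketchSink {
        private Channel channel;   // injected by the framework in a real sink
        private int batchSize;     // read from the sink's configuration in a real sink

        public void processOnce() {
            Transaction tx = channel.getTransaction();
            tx.begin();
            try {
                for (int i = 0; i < batchSize; i++) {
                    Event event = channel.take();   // each take lands in the transaction's takeList
                    if (event == null) {
                        break;                      // channel drained; commit what we have
                    }
                    // ... send the event downstream ...
                }
                tx.commit();                        // the batch is now consumed for good
            } catch (Exception e) {
                tx.rollback();                      // taken events go back into the channel
            } finally {
                tx.close();
            }
        }
    }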
Now, let's talk about transactionCapacity.
First of all, transactionCapacity comes from the channel, unlike batchSize, which belongs to sources and sinks.
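As a quick illustration, here is a sketch of how a channel picks this parameter up during configuration. The key names "capacity" and "transactionCapacity" and the default of 100 match the Flume user guide for the memory channel; the surrounding method body is simplified.

    import org.apache.flume.Context;

    public void configure(Context context) {
        // both keys are set per channel, e.g. a1.channels.c1.transactionCapacity = 1000
        Integer capacity = context.getInteger("capacity", 100);
        Integer transCapacity = context.getInteger("transactionCapacity", 100);
        // transCapacity is later handed to new MemoryTransaction(transCapacity, counter)
    }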
So how is transactionCapacity actually used inside the channel?
MemoryChannel has an inner class called MemoryTransaction:
    private class MemoryTransaction extends BasicTransactionSemantics {
        private LinkedBlockingDeque<Event> takeList;
        private LinkedBlockingDeque<Event> putList;
        private final ChannelCounter channelCounter;
        private int putByteCounter = 0;
        private int takeByteCounter = 0;

        public MemoryTransaction(int transCapacity, ChannelCounter counter) {
            putList = new LinkedBlockingDeque<Event>(transCapacity);
            takeList = new LinkedBlockingDeque<Event>(transCapacity);
            channelCounter = counter;
        }
transactionCapacity is used right here: it is the capacity of putList and takeList.
putList stores the events added by the channel's put operation.
    if (!putList.offer(event)) {
        throw new ChannelException("Put queue for MemoryTransaction of capacity " +
            putList.size() + " full, consider committing more frequently, " +
            "increasing capacity or increasing thread count");
    }
Before each put there is a pre-check on whether the put can succeed; as the exception message shows, a failed put means the transaction's capacity is full.
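For context, the check above sits inside the doPut method of MemoryTransaction; a simplified sketch (the real code also updates channelCounter and tracks putByteCounter against a byte capacity) looks like this:

    protected void doPut(Event event) throws InterruptedException {
        if (!putList.offer(event)) {
            throw new ChannelException("Put queue for MemoryTransaction of capacity " +
                putList.size() + " full, consider committing more frequently, " +
                "increasing capacity or increasing thread count");
        }
        // the event now sits in putList until commit moves it into the main queue
    }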
takeList stores the events removed by the take operation, i.e., every event the channel hands out through take.
    if (takeList.remainingCapacity() == 0) {
        throw new ChannelException("Take list for MemoryTransaction, capacity " +
            takeList.size() + " full, consider committing more frequently, " +
            "increasing capacity, or increasing thread count");
    }
Take is pre-checked as well: if takeList is full, too many events have been taken without a commit (events are piling up in the transaction), so you should commit more frequently or increase transactionCapacity.
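Putting the pieces together, a simplified doTake (a sketch of the method inside MemoryTransaction, omitting locking, semaphores, and byte accounting) shows why takeList exists at all: the event leaves the channel's main queue immediately, but is remembered so that a rollback can restore it.

    protected Event doTake() throws InterruptedException {
        if (takeList.remainingCapacity() == 0) {
            throw new ChannelException("Take list for MemoryTransaction, capacity " +
                takeList.size() + " full, consider committing more frequently, " +
                "increasing capacity, or increasing thread count");
        }
        Event event = queue.poll();   // remove the event from the channel's main queue
        if (event == null) {
            return null;              // the channel is currently empty
        }
        takeList.put(event);          // remember it so rollback can put it back
        return event;
    }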
So what happens when a transaction commits, and what exactly does the commit do?
commit() is the transaction commit, and there are two cases:
1. Committing the put events
    while (!putList.isEmpty()) {
        if (!queue.offer(putList.removeFirst())) {
            throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
        }
    }
All the events are moved into queue. queue is where the channel's events actually live, and its size is the channel's capacity parameter.
2. Committing the take events
Because the events were already removed from queue at take time, committing the take side needs no further queue operation: the taken events are simply gone for good (takeList is cleared).
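So a simplified doCommit covering both cases reduces to the sketch below (the real method also adjusts semaphores and counters):

    protected void doCommit() {
        while (!putList.isEmpty()) {
            if (!queue.offer(putList.removeFirst())) {
                throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
            }
        }
        putList.clear();
        takeList.clear();   // taken events are consumed for good; nothing returns to queue
    }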
Finally, let's see how a transaction is rolled back.
Rollback matters for the take operation: you took the events out, processing failed, so of course they must be thrown back into the channel to wait for the next attempt!
Since we entered the rollback path, the commit either failed or was never reached, so putList and takeList have not been emptied.
    while (!takeList.isEmpty()) {
        queue.addFirst(takeList.removeLast());
    }
This loop adds the taken events back to the head of queue; removing from the tail of takeList and adding to the head of queue preserves their original order.
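Rollback also has a put side, which the snippet above does not show: in MemoryChannel's doRollback the uncommitted put events are simply discarded by clearing putList (the source sees the failure and can retry). A simplified sketch of the whole method:

    protected void doRollback() {
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.removeLast());   // restore taken events in their original order
        }
        putList.clear();   // uncommitted puts are dropped; the failure is reported upstream
    }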
Having read this far, do these two concepts make more sense now?