An analysis of the transactionCapacity and batchSize concepts in Flume

Source: Internet
Author: User

Readers who have used Flume have probably come across these two concepts.

At first I was a little confused: aren't the two settings redundant?

What does transactionCapacity actually do?

And what is batchSize for?

With these questions in mind, let's take a closer look at the source:

batchSize

Where does the concept of batchSize first appear?

The process method of KafkaSink


HDFS Sink


Exec Source



From these three places (each shown as a source screenshot in the original post), it should be clear where batchSize comes from.

batchSize is a concept defined for sources and sinks: it limits how many events a source or sink handles in one batch.

Handling up to batchSize events in one go corresponds to one transaction.

Once batchSize events have been processed, the transaction is committed.
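The batching behaviour described above can be sketched in plain Java. This is a minimal model, not Flume's code: `BatchLoopSketch`, `processOneBatch`, and the use of a `Deque<String>` as a stand-in for the channel are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a sink's process loop; not Flume's actual classes.
final class BatchLoopSketch {

    // Takes at most batchSize events from the channel as one "transaction"
    // and returns them. Stops early if the channel runs dry.
    static List<String> processOneBatch(Deque<String> channel, int batchSize) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String event = channel.pollFirst();
            if (event == null) {
                break; // channel is empty, so the batch ends here
            }
            batch.add(event);
        }
        return batch; // a real sink would commit its transaction at this point
    }

    public static void main(String[] args) {
        Deque<String> channel =
            new ArrayDeque<>(Arrays.asList("e1", "e2", "e3", "e4", "e5"));
        System.out.println(processOneBatch(channel, 2)); // first transaction
        System.out.println(processOneBatch(channel, 2)); // second transaction
        System.out.println(processOneBatch(channel, 2)); // last, partial batch
    }
}
```

Each call models one transaction, so five events with a batch size of 2 need three transactions, the last one only partly filled.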


One point that is easy to overlook: batchSize must not be greater than transactionCapacity.
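Why this constraint matters can be illustrated with a small sketch. It assumes, as the memory channel source later in this article shows, that the events of one transaction are staged in a deque bounded by transactionCapacity. The class and method names here are hypothetical.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustrative only: models one transaction's put list as a bounded deque.
final class BatchVersusCapacitySketch {

    // Returns how many of batchSize events fit into a put list bounded by
    // transactionCapacity before offer() starts returning false.
    static int eventsAccepted(int batchSize, int transactionCapacity) {
        LinkedBlockingDeque<String> putList =
            new LinkedBlockingDeque<>(transactionCapacity);
        int accepted = 0;
        for (int i = 0; i < batchSize; i++) {
            if (!putList.offer("event-" + i)) {
                break; // the memory channel throws a ChannelException here
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        // A batch smaller than the transaction capacity fits completely.
        System.out.println(eventsAccepted(50, 100));
        // A batch larger than the transaction capacity overflows part-way.
        System.out.println(eventsAccepted(150, 100));
    }
}
```

With a batch larger than the transaction capacity, the batch can never be staged in full, so the transaction fails before it reaches commit.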


Now, let's talk about transactionCapacity.


First, as the diagram in the original post shows, transactionCapacity is a channel-level concept, unlike batchSize, which belongs to sources and sinks.

So how is the transaction capacity used inside the channel?

The memory channel has an inner class, MemoryTransaction:

    private class MemoryTransaction extends BasicTransactionSemantics {
        private LinkedBlockingDeque<Event> takeList;
        private LinkedBlockingDeque<Event> putList;
        private final ChannelCounter channelCounter;
        private int putByteCounter = 0;
        private int takeByteCounter = 0;

        public MemoryTransaction(int transCapacity, ChannelCounter counter) {
            // Both per-transaction lists are bounded by the transaction capacity.
            putList = new LinkedBlockingDeque<Event>(transCapacity);
            takeList = new LinkedBlockingDeque<Event>(transCapacity);
            channelCounter = counter;
        }
        // (rest of the class omitted)
    }

This is where the transaction capacity is used: it is the capacity of both putList and takeList.

putList holds the events that put operations bring into the channel during the transaction.

    if (!putList.offer(event)) {
        throw new ChannelException(
            "Put queue for MemoryTransaction of capacity " +
            putList.size() + " full, consider committing more frequently, " +
            "increasing capacity or increasing thread count");
    }

Each put is attempted with offer, which returns false instead of blocking when putList is full. As the exception message shows, a failed put means the transaction capacity has been used up.

takeList holds the events that take operations have removed from the channel; each take returns one event to the caller.

    if (takeList.remainingCapacity() == 0) {
        throw new ChannelException("Take list for MemoryTransaction, capacity " +
            takeList.size() + " full, consider committing more frequently, " +
            "increasing capacity, or increasing thread count");
    }

A take is also preceded by a check: if takeList is full, the take side is not committing fast enough and events are piling up, so the transaction capacity should be adjusted.
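The take-side check can be exercised with a small self-contained sketch. It is a simplified model under stated assumptions: `TakeListSketch` and its `take` method are hypothetical, events are plain strings, and an `IllegalStateException` stands in for Flume's ChannelException.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.concurrent.LinkedBlockingDeque;

// Simplified model of the take path; names are illustrative, not Flume's API.
final class TakeListSketch {

    // Removes one event from the channel queue and records it in takeList,
    // refusing to proceed when the transaction's take list is already full.
    static String take(Deque<String> queue, LinkedBlockingDeque<String> takeList) {
        if (takeList.remainingCapacity() == 0) {
            throw new IllegalStateException(
                "take list full, capacity " + takeList.size());
        }
        String event = queue.pollFirst();
        if (event != null) {
            takeList.addLast(event); // kept so a rollback could return it
        }
        return event;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(Arrays.asList("e1", "e2", "e3"));
        // Transaction capacity of 2: the third take in one transaction fails.
        LinkedBlockingDeque<String> takeList = new LinkedBlockingDeque<>(2);
        System.out.println(take(queue, takeList));
        System.out.println(take(queue, takeList));
        try {
            take(queue, takeList);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check runs before anything is removed from the queue, the rejected take leaves the third event in the channel.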

What happens when a transaction commits, and what exactly is committed?

Commit means committing the transaction.

There are two cases:

1. Committing put events

    while (!putList.isEmpty()) {
        if (!queue.offer(putList.removeFirst())) {
            throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
        }
    }

All the events in putList are moved into queue. queue is where Flume actually stores the channel's events, and its size is bounded by the channel's capacity setting.
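One consequence of this design is that staged events only reach queue at commit time. The sketch below models that with two bounded deques. `PutCommitSketch` and its methods are hypothetical names, and the sketch does not model how Flume guarantees there is room in queue before the commit loop runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Simplified model of a put transaction; names are illustrative, not Flume's API.
final class PutCommitSketch {
    private final LinkedBlockingDeque<String> queue;   // channel storage, bounded by capacity
    private final LinkedBlockingDeque<String> putList; // staging area, bounded by transactionCapacity

    PutCommitSketch(int capacity, int transactionCapacity) {
        queue = new LinkedBlockingDeque<>(capacity);
        putList = new LinkedBlockingDeque<>(transactionCapacity);
    }

    // Stages an event in the transaction without touching the channel queue.
    void put(String event) {
        if (!putList.offer(event)) {
            throw new IllegalStateException("put list full");
        }
    }

    // Moves every staged event into the channel queue, oldest first.
    void commit() {
        while (!putList.isEmpty()) {
            if (!queue.offer(putList.removeFirst())) {
                throw new IllegalStateException("queue full");
            }
        }
    }

    // Snapshot of the events currently stored in the channel queue.
    List<String> queueContents() {
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        PutCommitSketch channel = new PutCommitSketch(100, 10);
        channel.put("e1");
        channel.put("e2");
        System.out.println(channel.queueContents()); // nothing stored before commit
        channel.commit();
        System.out.println(channel.queueContents()); // both events, in put order
    }
}
```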

2. Committing take events


Because events are removed from queue at take time, and those events were placed in queue by an earlier put commit, a take commit has no events left to move.

Finally, how is a transaction rolled back?


Rollback matters for take operations: you took the events out, processing failed, so they must be put back to wait for the next attempt.

Rollback is entered because the commit path raised an exception, that is, the commit failed, so putList and takeList have not been cleared.

    while (!takeList.isEmpty()) {
        queue.addFirst(takeList.removeLast());
    }

The loop puts the events back at the head of queue, in their original order.
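That the removeLast/addFirst pairing preserves order is easy to verify in isolation. The sketch below uses hypothetical names (`TakeRollbackSketch`, `rollbackTakes`) and plain strings in place of Flume events.

```java
import java.util.Arrays;
import java.util.concurrent.LinkedBlockingDeque;

// Illustrative model of rolling back taken events; not Flume's actual class.
final class TakeRollbackSketch {

    // Returns every taken event to the head of the queue. Taking from the
    // tail of takeList and pushing onto the head of queue restores the
    // order the events had before they were taken.
    static void rollbackTakes(LinkedBlockingDeque<String> takeList,
                              LinkedBlockingDeque<String> queue) {
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.removeLast());
        }
    }

    public static void main(String[] args) {
        // e1 and e2 were taken (in that order); e3 and e4 are still queued.
        LinkedBlockingDeque<String> takeList =
            new LinkedBlockingDeque<>(Arrays.asList("e1", "e2"));
        LinkedBlockingDeque<String> queue =
            new LinkedBlockingDeque<>(Arrays.asList("e3", "e4"));
        rollbackTakes(takeList, queue);
        System.out.println(queue); // e1 is back at the head, followed by e2
    }
}
```

After the rollback the next take sees e1 again, so a retried batch processes the events in the same order as the failed one.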

Hopefully these two concepts are clearer now.

