KafkaChannel in Flume NG

Source: Internet
Author: User

The next Apache Flume release (1.6) will bring a new component, KafkaChannel, which, as the name implies, uses Kafka as the channel. This channel is already available in CDH 5.3.

As you know, three channels are commonly used:

1, Memory channel: uses memory as the channel. The advantage is that it is the fastest and the easiest to configure; the disadvantage is that it is the least reliable, because once the Flume process dies, the data held in memory is lost.

2, File channel: uses local files as the channel. The advantage is the highest reliability: the data is kept in files on disk, and after the process dies and restarts, transmission resumes where it left off. The disadvantage is that it is the slowest.

3, SpillableMemoryChannel: a combination of the memory channel and the file channel. It is essentially a file channel, but events are stored in memory first and overflow to disk only when memory is full. The advantage is that it combines the strengths of both; the drawback is that it inherits the weaknesses of both as well.
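As a rough illustration of the three options above, the fragment below declares one channel of each type. The agent name a1, the channel names, the directory paths, and the capacity values are assumptions made up for this example, not recommendations; the property keys are the ones documented in the Flume user guide.

```properties
# Hypothetical agent "a1" with one channel of each type.
# All names, paths, and sizes are illustrative assumptions.
a1.channels = mem file spill

# 1. Memory channel: events are held only in the JVM heap.
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000
a1.channels.mem.transactionCapacity = 1000

# 2. File channel: events are persisted to local disk.
a1.channels.file.type = file
a1.channels.file.checkpointDir = /data/flume/file/checkpoint
a1.channels.file.dataDirs = /data/flume/file/data

# 3. Spillable memory channel: memory first, overflow to disk when full.
a1.channels.spill.type = SPILLABLEMEMORY
a1.channels.spill.memoryCapacity = 10000
a1.channels.spill.overflowCapacity = 1000000
a1.channels.spill.checkpointDir = /data/flume/spill/checkpoint
a1.channels.spill.dataDirs = /data/flume/spill/data
```

A real agent would normally define only the channel it uses and bind its sources and sinks to it.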

As far as I know, not many organizations use Flume at present. A reasonable topology is roughly two tiers. In the first tier, the sources are in direct contact with the original data sources. This tier needs many Flume nodes, and the channel here should be a file channel or SpillableMemoryChannel for high reliability. The second tier consists of aggregation nodes, whose sinks write directly to destinations such as HDFS, HBase, or local disk files. This tier has far fewer Flume nodes than the first, and the memory channel is recommended here, because these nodes must push the aggregated data out promptly.

This raises a question: why not use SpillableMemoryChannel in the second tier? As noted above, it inherits the drawbacks of both channels, and one of them matters a great deal here. Each second-tier node carries more traffic than a first-tier node. Once a second-tier sink has problems, the channel may overflow to local disk, and sink performance then drops sharply while the incoming traffic does not decrease. The data in SpillableMemoryChannel is also kept in order, so the outgoing rate may never catch up with the incoming rate. One remedy is to add more nodes (but with many more nodes, why have a second tier at all?); the other is to use the memory channel.

The first tier enables backoff and uses load balancing to send data to the second tier.
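The two-tier layout described above could be sketched roughly as follows. This is only a fragment under stated assumptions: the agent names tier1 and tier2, the host names collector1 and collector2, the port, and the HDFS path are all hypothetical, and the first-tier source is omitted. The sink group with a load_balance processor and backoff enabled is the standard Flume mechanism for spreading events across several downstream agents.

```properties
# --- First tier (many nodes): reliable channel, load-balanced Avro sinks ---
# Agent name, host names, and ports are hypothetical.
tier1.channels = c1
tier1.sinks = k1 k2
tier1.sinkgroups = g1

tier1.channels.c1.type = file

tier1.sinks.k1.type = avro
tier1.sinks.k1.channel = c1
tier1.sinks.k1.hostname = collector1
tier1.sinks.k1.port = 4545

tier1.sinks.k2.type = avro
tier1.sinks.k2.channel = c1
tier1.sinks.k2.hostname = collector2
tier1.sinks.k2.port = 4545

# Load balancing with backoff: failed sinks are temporarily skipped.
tier1.sinkgroups.g1.sinks = k1 k2
tier1.sinkgroups.g1.processor.type = load_balance
tier1.sinkgroups.g1.processor.backoff = true
tier1.sinkgroups.g1.processor.selector = round_robin

# --- Second tier (few nodes): memory channel, writes to HDFS ---
tier2.sources = r1
tier2.channels = c1
tier2.sinks = k1

tier2.sources.r1.type = avro
tier2.sources.r1.bind = 0.0.0.0
tier2.sources.r1.port = 4545
tier2.sources.r1.channels = c1

tier2.channels.c1.type = memory

tier2.sinks.k1.type = hdfs
tier2.sinks.k1.channel = c1
tier2.sinks.k1.hdfs.path = hdfs://namenode/flume/events
```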

But now the situation has greatly improved: with KafkaChannel, the two tiers above can be merged into one. I ran a rough trial with exec source + KafkaChannel + file_roll sink, using one broker, one topic, and one partition, and the throughput was around 42 MB/s. That is slower than the memory channel, but far faster than the file channel, and the reliability is no worse than that of the file channel.
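A configuration along the lines of that trial might look like the sketch below. It is not the exact setup used for the measurement: the agent name, the tailed file, the host names, the topic name, and the output directory are assumptions. The channel keys (brokerList, zookeeperConnect, topic) are the ones listed in the Flume 1.6 user guide; later Flume releases renamed them, so check the guide for the version in use.

```properties
# Hypothetical single-tier agent: exec source -> KafkaChannel -> file_roll sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Exec source tailing a log file (path is an assumption).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Kafka channel, using Flume 1.6 property names.
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = kafka1:9092
a1.channels.c1.zookeeperConnect = zk1:2181
a1.channels.c1.topic = flume-channel

# file_roll sink writing events to a local directory.
a1.sinks.k1.type = file_roll
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /data/flume/out
```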

We know that the Kafka source uses a consumer to pull data from Kafka, and the Kafka sink uses a producer to send data to Kafka. The Kafka channel contains both a producer and a consumer: the producer writes the data received from the source to the broker, and the consumer pulls data from the broker and hands it to the sink. Only one topic is allowed at present, and Kafka's own parameters can be set in the Flume configuration file by prefixing them with "kafka.".
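For example, passing Kafka's own settings through the channel might look like the fragment below. The agent and channel names and the values are assumptions for illustration; the two kafka.-prefixed keys are examples that appear in the Flume 1.6 user guide, and which Kafka properties are actually honored depends on the Flume and Kafka versions in use.

```properties
# Hypothetical channel c1 on agent a1; the single topic the channel uses.
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.topic = flume-channel

# Anything after the "kafka." prefix is handed to the Kafka producer/consumer.
a1.channels.c1.kafka.consumer.timeout.ms = 100
a1.channels.c1.kafka.producer.type = sync
```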

That is all for today. I wanted to share this with everyone quickly, so give it a try while it is fresh.

The source code is actually not very difficult, but to be honest, some of the Kafka-related parts are still not entirely clear to me, so I will not attempt an in-depth analysis here. I will take a closer look later.

Kafka is worth learning for everyone; it is a good thing too.

Reference:

1, http://ingest.tips/2014/11/16/flafka-apache-flume-meets-apache-kafka-for-event-processing/

2, https://github.com/cloudera/flume-ng/tree/cdh5-1.5.0_5.3.2

3, https://github.com/apache/flume/tree/flume-1.6
