The KafkaChannel of Flume-NG

Last Update:2015-03-01 Source: Internet

Author: User

The next Apache Flume release (1.6) will bring a new component, KafkaChannel, which, as the name implies, uses Kafka as the channel. This channel is already available in CDH 5.3.

As you know, three channels are in common use:

1, Memory channel: uses memory as the channel. The advantage is that it is the fastest and the easiest to configure; the disadvantage is that it is the least reliable, because once the Flume process dies, the data held in memory is lost;

2, File channel: uses local files as the channel. The advantage is the highest reliability: the data is stored in files on disk, and after the process dies and is restarted, transfer resumes where it left off. The disadvantage is that it is the slowest;

3, SpillableMemoryChannel: a combination of the memory channel and the file channel. It is essentially a file channel, but events are stored in memory first and overflow to disk only when memory is full. The advantage is that it balances the strengths of both; it likewise inherits the weaknesses of both.
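As a rough illustration of the three channels above, a Flume properties fragment might look like the following. The agent name a1, the channel names, the capacities, and the directory paths are placeholders chosen for this sketch; they do not come from the original setup.

```properties
# Hypothetical agent "a1" declaring one channel of each type.
a1.channels = mem file spill

# 1, Memory channel: events live only in the JVM heap.
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000
a1.channels.mem.transactionCapacity = 1000

# 2, File channel: events are persisted to local disk.
# The directories below are assumed paths.
a1.channels.file.type = file
a1.channels.file.checkpointDir = /data/flume/file/checkpoint
a1.channels.file.dataDirs = /data/flume/file/data

# 3, Spillable memory channel: memory first, disk on overflow.
a1.channels.spill.type = SPILLABLEMEMORY
a1.channels.spill.memoryCapacity = 10000
a1.channels.spill.overflowCapacity = 1000000
a1.channels.spill.checkpointDir = /data/flume/spill/checkpoint
a1.channels.spill.dataDirs = /data/flume/spill/data
```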

As far as I know, Flume is not yet widely used domestically. A reasonable topology probably has two tiers. In the first tier, the sources are in direct contact with the original data sources. This tier has more Flume nodes, and the channel here should be file channel or SpillableMemoryChannel, for high reliability. The second tier consists of aggregation nodes, whose sinks write directly to destinations such as HDFS, HBase, or local disk files. This tier has far fewer Flume nodes than the first, and the recommended channel is memory channel, because these nodes must aggregate the data and pass it on promptly.

Then why not SpillableMemoryChannel in the second tier? As noted above, it takes on the traits of both channels, and one disadvantage matters a great deal here: each second-tier node carries more traffic than a first-tier node. Once a second-tier sink runs into a problem, events may overflow to local disk, so sink performance drops sharply while the incoming traffic does not decrease. Events in SpillableMemoryChannel are also kept in order, so the output rate may never catch up with the input rate. The options are to add more nodes (but with more nodes, why have a second tier at all?) or to use memory channel. The first tier enables backoff and uses load balancing to send data to the second tier.
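The two-tier layout described above could be sketched roughly as follows. The agent names (tier1, tier2), host names, ports, paths, and capacities are all assumptions made for illustration. In practice each agent would normally have its own file on its own machine; they are shown together here only for brevity.

```properties
# ---- First-tier agent (hypothetical name: tier1) ----
tier1.sources = src
tier1.channels = ch
tier1.sinks = k1 k2
tier1.sinkgroups = g1

# Source in direct contact with the original data (assumed log path).
tier1.sources.src.type = exec
tier1.sources.src.command = tail -F /var/log/app/app.log
tier1.sources.src.channels = ch

# File channel for reliability on the first tier.
tier1.channels.ch.type = file
tier1.channels.ch.checkpointDir = /data/flume/checkpoint
tier1.channels.ch.dataDirs = /data/flume/data

# Two Avro sinks, one per second-tier node (assumed hosts).
tier1.sinks.k1.type = avro
tier1.sinks.k1.channel = ch
tier1.sinks.k1.hostname = collector1.example.com
tier1.sinks.k1.port = 4545
tier1.sinks.k2.type = avro
tier1.sinks.k2.channel = ch
tier1.sinks.k2.hostname = collector2.example.com
tier1.sinks.k2.port = 4545

# Load balancing with backoff across the second tier.
tier1.sinkgroups.g1.sinks = k1 k2
tier1.sinkgroups.g1.processor.type = load_balance
tier1.sinkgroups.g1.processor.backoff = true
tier1.sinkgroups.g1.processor.selector = round_robin

# ---- Second-tier (aggregation) agent (hypothetical name: tier2) ----
tier2.sources = in
tier2.channels = ch
tier2.sinks = out

tier2.sources.in.type = avro
tier2.sources.in.bind = 0.0.0.0
tier2.sources.in.port = 4545
tier2.sources.in.channels = ch

# Memory channel so aggregated data is forwarded promptly.
tier2.channels.ch.type = memory
tier2.channels.ch.capacity = 100000
tier2.channels.ch.transactionCapacity = 1000

# Final destination (assumed HDFS path).
tier2.sinks.out.type = hdfs
tier2.sinks.out.channel = ch
tier2.sinks.out.hdfs.path = hdfs://namenode.example.com/flume/events
```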

But now the situation has improved greatly: with KafkaChannel, the two tiers above can be merged into one. I ran a rough trial with exec source + KafkaChannel + file_roll sink, using one broker, one topic, and one partition, and the throughput was around 42 MB/s. That is slower than memory channel but far faster than file channel, and the reliability is no worse than file channel's.
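A single-tier agent along the lines of that trial might be configured roughly as below. This is only a sketch: the agent name, command, host names, topic, and output directory are assumptions, and the channel property names (brokerList, zookeeperConnect, topic) follow the Flume 1.6 user guide, so they may differ in the CDH 5.3 build or in later Flume releases.

```properties
# Hypothetical agent "a1": exec source -> KafkaChannel -> file_roll sink.
a1.sources = src
a1.channels = kc
a1.sinks = out

# Exec source tailing an assumed log file.
a1.sources.src.type = exec
a1.sources.src.command = tail -F /var/log/app/app.log
a1.sources.src.channels = kc

# Kafka channel backed by one broker and one topic (assumed hosts).
a1.channels.kc.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.kc.brokerList = kafka1.example.com:9092
a1.channels.kc.zookeeperConnect = zk1.example.com:2181
a1.channels.kc.topic = flume-channel

# file_roll sink writing to an assumed local directory.
a1.sinks.out.type = file_roll
a1.sinks.out.channel = kc
a1.sinks.out.sink.directory = /data/flume/out
```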

We know that Kafka source uses a consumer to pull data from Kafka, and Kafka sink uses a producer to send data to Kafka. The Kafka channel contains both a producer and a consumer: the producer writes the events received from the source to the broker, and the consumer pulls events from the broker and hands them to the sink. Only one topic is allowed at present, and Kafka's own parameters can be set in the Flume configuration file by prefixing them with "kafka." (that is, "kafka.*").
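To illustrate the "kafka.*" pass-through, a fragment might look like this. The agent and channel names are placeholders, and the specific Kafka parameters are only examples of settings from the Kafka 0.8-era producer and consumer; whether each one is appropriate depends on the Kafka version in use, so treat them as assumptions.

```properties
# Hypothetical channel "kc" on agent "a1".
a1.channels.kc.type = org.apache.flume.channel.kafka.KafkaChannel

# Anything after the "kafka." prefix is handed to Kafka itself.
# Producer-side example: send synchronously.
a1.channels.kc.kafka.producer.type = sync
# Producer-side example: wait for the leader to acknowledge each write.
a1.channels.kc.kafka.request.required.acks = 1
# Consumer-side example: how long a poll may block, in milliseconds.
a1.channels.kc.kafka.consumer.timeout.ms = 100
```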

That is all for today. I wanted to share this with everyone quickly, so give it a try while it is fresh.

The source code is actually not very difficult, but to be honest, I am still not clear about some of the Kafka parts, so I am reluctant to attempt an in-depth analysis here. I will take a closer look later.

Kafka is worth learning for everyone; it is a good thing too.

Reference:

1, http://ingest.tips/2014/11/16/flafka-apache-flume-meets-apache-kafka-for-event-processing/

2, https://github.com/cloudera/flume-ng/tree/cdh5-1.5.0_5.3.2

3, https://github.com/apache/flume/tree/flume-1.6

This article is an English version of an article originally published in Chinese on aliyun.com.
