Flume use cases and a comparison of Flume with Kafka

Source: Internet
Author: User

Is Flume a good fit for your problem?

If you need to ingest textual log data into Hadoop/HDFS then Flume is the right fit for your problem, full stop. For other use cases, here are some guidelines:

Flume is designed to transport and ingest regularly-generated event data over relatively stable, potentially complex topologies. The notion of "event data" is very broadly defined. To Flume, an event is just a generic blob of bytes. There are some limitations on how large an event can be - for instance, it cannot be larger than what you can store in memory or on disk on a single machine - but in practice, Flume events can be everything from textual log entries to image files. The key property of an event is that they are generated in a continuous, streaming fashion. If your data is not regularly generated (i.e. you are trying to do a single bulk load of data into a Hadoop cluster) then Flume will still work, but it is probably overkill for your situation. Flume likes relatively stable topologies. Your topologies do not need to be immutable, because Flume can deal with changes in topology without losing data and can also tolerate periodic reconfiguration due to fail-over or provisioning. It probably won't work well if you plan to change topologies every day, because reconfiguration takes some thought and overhead.

The above is the explanation from the official Flume website. It can be paraphrased as follows:

Is Flume suitable for your problem?

If you want to ingest textual log data into HDFS, then Flume is a good fit. For other scenarios, there are some things to consider:

Flume is designed to transport and ingest regularly generated data over a relatively stable, possibly complex topology. Each piece of data is an event. The concept of "event data" is very broad: to Flume, an event is just a blob of bytes. There is a limit on the size of an event; for example, it cannot be larger than what the memory or disk of a single machine can store. In practice, a Flume event can be anything from a log line to an image file. The key property of events is that they are generated continuously, as a stream. If your data is not generated on a regular basis (for example, a one-time bulk import into a Hadoop cluster), Flume will work, but it is overkill. Flume prefers a relatively stable topology. Your topology does not have to be immutable, because Flume can handle topology changes without losing data and can tolerate periodic reconfiguration due to failover. But if you change the topology every day, Flume will not work well, because reconfiguration incurs overhead.
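To make the "blob of bytes, generated as a stream" idea concrete, here is a minimal Python sketch. It is not Flume's API: the `Event` class, the `tail_events` function, and the `source` header name are all hypothetical names chosen for illustration. It only shows that the payload is opaque bytes and that events are produced one at a time as data arrives.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: opaque bytes plus optional headers."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def tail_events(lines: Iterator[str], source_name: str) -> Iterator[Event]:
    """Yield one event per incoming line instead of loading everything up front."""
    for line in lines:
        # The body is just bytes; nothing here inspects or interprets its content.
        yield Event(body=line.encode("utf-8"), headers={"source": source_name})


# Example input: in a real deployment the lines would keep arriving over time.
log_lines = iter(["GET /index.html 200", "GET /missing 404"])
events = list(tail_events(log_lines, "web-01"))
```

Because the body is plain bytes, the same structure could carry an image file as easily as a log line, which matches the description above.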

In short, there are two points:

1. The data is generated on a regular basis.

2. The network topology is relatively stable.



Both Kafka and Flume can transport data, but their focus is different.

Kafka pursues high throughput and the ability to handle high load (a topic can have multiple partitions).
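The reason multiple partitions help with load can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kafka's implementation: `choose_partition` and `distribute` are hypothetical names, and CRC32 is a stand-in hash (Kafka's own default partitioner uses a different hash function).

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (CRC32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


def distribute(records: Iterable[Tuple[bytes, str]], num_partitions: int) -> Dict[int, List[str]]:
    """Group record values by the partition their key maps to."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append(value)
    return dict(partitions)


# Twenty records spread over five distinct keys, placed into three partitions.
records = [(f"user-{i % 5}".encode(), f"event-{i}") for i in range(20)]
placed = distribute(records, 3)
```

Records with the same key always land in the same partition, so per-key ordering is kept while different partitions can be written and read in parallel.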

Flume pursues diversity of data: a variety of data sources and a variety of data flows.
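The "many sources, many flows" idea can be pictured with a toy Python model. Flume's architecture is built from sources, channels, and sinks, but the `Channel` class and `fan_in` function below are hypothetical and are not Flume's API. The sketch only shows several sources feeding the same pipeline while every event is copied toward more than one destination.

```python
from collections import deque
from typing import Deque, Iterable, List


class Channel:
    """Toy in-memory buffer between sources and a sink (not Flume's channel API)."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()

    def put(self, event: bytes) -> None:
        self._queue.append(event)

    def drain(self) -> List[bytes]:
        """Return all buffered events in arrival order and empty the buffer."""
        drained = list(self._queue)
        self._queue.clear()
        return drained


def fan_in(sources: Iterable[Iterable[bytes]], channels: List[Channel]) -> None:
    """Copy every event from every source into every channel."""
    for source in sources:
        for event in source:
            for channel in channels:
                channel.put(event)


# Two sources (web and application logs) feeding two destinations.
web_logs = [b"web: GET /", b"web: POST /login"]
app_logs = [b"app: started"]
hdfs_channel, search_channel = Channel(), Channel()
fan_in([web_logs, app_logs], [hdfs_channel, search_channel])
hdfs_batch = hdfs_channel.drain()
search_batch = search_channel.drain()
```

In this model each destination receives the full stream from all sources, which is one of several possible routing choices.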


Use Kafka if you have a single data source and want high throughput.

Use Flume if you have many data sources and many data flows.

Kafka and Flume can also be used together.


