Data Acquisition Tool Flume

Source: Internet
Author: User

Overview
Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.
The use of Apache Flume is not limited to log data aggregation. Because data sources are customizable, Flume can transport large volumes of event data (each row of data is treated as an event), including but not limited to network traffic data, data generated by social media, email messages, and virtually any other data source.
Apache Flume is a top-level project of the Apache Software Foundation. There are currently two code lines, versions 0.9.x and 1.x. The 1.x line is a new architecture with improved performance and configuration flexibility, and users are encouraged to use it.

System Requirements
1. Java: Java 1.6 or later (Java 1.7 is recommended);
2. Memory: enough memory for the configured sources, channels, and sinks;
3. Disk space: enough disk space for the configured channels and sinks;
4. Directory permissions: read and write permissions on the directories used by the agent.

Data flow model
A Flume event is defined as a unit of data flow that has a byte payload and an optional set of string attributes. A Flume agent is a (JVM) process that hosts the components through which events flow from an external source to the next destination (hop).
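
The event definition above can be pictured with a small Java class. This is only a sketch: SketchEvent and its ofLine factory are hypothetical names made up for illustration, not Flume classes. Flume's own event type is the org.apache.flume.Event interface, which exposes the same two parts through getBody() and getHeaders().

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Flume event: a byte payload plus optional string headers.
final class SketchEvent {
    private final byte[] body;
    private final Map<String, String> headers;

    SketchEvent(byte[] body, Map<String, String> headers) {
        this.body = body.clone(); // defensive copy so callers cannot mutate the payload
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
    }

    // Convenience factory (an assumption of this sketch): one line of text, no headers.
    static SketchEvent ofLine(String line) {
        return new SketchEvent(line.getBytes(StandardCharsets.UTF_8),
                Collections.<String, String>emptyMap());
    }

    byte[] getBody() {
        return body.clone();
    }

    Map<String, String> getHeaders() {
        return headers;
    }
}

class EventSketchDemo {
    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web01"); // example attribute, chosen arbitrarily
        SketchEvent event = new SketchEvent(
                "GET /index.html".getBytes(StandardCharsets.UTF_8), headers);
        System.out.println(event.getBody().length + " bytes, headers=" + event.getHeaders());
    }
}
```

The payload is kept as raw bytes because the event itself does not interpret its content; the string headers are where routing hints would typically live.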

A Flume source consumes events delivered to it by an external source such as a web server. The external source sends events to Flume in a format that the target Flume source recognizes. For example, an Avro Flume source can be used to receive Avro events from Avro clients, or from other Flume agents in the flow that send events from an Avro sink. A similar flow can be defined using a Thrift Flume source to receive events from a Thrift sink, from a Flume Thrift RPC client, or from Thrift clients written in any language and generated from the Flume Thrift protocol. When a Flume source receives an event, it stores it in one or more channels. A channel is a passive store that keeps the event until it is consumed by a Flume sink. The file channel is one example: it is backed by the local file system. The sink removes the event from the channel and either puts it into an external repository such as HDFS (via the Flume HDFS sink) or forwards it to the Flume source of the next Flume agent (next hop) in the flow. The source and sink within a given agent run asynchronously, with the events staged in the channel.
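
The source, channel, and sink roles described above can be mimicked with two threads and a bounded queue. The following is a toy model under stated assumptions, not Flume code: AgentSketch and its run method are hypothetical, the "channel" is an in-memory java.util.concurrent.ArrayBlockingQueue rather than a real Flume channel, and the "external repository" is just a list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class AgentSketch {
    // Runs a toy agent: the source thread stores events in the channel,
    // the sink thread removes them. Returns what the sink delivered.
    static List<String> run(final List<String> lines, int channelCapacity) {
        final BlockingQueue<byte[]> channel = new ArrayBlockingQueue<>(channelCapacity);
        final List<String> delivered = new ArrayList<>(); // written only by the sink thread

        Thread source = new Thread(() -> {
            try {
                for (String line : lines) {
                    // put() blocks while the channel is full, so the source waits for the sink
                    channel.put(line.getBytes(StandardCharsets.UTF_8));
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        Thread sink = new Thread(() -> {
            try {
                while (delivered.size() < lines.size()) {
                    byte[] body = channel.poll(1, TimeUnit.SECONDS);
                    if (body == null) {
                        break; // nothing arrived within the timeout; stop the sketch
                    }
                    delivered.add(new String(body, StandardCharsets.UTF_8));
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        sink.start();
        try {
            source.join();
            sink.join(); // join makes the sink's writes visible to the caller
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the agent", ex);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("line 1", "line 2", "line 3"), 2));
    }
}
```

The queue is what lets the two sides run at their own pace: the source only blocks when the channel is full, and the sink only waits when it is empty. A real file channel additionally persists events to disk, which this in-memory sketch does not attempt.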

Complex flow
Flume allows a user to build multi-hop flows in which events travel through multiple agents before reaching the final destination. It also supports fan-in and fan-out flows, contextual routing, and backup routes (failover) for hops that fail.
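
Two of the ideas above, fan-out and backup routing, can be sketched in a few lines of Java. This is an illustration only: RoutingSketch, replicate, and deliverWithFailover are hypothetical names, channels are modeled as plain lists, and sinks as java.util.function.Consumer instances. In Flume itself these behaviors are set up through agent configuration rather than written by hand like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class RoutingSketch {
    // Fan-out: copy one event into every downstream channel (modeled as lists).
    static void replicate(String event, List<List<String>> channels) {
        for (List<String> channel : channels) {
            channel.add(event);
        }
    }

    // Backup routing: try the sinks in priority order and return the index of
    // the first one that accepts the event, or -1 if every sink failed.
    static int deliverWithFailover(String event, List<Consumer<String>> sinks) {
        for (int i = 0; i < sinks.size(); i++) {
            try {
                sinks.get(i).accept(event);
                return i;
            } catch (RuntimeException ex) {
                // This hop failed; fall through to the next (backup) sink.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> channelA = new ArrayList<>();
        List<String> channelB = new ArrayList<>();
        replicate("event-1", Arrays.asList(channelA, channelB));
        System.out.println("A=" + channelA + " B=" + channelB);

        List<String> backupStore = new ArrayList<>();
        Consumer<String> primary = e -> {
            throw new IllegalStateException("primary sink is down"); // simulated failure
        };
        Consumer<String> backup = backupStore::add;
        List<Consumer<String>> sinks = new ArrayList<>();
        sinks.add(primary);
        sinks.add(backup);
        int used = deliverWithFailover("event-2", sinks);
        System.out.println("delivered by sink " + used + ", backup holds " + backupStore);
    }
}
```

Returning the index of the sink that accepted the event keeps the sketch easy to check; a real deployment would also need retry timing and recovery of the failed hop, which are left out here.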

