Overview
Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.
The use of Apache Flume is not limited to log data aggregation. Because data sources are customizable, Flume can transport large volumes of event data (each row of data is treated as an event), including but not limited to network traffic data, social-media-generated data, email messages, and virtually any other data source.
Apache Flume is a top-level project of the Apache Software Foundation. Two code lines are currently available, versions 0.9.x and 1.x. The 1.x line is a new architecture, and users are encouraged to adopt it for its improved performance and configuration flexibility.
System Requirements
1. Java: Java 1.6 or later (Java 1.7 is recommended).
2. Memory: enough memory for the configured sources, channels, and sinks.
3. Disk space: enough disk space for the configured channels and sinks.
4. Directory permissions: read and write permissions on the directories used by the agent.
Data flow model
A Flume event is defined as a unit of data flow that carries a byte payload and an optional set of string attributes. A Flume agent is a (JVM) process that hosts the components through which events flow from an external source to the next destination (hop).
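The event definition above can be pictured with a small sketch. This is not Flume's Java API; the `Event` class and the `event_from_line` helper are hypothetical names used only to show the shape of an event: a byte payload plus optional string attributes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Event:
    """Illustrative stand-in for a Flume event (hypothetical, not the real API)."""
    body: bytes                                             # the byte payload
    headers: Dict[str, str] = field(default_factory=dict)  # optional string attributes


def event_from_line(line: str, **headers: str) -> Event:
    """Wrap one row of text (for example, one log line) as an event."""
    return Event(body=line.rstrip("\n").encode("utf-8"), headers=dict(headers))


# One log row becomes one event; the "host" attribute is an assumed example.
sample = event_from_line("GET /index.html 200\n", host="web01")
```

The payload is kept as raw bytes so that the sketch stays agnostic about what the data means, which matches the idea that an event can wrap any kind of source data.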
A Flume source consumes events delivered to it by an external source such as a web server. The external source sends events to Flume in a format that the target Flume source recognizes. For example, an Avro Flume source can be used to receive Avro events from Avro clients, or from other Flume agents in the flow that send events from an Avro sink. A similar flow can be defined using a Thrift Flume source to receive events from a Thrift sink, from a Flume Thrift RPC client, or from Thrift clients written in any language and generated from the Flume Thrift protocol. When a Flume source receives an event, it stores it in one or more channels. A channel is a passive store that keeps the event until it is consumed by a Flume sink. The file channel is one example; it is backed by the local file system. The sink removes the event from the channel and either puts it into an external repository such as HDFS (via the Flume HDFS sink) or forwards it to the Flume source of the next Flume agent (next hop) in the flow. The source and sink within a given agent run asynchronously, with the events staged in the channel.
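The source, channel, and sink roles described above can be sketched in a few lines. This is a simplified model under stated assumptions, not Flume code: `run_agent` is a hypothetical function, a bounded in-memory queue stands in for the channel, and a plain list stands in for the external repository. It leaves out the durability that a file channel would provide.

```python
import queue
import threading


def run_agent(lines, capacity=100):
    """Move lines from a source to a sink through a bounded in-memory channel."""
    channel = queue.Queue(maxsize=capacity)  # passive store between source and sink
    delivered = []                           # stands in for an external repository
    done = object()                          # sentinel: the source has finished

    def source():
        for line in lines:
            channel.put(line.encode("utf-8"))  # blocks while the channel is full
        channel.put(done)

    def sink():
        while True:
            event = channel.get()              # removes the event from the channel
            if event is done:
                break
            delivered.append(event)

    # Source and sink run on separate threads, decoupled by the channel.
    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return delivered


result = run_agent(["a", "b", "c"], capacity=2)
```

Because the only thing the two threads share is the queue, the source can keep accepting data while the sink is busy, up to the channel capacity.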
Complex flow
Flume allows a user to build multi-hop flows in which events travel through multiple agents before reaching the final destination. It also supports fan-in and fan-out flows, contextual routing, and backup routes (failover) for hops that fail.
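Fan-out and backup routing can be illustrated with a minimal sketch. The names `fan_out`, `deliver_with_failover`, and `broken_sink` are hypothetical, lists stand in for channels, plain callables stand in for sinks, and a failed hop is assumed to raise `ConnectionError`. Flume's own mechanisms for these flows are set up through agent configuration and are not shown here.

```python
def fan_out(event, channels):
    """Replicate one event into every channel (fan-out)."""
    for channel in channels:
        channel.append(event)


def deliver_with_failover(event, sinks):
    """Try each sink in priority order; return the index of the sink that accepted the event."""
    for index, sink in enumerate(sinks):
        try:
            sink(event)
            return index
        except ConnectionError:
            continue  # this hop failed, fall through to the backup route
    raise RuntimeError("all sinks failed")


# Fan-out: the same event lands in both channels.
ch_a, ch_b = [], []
fan_out(b"evt", [ch_a, ch_b])

# Failover: the primary sink is unreachable, so the backup sink stores the event.
stored = []


def broken_sink(event):
    raise ConnectionError("primary hop unreachable")


used = deliver_with_failover(b"evt", [broken_sink, stored.append])
```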