Installation configuration for Flume


Flume is a distributed, reliable, and highly available system for collecting, aggregating, and moving large volumes of log data. It supports custom data senders in the logging system for data collection, and it can also perform simple processing on the data before writing it to a variety of data receivers (such as text files, HDFS, HBase, and so on).

First, what is Flume?
Flume is a real-time log collection system developed by Cloudera that has been widely recognized and adopted by industry. Its initial releases are now collectively known as Flume OG (Original Generation) and belonged to Cloudera. As Flume's functionality expanded, the shortcomings of Flume OG became apparent: the code base grew bloated, the core components were poorly designed, and the core configuration was not standardized. In the final OG release, 0.94.0, log transmission instability was especially serious. To solve these problems, on October 22, 2011, Cloudera completed FLUME-728, a milestone change to Flume: the core components, core configuration, and code architecture were all refactored. The refactored version is collectively known as Flume NG (Next Generation). Another reason for the change is that Flume was accepted into Apache, and Cloudera Flume was renamed Apache Flume.

Features of Flume:
Flume data flows are always driven by events. An event is Flume's basic unit of data; it carries the log data (in the form of a byte array) along with header information. Events are generated by a source from data outside the agent: when the source captures an event it gives it a specific format, and then the source pushes the event into one or more channels. You can think of a channel as a buffer that holds the event until a sink has finished processing it. The sink is responsible for persisting the log or pushing the event on to another source.
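To make the event flow above concrete, here is a minimal single-agent configuration in Flume's standard properties format. The agent, source, channel, and sink names (a1, r1, c1, k1) and the file name example.conf are illustrative, not from the original text; it wires a netcat source through a memory channel to a logger sink:

```properties
# Name the components on this agent (names are arbitrary)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for lines of text on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory until the sink takes them
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: logs each event at INFO level
a1.sinks.k1.type = logger

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Saved as example.conf, such an agent could be started with `bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console`; lines typed into `telnet localhost 44444` would then appear as events in the console.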

Reliability of Flume
When a node fails, logs can be transmitted to other nodes without loss. Flume provides three levels of reliability guarantee, from strong to weak:
End-to-end: the receiving agent first writes the event to disk, and deletes it only after the data has been transferred successfully; if sending fails, the event can be resent.
Store on failure: (this is also the policy adopted by Scribe) when the data receiver crashes, the data is written locally, and sending resumes after recovery.
Besteffort: data is sent to the receiver without any confirmation.

Recoverability of Flume:
Recovery is also handled by the channel. FileChannel is recommended: events persist in the local file system (at the cost of performance).
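As a sketch of what the recommended file-backed channel looks like in an agent configuration, the snippet below swaps a channel's type to file; the agent/channel names and the directory paths are hypothetical and would be chosen per deployment:

```properties
# Durable channel: events survive an agent restart
a1.channels.c1.type = file
# Where channel checkpoints are stored (hypothetical path)
a1.channels.c1.checkpointDir = /home/hadoop/flume/checkpoint
# Where the event data itself is stored (hypothetical path)
a1.channels.c1.dataDirs = /home/hadoop/flume/data
```

The trade-off mentioned above is visible here: every event is written to the dataDirs on disk, which is slower than the in-memory channel but recoverable after a crash.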

Some core concepts of Flume:
Agent: runs Flume in a JVM. Each machine runs one agent, but a single agent can contain multiple sources and sinks.
Client: produces the data; runs in a separate thread.
Source: collects data from the client and passes it to the channel.
Sink: collects data from the channel; runs in a separate thread.
Channel: connects sources and sinks; it works somewhat like a queue.
Event: can be a log record, an Avro object, and so on.

The agent is the smallest independently operating unit of Flume. An agent is a JVM. A single agent consists of three components: source, sink, and channel.

It is important to note that Flume provides a large number of built-in source, channel, and sink types. Different types of sources, channels, and sinks can be freely combined; the combination is driven by user-supplied configuration files and is very flexible. For example, a channel can hold events in memory or persist them to the local hard disk, and a sink can write logs to HDFS, to HBase, or even to another source. Flume also lets users build multi-level flows: multiple agents can work together, with support for fan-in, fan-out, contextual routing, and backup routes. This is where Flume really shines.
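The fan-out case described above can be sketched in configuration as well. In this hypothetical example (agent and component names, the HDFS namenode host, and the path are all illustrative), a single source replicates each event into two channels, one drained by an HDFS sink and one by a logger sink:

```properties
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Replicating is Flume's default selector: each event is copied
# to every listed channel (fan-out)
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# First branch: persist events to HDFS (hypothetical cluster path)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k1.channel = c1

# Second branch: log events locally
a1.sinks.k2.type = logger
a1.sinks.k2.channel = c2
```

Fan-in is the mirror image: several agents point their sinks (for example Avro sinks) at the Avro source of a single downstream agent.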

Second, where is the official website of Flume?
http://flume.apache.org/

Third, where to download?

http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz

Fourth, how to install?
1) Unzip the downloaded Flume package into the /home/hadoop directory, and you have already completed 50% of the work. :) Simple.

2) Modify the flume-env.sh configuration file, mainly to set the JAVA_HOME variable.

From the /home/hadoop/flume-1.5.0-bin directory:

cp conf/flume-env.sh.template conf/flume-env.sh
vi conf/flume-env.sh

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Enviroment variables can be set here.

JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

3) Verify that the installation is successful

/home/hadoop/flume-1.5.0-bin/bin/flume-ng version
Flume 1.5.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Compiled ... 7 14:49:18 PDT
From source with checksum a01fe726e4380ba0c9f7a7d222db961f

If the above message appears, the installation was successful.

