When learning a new technology, the first step is usually to write a "Hello World" program. Flume is no different: its "Hello World" is simply getting it to run.
1. Flume basic outline
(1) What does Flume do? Flume is an open source Apache project that collects data from different nodes and aggregates it on a central node.
(2) Can data be lost in transit inside Flume? Flume supports transactions internally, so data is not lost during transmission, although events may be delivered more than once.
(3) Is data lost if Flume goes down? Flume supports two internal mechanisms: a memory queue and a file queue. The memory queue provides efficient, high-throughput data collection, but when Flume goes down the data held in the memory queue is lost and cannot be recovered. The file queue trades performance for reliability: data stored in the file queue can be recovered after Flume goes down.
(4) How robust and stable is Flume? In general, a Flume service can reach about 99.99% stability.
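Point (2) above, no loss but possible repeats, can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyChannel, drain, and flaky_deliver are hypothetical names invented here, not Flume's API. The destination stores a batch but its acknowledgement is lost once, so the batch is rolled back and sent again.

```python
from collections import deque


class ToyChannel:
    """A toy in-memory channel with take/rollback semantics (not Flume's API)."""

    def __init__(self, events):
        self._queue = deque(events)

    def take(self, batch_size):
        # Remove up to batch_size events and hand them to the caller.
        batch = []
        while self._queue and len(batch) < batch_size:
            batch.append(self._queue.popleft())
        return batch

    def rollback(self, batch):
        # Put an uncommitted batch back at the head, preserving its order.
        self._queue.extendleft(reversed(batch))

    def __len__(self):
        return len(self._queue)


def drain(channel, deliver, batch_size=2):
    """Move every event out of the channel, retrying batches that fail."""
    while len(channel) > 0:
        batch = channel.take(batch_size)
        try:
            deliver(batch)           # success: the batch is considered committed
        except ConnectionError:
            channel.rollback(batch)  # failure: the whole batch is retried


received = []
failures = {"left": 1}  # hypothetical: exactly one acknowledgement gets lost


def flaky_deliver(batch):
    # The destination stores the batch, but the first acknowledgement is lost.
    received.extend(batch)
    if failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("ack lost")


drain(ToyChannel(["e1", "e2", "e3"]), flaky_deliver)
print(received)  # ['e1', 'e2', 'e1', 'e2', 'e3']
```

Every event arrives at least once, and the first batch arrives twice, which is why a downstream consumer may need to tolerate duplicates.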
2. Flume basic components (Source, Channel, Sink)
Following Figure 1.1, this section briefly introduces Flume's internal architecture, data flow, and basic components.
Flume's internal architecture is built on three components: Source, Channel, and Sink. The Channel is the data store that holds all the data inside Flume. The Source acts like a producer: it accepts external data and saves it to the Channel. The Sink acts like a consumer: it pulls data out of the Channel and sends it to an external destination. The data flow is therefore web server -> Source -> Channel -> Sink -> HDFS. A familiar analogy for the relationship between the three: the Source is the inlet pipe, the Channel is the cistern, and the Sink is the outlet pipe.
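The producer/consumer relationship described above can be sketched with a bounded queue. This is a minimal illustration, not how Flume is implemented: the function names (source, sink, run_pipeline) and the STOP sentinel are assumptions made up for this example.

```python
import queue
import threading

STOP = object()  # sentinel telling the sink there is no more data


def source(lines, channel):
    """Producer: accept external data and put it into the channel."""
    for line in lines:
        channel.put(line)  # blocks while the channel is full
    channel.put(STOP)


def sink(channel, output):
    """Consumer: take events from the channel and send them outward."""
    while True:
        event = channel.get()
        if event is STOP:
            break
        output.append(event)


def run_pipeline(lines, capacity=1000):
    channel = queue.Queue(maxsize=capacity)  # the "cistern" between the pipes
    output = []
    consumer = threading.Thread(target=sink, args=(channel, output))
    consumer.start()
    source(lines, channel)
    consumer.join()
    return output
```

Calling run_pipeline(["a", "b", "c"]) returns the same three items in order. The bounded capacity is the point of the cistern analogy: the channel absorbs bursts from the source, and when it is full the source has to wait for the sink to catch up.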
3. Start Flume
(1) Download the Flume package from the official website.
Official website: http://flume.apache.org/download.html
Flume package: apache-flume-1.5.2-bin.tar.gz
(2) Unpack the Flume package.
tar -zxvf apache-flume-1.5.2-bin.tar.gz
cd apache-flume-1.5.2-bin
The lib directory holds the jar files, the conf directory holds the configuration files, and the bin directory holds the executable scripts.
(3) Create a configuration file
The conf directory contains flume-conf.properties.template, flume-env.sh.template, and log4j.properties.
flume-conf.properties.template is a template for configuring the properties of the Source, Channel, and Sink.
flume-env.sh.template is a template for configuring the execution environment. [Not covered here]
As described in section 2, Flume's data flow is formed by three connected components (Source, Channel, Sink), so the configuration file must set the properties of each component and describe how they connect to each other.
Create a flume.conf file in the conf directory with the following contents.
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
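A common mistake in a file like this is a source or sink that points at a channel that was never declared. The sketch below parses the same properties and checks that wiring. It is an illustrative helper, not part of Flume: parse_properties and check_wiring are hypothetical names, and the parser handles only simple key = value lines. Note that a source uses the plural key channels while a sink uses the singular key channel.

```python
EXAMPLE = """\
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems in how sources and sinks reference channels."""
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for src in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not used:
            problems.append(f"source {src} has no channels")
        for name in used:
            if name not in channels:
                problems.append(f"source {src} uses undeclared channel {name!r}")
    for snk in props.get(f"{agent}.sinks", "").split():
        used = props.get(f"{agent}.sinks.{snk}.channel", "")
        if used not in channels:
            problems.append(f"sink {snk} uses undeclared channel {used!r}")
    return problems


print(check_wiring(parse_properties(EXAMPLE), "a1"))  # []
```

An empty list means every source and sink of agent a1 is bound to a declared channel.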
(4) Start Flume
The script that starts Flume is bin/flume-ng. Running bin/flume-ng on its own prints the command's help text:
Usage: bin/flume-ng <command> [options]...
Here we focus on the following parameters:
commands:
  agent                    run a Flume agent
global options:
  --conf,-c <conf>         specify the configuration directory; here that is conf/, which holds flume-env.sh, log4j.properties, and flume.conf
  -Dproperty=value         set a Java system property
agent options:
  --conf-file,-f <file>    specify the configuration file, that is, flume.conf
  --name,-n <name>         the name of the Flume agent
The full command is therefore:
bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name a1 -Dflume.root.logger=INFO,console
Once the agent has started, Flume prints its log to the console.
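If the agent is launched from a script rather than typed by hand, the same command line can be assembled as an argument list. This is a small sketch: build_agent_command is a hypothetical helper written for this example, and it only arranges the flags already listed in the usage text above.

```python
import shlex


def build_agent_command(conf_dir, conf_file, agent_name, logger="INFO,console"):
    """Assemble the flume-ng agent command as an argument list."""
    return [
        "bin/flume-ng", "agent",
        "--conf", conf_dir,              # directory with flume-env.sh and log4j.properties
        "--conf-file", conf_file,        # the agent's properties file
        "--name", agent_name,            # must match the prefix used in the file (a1)
        f"-Dflume.root.logger={logger}",  # send the log to the console
    ]


command = build_agent_command("conf", "conf/flume.conf", "a1")
print(shlex.join(command))
```

The list can be passed to subprocess.run(command, cwd=...) with cwd set to the unpacked Flume directory, which avoids shell quoting problems.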
(5) Send data
The configuration in step (3) sets the type of Source r1 to netcat and its listening port to 44444, so we can connect to Source r1 manually with telnet localhost 44444 and send it data. Type "Hello world" and press Enter to send it.
Flume then logs the event it received. In that log we can see that Flume's Sink k1 printed "Hello World" to the screen. This completes Flume's "Hello World".
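Where telnet is not installed, the same line can be sent with a few lines of Python. This is a minimal sketch: send_line is a hypothetical helper, and the host and port defaults simply mirror the netcat source configured in step (3). The netcat source is expected to acknowledge each line with a short reply such as OK by default, but treat that as an assumption and check the user guide for your version.

```python
import socket


def send_line(text, host="localhost", port=44444, timeout=5.0):
    """Send one newline-terminated line and return whatever the peer replies."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((text + "\n").encode("utf-8"))
        try:
            reply = conn.recv(1024)  # wait briefly for an acknowledgement
        except socket.timeout:
            reply = b""              # no reply within the timeout
    return reply.decode("utf-8", errors="replace").strip()
```

With the agent from step (4) running, calling send_line("Hello World") should make the event appear in the Flume console log, just as with telnet.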
Reference documents: http://flume.apache.org/FlumeUserGuide.html