Part 1: Single-node Flume configuration
Installation Reference http://flume.apache.org/FlumeUserGuide.html
http://my.oschina.net/leejun2005/blog/288136
Here is a brief introduction. The command to run an agent:
$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
1. The single-node configuration is as follows:
# example.conf: a single-node Flume configuration
# Created by Cesar.x 2015/12/14
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2. Then run the agent:
bin/flume-ng agent --conf conf --conf-file conf/myconf/example.conf --name a1 -Dflume.root.logger=INFO,console
PS: -Dflume.root.logger=INFO,console is for debugging only; do not copy it blindly into production, or a flood of log output will be written to the terminal.
3. Then open another shell window:
$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
Question 1
Here you may run into the problem that telnet is not installed. I am on a Red Hat system; if it is missing, install it directly with yum -y install telnet.
Question 2
The telnet connection is refused.
Check whether port 44444 is being listened on:
netstat -anltup | grep :44444
Reference
Http://www.2cto.com/os/201411/352191.html
We discover that the agent is listening only on the local interface (bind = localhost).
Modify the telnet command to connect to localhost from the same machine (telnet 127.0.0.1 44444),
and the connection succeeds.
Then we type Hello and press Enter,
go back to the agent's terminal,
and see that our input has been collected.
Part 2: Agent configuration in more detail
ZooKeeper-related
We can put the agent1 configuration file we just wrote into ZooKeeper. After the configuration file is uploaded, we start the agent with the command below.
The official documentation says this feature is experimental, so I will not go further with it here.
A schematic of the node in ZK.
-/flume
|-/a1 [Agent config file]
|-/a2 [Agent config file]
bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console
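The post does not show how the config file is uploaded. As a sketch only (the ZooKeeper hostname and local config path are placeholder assumptions; zkCli.sh ships with ZooKeeper), the znode data for a1 could be set like this:

```shell
# Upload the agent config as the data of znode /flume/a1
# (requires a running ZooKeeper; zkhost is a placeholder hostname)
zkCli.sh -server zkhost:2181 create /flume ""
zkCli.sh -server zkhost:2181 create /flume/a1 "$(cat conf/myconf/example.conf)"
```

The -p /flume flag in the start command above tells the agent which base path to read the config from.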
nohup bin/flume-ng agent --conf conf --conf-file conf/myconf/flume_colletc_test.conf -n collectormainagent &
Flume to HDFS
Configuration file
# Define a memory channel called ch1 on agent1
# Created by cesar.x 2015/12/14
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive =

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
#agent1.sources.avro-source1.channels = ch1
#agent1.sources.avro-source1.type = avro
#agent1.sources.avro-source1.bind = 0.0.0.0
#agent1.sources.avro-source1.port = 41414
#agent1.sources.avro-source1.threads = 5

# Define a source that monitors a file
agent1.sources.avro-source1.type = exec
agent1.sources.avro-source1.shell = /bin/bash -c
agent1.sources.avro-source1.command = tail -n +0 -f /usr/local/hadoop/apache-flume-1.6.0-bin/tmp/id.txt
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.threads = 5

# Define an HDFS sink and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = hdfs
agent1.sinks.log-sink1.hdfs.path = hdfs://mycluster/user/flumetest
agent1.sinks.log-sink1.hdfs.writeFormat = Text
agent1.sinks.log-sink1.hdfs.fileType = DataStream
agent1.sinks.log-sink1.hdfs.rollInterval = 0
agent1.sinks.log-sink1.hdfs.rollSize = 1000000
agent1.sinks.log-sink1.hdfs.rollCount = 0
agent1.sinks.log-sink1.hdfs.batchSize = 1000
agent1.sinks.log-sink1.hdfs.txnEventMax =
agent1.sinks.log-sink1.hdfs.callTimeout = 60000
agent1.sinks.log-sink1.hdfs.appendTimeout = 60000

# Finally, now that we've defined all the components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
Start
bin/flume-ng agent --conf conf --conf-file conf/myconf/flume_directhdfs.conf -n agent1 -Dflume.root.logger=INFO,console
Then simulate writing content to the file watched by the configuration:
echo "Test" >> 1.txt
Observe the files in HDFS,
then view the content in one of them.
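The commands for this check might look like the following sketch; the path comes from hdfs.path in the config above, and FlumeData is the sink's default file prefix (the exact file names depend on your run):

```shell
# List the files the HDFS sink created under the configured path
hdfs dfs -ls /user/flumetest
# Print the contents of the collected files
hdfs dfs -cat /user/flumetest/FlumeData.*
```

With rollInterval and rollCount set to 0 and rollSize 1000000, files roll only by size, so small tests may stay in an in-progress .tmp file until the agent closes it.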
Multi-agent to HDFS
Let's take two agents as an example.
The agents are deployed on 172.21.99.124, 172.21.99.125, and 172.21.99.126; the collectors are deployed on 172.21.99.134 and 172.21.99.135.
Here are the configurations for each web server (that is, the 3 agents).
Reference
Https://cwiki.apache.org/confluence/display/FLUME/Getting+Started#GettingStarted-flume-ngavro-clientoptions
Before this, we need to increase Flume's default memory settings. Open flume-env.sh:
export JAVA_OPTS="-Xms8192m -Xmx8192m -Xss256k -Xmn2g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"
For the configuration, I strongly recommend taking the time to read the official documentation; it is written very clearly:
Http://flume.apache.org/FlumeUserGuide.html#setting-multi-agent-flow
Now let's look directly at how to configure a multi-agent flow.
# list the sources, sinks and channels for the agent
<Agent>.sources = <Source1> <Source2>
<Agent>.sinks = <Sink1> <Sink2>
<Agent>.channels = <Channel1> <Channel2>
Based on the diagram above, we need to configure multiple sinks.
Here are the configuration files we deployed on each application agent.
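The deployed files themselves are not reproduced in this post. As a hedged sketch only, an application-agent config with a spooldir source fanning out to the two collectors might look like this (the spool directory and port are placeholder assumptions; the collector IPs are the ones listed above):

```
# One application agent: spooldir source -> memory channel -> two avro sinks
agent.sources = src1
agent.channels = ch1
agent.sinks = sink1 sink2

agent.sources.src1.type = spooldir
agent.sources.src1.spoolDir = /data/logs/spool
agent.sources.src1.channels = ch1

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 100000

# Two avro sinks, one per collector, grouped for load balancing
agent.sinks.sink1.type = avro
agent.sinks.sink1.hostname = 172.21.99.134
agent.sinks.sink1.port = 41414
agent.sinks.sink1.channel = ch1

agent.sinks.sink2.type = avro
agent.sinks.sink2.hostname = 172.21.99.135
agent.sinks.sink2.port = 41414
agent.sinks.sink2.channel = ch1

agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = sink1 sink2
agent.sinkgroups.g1.processor.type = load_balance
```

On each collector, an avro source listening on the matching port receives these events and forwards them onward.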
There is one thing to be aware of here:
the type of the source. The official documentation is very clear on this as well.
For this test we chose the spooldir source type.
In a real project, however, you may not choose the spooldir approach. In the multi-agent setup we ran into this error in the log: Expected timestamp in the Flume event headers, but it was null.
The official documentation states that the event headers must contain a timestamp unless useLocalTimeStamp is set to true.
So we add a timestamp interceptor.
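A minimal sketch of such an interceptor, assuming the source is named avro-source1 as in the earlier config (the agent and source names must match your own):

```
agent1.sources.avro-source1.interceptors = ts
agent1.sources.avro-source1.interceptors.ts.type = timestamp
```

This stamps each event with the current time at the source, so the time-based escape sequences in the HDFS sink's hdfs.path can be resolved.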
Flume to Kafka
Reference website
1. Issues that may be encountered
1.1 With two channels added, Flume would not start.
The specific symptoms are shown below.
Later we tried letting a single agent send to Kafka separately.
It is configured as follows.
Be sure to note the contents of the red box: use the hostname, and configure that hostname on the collect node,
otherwise you will get an error.
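As an illustration only (these hostnames and IPs are placeholders, not taken from the original post), mapping the Kafka broker hostnames on the collect node would mean adding entries like these to /etc/hosts:

```
172.21.99.140  kafka-broker1
172.21.99.141  kafka-broker2
```

Kafka hands clients the brokers' advertised hostnames in its metadata, so the Flume collect node must be able to resolve those names, not just the IPs.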
Once it is configured correctly, go to our Kafka cluster and start a console consumer:
bin/kafka-console-consumer.sh--zookeeper localhost:2181--from-beginning--topic my-replicated-topic
For the specific commands, see the quickstart:
Http://kafka.apache.org/documentation.html#quickstart
Then, when we click a button on the website or simply visit a page, we can see the whole pipeline in action:
JS to Nginx log to flume_agent to flume_collect to Kafka producer,
with the record then displayed in the consumer window of the Kafka message middleware.
The output printed above is what the Kafka consumer shows: the access records generated by our clicks on the site.