First, a quick introduction to what Flume actually is. Flume can collect data from a variety of data sources and deliver it to different destinations. It is most commonly used to ship logs, for example collecting log files from web servers and transferring them to a Hadoop cluster for analysis. Flume's configuration is flexible and simple and covers many log-shipping scenarios, so it is really a good tool.
OK, let's take a look at how to install and configure Flume. You can get the latest binary release of Flume from http://flume.apache.org. After downloading, unpack it and add Flume's bin directory to your PATH (this assumes Java is already installed and that Java's bin directory is already on the PATH). Finally, go into the Flume directory, find the conf directory, copy flume-env.sh.template inside it to flume-env.sh, and set JAVA_HOME in that file. The installation is now complete.
Let's go over the important roles in Flume first, and then configure an agent to play with.
There are four important concepts in Flume: source, channel, sink, and event. The source collects data and wraps it in an event, then passes the event to the specified channel. The event travels through the channel to the sink, and the sink is responsible for delivering the event to the specified destination. In short, events flow source→channel→sink→destination.
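To make the flow a bit more concrete, here is a toy model of it in Python. This is only a sketch of the idea and has nothing to do with Flume's real Java classes: Event, ListSource, ListSink, and run_pipeline are all names I made up, and the channel is just an in-memory queue.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for an event: a body plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class ListSource:
    """Hypothetical source: wraps each input line in an Event and puts it on the channel."""

    def __init__(self, channel):
        self.channel = channel

    def collect(self, lines):
        for line in lines:
            self.channel.put(Event(body=line.encode("utf-8")))


class ListSink:
    """Hypothetical sink: takes events off the channel and writes them to a destination list."""

    def __init__(self, channel, destination):
        self.channel = channel
        self.destination = destination

    def drain(self):
        while True:
            try:
                event = self.channel.get_nowait()
            except queue.Empty:
                break  # channel is empty, nothing left to deliver
            self.destination.append(event.body.decode("utf-8"))


def run_pipeline(lines):
    """Wire source -> channel -> sink -> destination and push the lines through."""
    channel = queue.Queue()  # the channel buffers events between source and sink
    destination = []
    ListSource(channel).collect(lines)
    ListSink(channel, destination).drain()
    return destination


if __name__ == "__main__":
    print(run_pipeline(["first line", "second line"]))
```

The point of the model is that the source and the sink never talk to each other directly; they only share the channel, which is why Flume lets you swap any one of the three without touching the others.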
OK, let's take a look at a simple agent. We will use this agent to receive data from the network and write it to Flume's log. The configuration is as follows:
    # Name the source, the sink, and the channel of agent1
    agent1.sources = netsource
    agent1.sinks = logsink
    agent1.channels = memorychannel

    # netsource is of type netcat, bound to this machine and listening on port 3000
    agent1.sources.netsource.type = netcat
    agent1.sources.netsource.bind = localhost
    agent1.sources.netsource.port = 3000

    # logsink is of type logger
    agent1.sinks.logsink.type = logger

    # memorychannel is of type memory: events are held in memory before being passed on
    agent1.channels.memorychannel.type = memory
    # Maximum number of events the channel can hold
    agent1.channels.memorychannel.capacity = 1000
    # Maximum number of events per transaction
    agent1.channels.memorychannel.transactionCapacity = 100

    # The channel the source writes to, and the channel the sink reads from
    agent1.sources.netsource.channels = memorychannel
    agent1.sinks.logsink.channel = memorychannel
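Since a mistyped name in this file is an easy mistake to make, here is a small sanity check one could run before starting the agent. It is not part of Flume; parse_properties and check_wiring are helper names I invented, the parser only handles plain key = value lines and # comments, and it assumes component names must match exactly, including case.

```python
SAMPLE_CONF = """
agent1.sources = netsource
agent1.sinks = logsink
agent1.channels = memorychannel
agent1.sources.netsource.type = netcat
agent1.sources.netsource.bind = localhost
agent1.sources.netsource.port = 3000
agent1.sinks.logsink.type = logger
agent1.channels.memorychannel.type = memory
agent1.channels.memorychannel.capacity = 1000
agent1.channels.memorychannel.transactionCapacity = 100
agent1.sources.netsource.channels = memorychannel
agent1.sinks.logsink.channel = memorychannel
"""


def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems in how sources and sinks are attached to channels."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channel")
        for name in used:
            if name not in declared:
                problems.append(f"source {source} uses undeclared channel {name}")
    for sink in props.get(f"{agent}.sinks", "").split():
        name = props.get(f"{agent}.sinks.{sink}.channel")
        if name is None:
            problems.append(f"sink {sink} has no channel")
        elif name not in declared:
            problems.append(f"sink {sink} uses undeclared channel {name}")
    return problems


if __name__ == "__main__":
    issues = check_wiring(parse_properties(SAMPLE_CONF), "agent1")
    print(issues if issues else "wiring looks consistent")
```

In a real setup you would read agent1.conf from disk instead of using the embedded SAMPLE_CONF string.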
Save the configuration above as agent1.conf in Flume's working directory (I was lazy and did not create a new directory; I simply used the directory that Flume was unpacked into).
Then start the agent! In a terminal, enter: flume-ng agent --conf conf --conf-file agent1.conf --name agent1 (--conf specifies the conf directory, --conf-file specifies the configuration file, and --name specifies the agent name).
Then open another terminal window, use the curl tool to send some data to port 3000 on the local machine, and look at Flume's log file. You can see that the data has been written to the log file as log entries.
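If curl is not at hand, the same test can be done with a few lines of Python. This is just a sketch: send_lines is a name I made up, the host and port match the agent1 configuration in this post, and it assumes the listener answers every line with a one-line reply (as far as I know the netcat source acknowledges each line by default). If no reply arrives, the read will time out.

```python
import socket


def send_lines(lines, host="localhost", port=3000, timeout=5.0):
    """Send each line over TCP and return the one-line reply received after each."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # The listener treats each newline-terminated line as one event
                conn.sendall((line + "\n").encode("utf-8"))
                replies.append(reader.readline().strip())
    return replies


if __name__ == "__main__":
    # Requires the agent from this post to be running and listening on port 3000
    print(send_lines(["hello flume", "second event"]))
```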
At this point we have completed a simple data collection flow. But going into the directory to read the log file every time is troublesome, so let's write the data straight to the console instead.
You only need to append the parameter -Dflume.root.logger=INFO,console to the start command.
OK, let's see the result. As you can see, whatever we type in one terminal shows up right away in the other.
OK, that's it for this time. Next time we will build some other agents! Thank you for reading. My knowledge is limited, so if there is a mistake, please do not hesitate to correct me. Much appreciated!
Data Collection with Apache Flume (i)