Data Collection with Apache Flume (i)

Source: Internet
Author: User

First, a quick introduction to what Flume actually is. Flume collects data from a variety of sources and delivers it to different destinations. It is most commonly used to move logs around, for example collecting log files from web servers and shipping them to a Hadoop cluster for analysis. Its configuration is flexible and simple enough to cover many different log-shipping scenarios, which makes it a really handy tool.

OK, let's look at how to install and configure Flume. You can download the latest binary release from http://flume.apache.org. After downloading, unpack the archive and add Flume's bin directory to your PATH (this assumes Java is already installed and its bin directory is already on the PATH). Finally, go into the conf directory inside the Flume directory, copy flume-env.sh.template to flume-env.sh, and set JAVA_HOME in it. The installation is now complete.

Next, let's go over the important roles in Flume, and then configure a few agents to play with.

Flume has four important roles: source, channel, sink, and event. A source collects data and wraps it in an event, then passes the event to the specified channel. The event travels through the channel to a sink, and the sink is responsible for delivering the event to the specified destination. In short, events flow source → channel → sink → destination.
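To make the source → channel → sink flow concrete, here is a minimal Python sketch of the idea. It is only a toy model: Flume itself is written in Java, and the names Event, MemoryChannel, run_source, and run_sink are made up for illustration and are not Flume APIs.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Event:
    """A unit of data moving through the pipeline (toy model, not Flume's class)."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Bounded in-memory buffer that sits between a source and a sink."""

    def __init__(self, capacity: int = 1000):
        self._queue = queue.Queue(maxsize=capacity)

    def put(self, event: Event) -> None:
        # Raises queue.Full when the channel already holds `capacity` events.
        self._queue.put_nowait(event)

    def take(self):
        # Returns the oldest event, or None when the channel is empty.
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


def run_source(lines, channel):
    """Source: wrap each incoming line in an Event and put it on the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def run_sink(channel, destination):
    """Sink: drain the channel and deliver each event body to the destination."""
    while (event := channel.take()) is not None:
        destination.append(event.body.decode("utf-8"))


channel = MemoryChannel(capacity=1000)
delivered = []
run_source(["hello", "flume"], channel)
run_sink(channel, delivered)
print(delivered)  # ['hello', 'flume']
```

The point of the channel is that the source and the sink never talk to each other directly; the buffer in the middle lets each side work at its own pace.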

OK, let's look at a simple agent. We use this agent to receive data over the network and write it to the Flume log. The configuration is as follows:

# Name the source, sink, and channel that belong to agent1
agent1.sources = netsource
agent1.sinks = logsink
agent1.channels = memorychannel

# The source is of type netcat and listens on localhost, port 3000
agent1.sources.netsource.type = netcat
agent1.sources.netsource.bind = localhost
agent1.sources.netsource.port = 3000

# The sink is of type logger
agent1.sinks.logsink.type = logger

# The channel is of type memory: events are buffered in memory before being passed on
# capacity is the maximum number of events the channel can hold
agent1.channels.memorychannel.type = memory
agent1.channels.memorychannel.capacity = 1000
agent1.channels.memorychannel.transactionCapacity = 100

# The channel the source writes to, and the channel the sink reads from
agent1.sources.netsource.channels = memorychannel
agent1.sinks.logsink.channel = memorychannel
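The last two settings of the configuration are what tie the pieces together: the source and the sink must each point at a channel that the agent declares. As a rough illustration, the following Python sketch reads such a configuration and checks that wiring. The helpers parse_properties and check_wiring are hypothetical and are not part of Flume; the sketch also assumes that component names are matched case-sensitively, so a channel declared as MEMORYCHANNEL would not match memorychannel.

```python
CONFIG = """
agent1.sources = netsource
agent1.sinks = logsink
agent1.channels = memorychannel
agent1.sources.netsource.type = netcat
agent1.sources.netsource.bind = localhost
agent1.sources.netsource.port = 3000
agent1.sinks.logsink.type = logger
agent1.channels.memorychannel.type = memory
agent1.channels.memorychannel.capacity = 1000
agent1.channels.memorychannel.transactionCapacity = 100
agent1.sources.netsource.channels = memorychannel
agent1.sinks.logsink.channel = memorychannel
"""


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems in the source/sink to channel wiring."""
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channel")
        for name in used:
            if name not in channels:
                problems.append(f"source {source} uses undeclared channel {name}")
    for sink in props.get(f"{agent}.sinks", "").split():
        name = props.get(f"{agent}.sinks.{sink}.channel")
        if name is None:
            problems.append(f"sink {sink} has no channel")
        elif name not in channels:
            problems.append(f"sink {sink} uses undeclared channel {name}")
    return problems


print(check_wiring(parse_properties(CONFIG), "agent1"))  # []
```

An empty list means every source and sink refers to a declared channel. Note that a source uses the plural key channels while a sink uses the singular key channel, which is an easy detail to get wrong.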

Save the configuration above as agent1.conf in Flume's working directory (I was lazy and did not create a new directory; I just used the unpacked Flume directory).

Then start the agent! Enter the following in a terminal: flume-ng agent --conf conf --conf-file agent1.conf --name agent1 (--conf specifies the conf directory, --conf-file specifies the configuration file, and --name specifies the agent name).

Then open another terminal window, use the curl tool to send some data to local port 3000, and check Flume's log file. You can see that the data has been written to the log file as log entries.
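If curl is not at hand, any TCP client will do, since the agent is simply listening on a port. Below is a small Python sketch of such a client. The function name send_lines is made up for this example, and it assumes the listener answers each line with one line of text (as far as I know, the netcat source replies OK for each event by default); if nothing comes back, the read will end with a timeout.

```python
import socket


def send_lines(lines, host="localhost", port=3000, timeout=5.0):
    """Send each line to a listening TCP port and collect one reply per line."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Text-mode reader over the same socket, used to read the reply lines.
        reader = conn.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            # One newline-terminated line is treated as one piece of data.
            conn.sendall((line + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
    return replies
```

With the agent running, calling send_lines(["hello flume", "second event"]) sends two lines to port 3000, and each one should then show up as a separate entry in the Flume log.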

At this point we have completed a simple data collection flow. But going into the log directory to read the file every time is a hassle, so let's write the data straight to the console instead.

You only need to append the parameter -Dflume.root.logger=INFO,console to the start command.

OK, let's look at the result. Each line we type in one terminal shows up immediately in the other terminal.

OK, that's it for this time. Next time we will build some other agents! Thanks for reading. My knowledge is limited, so if you spot a mistake, please don't hesitate to correct me. Much appreciated!
