A look at Flume, the big data acquisition engine: collecting logs from a directory

Source: Internet
Author: User
Tags: sqoop

Welcome to the big data and AI technical articles released by the public account Qing Research Academy, where you can read the carefully organized notes of Night White (the author's pen name). Let us make a little progress every day, so that excellence becomes a habit!

I. Introduction to Flume:

Developed by Cloudera, Flume is a highly available, highly reliable, distributed system for massive log collection, aggregation, and transmission. Flume supports customizing various data senders in the log system to collect data, and it also provides the ability to do simple processing on the data and write it to a variety of data receivers. To summarize Flume in one sentence: Flume is a data acquisition engine that collects log data in real time.

II. Flume architecture:


The Flume architecture is divided into three parts: the data source, Flume itself, and the destination.
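
As a rough sketch, data flows through a Flume agent like this (component names are generic):

data source -> [ Agent: Source -> Channel -> Sink ] -> destination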

There are many types of data sources: a directory, HTTP, Kafka, and so on. Flume provides the source component to collect data from these sources.

1. Source function: collect logs

Source types (a few are sketched in the configuration fragment below):

1. Spooling Directory Source: collects logs from a directory

2. HTTP Source: collects logs over HTTP

3. Kafka Source: collects logs from Kafka

......
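
For illustration only, here is how a few source types are declared in an agent's properties file. The agent name a1 and source name r1 are placeholders, each stanza is an alternative definition of the same source, and the property names follow the Flume 1.7 user guide:

# Spooling Directory Source: watch a directory for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/incoming

# HTTP Source: accept events posted to a port
a1.sources.r1.type = http
a1.sources.r1.port = 44444

# Kafka Source: consume events from a Kafka topic
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = logs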

The collected logs need to be cached, and Flume provides the channel component to cache the data.

2. Channel function: cache logs

Channel types (sketched below):

1. Memory Channel: caches in memory (the most commonly used)

2. JDBC Channel: caches in a relational database via JDBC

3. Kafka Channel: caches in Kafka

......
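
A similar sketch for the channel types (a1 and c1 are placeholders; each stanza is an alternative):

# Memory Channel: fast, but events are lost if the agent process dies
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# JDBC Channel: events persisted in an embedded relational database
a1.channels.c1.type = jdbc

# Kafka Channel: events persisted in a Kafka topic
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = localhost:9092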

The cached data ultimately needs to be saved, and Flume provides the sink component to save the data.

3. Sink function: save logs

Sink types (sketched below):

1. HDFS Sink: saves to HDFS

2. HBase Sink: saves to HBase

3. Hive Sink: saves to Hive

4. Kafka Sink: saves to Kafka

......
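
And a sketch for the sink types (a1 and k1 are placeholders; each stanza is an alternative; the full HDFS Sink used in this article appears in the a4.conf below):

# HDFS Sink: write events to a (date-escaped) HDFS path
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:9000/flume/%y%m%d

# HBase Sink: write events to an HBase table
a1.sinks.k1.type = hbase
a1.sinks.k1.table = logs
a1.sinks.k1.columnFamily = cf

# Kafka Sink: publish events to a Kafka topic
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = logs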

The official website enumerates the different types of each Flume component.

III. Installing and configuring Flume:

1. Installation: tar -zxvf apache-flume-1.7.0-bin.tar.gz -C ~/training

2. Create a configuration file a4.conf: define the agent; define the source, channel, and sink and assemble them; and define the conditions for generating log files.

The following is the content of the a4.conf configuration file. It defines a directory as the data source, caches the data in memory, and finally saves the data to HDFS. The conditions for generating a log file are: a log file is generated when it reaches 128 MB in size, or after 60 seconds.

# Define the agent name and the names of the source, channel, and sink
a4.sources = r1
a4.channels = c1
a4.sinks = k1

# Define the source
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /root/training/logs

# Define the channel
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Define an interceptor that adds a timestamp to each message
a4.sources.r1.interceptors = i1
a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Define the sink
a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = hdfs://192.168.157.11:9000/flume/%y%m%d
a4.sinks.k1.hdfs.filePrefix = events-
a4.sinks.k1.hdfs.fileType = DataStream

# Do not roll files based on the number of events
a4.sinks.k1.hdfs.rollCount = 0

# Roll a new log file when the file on HDFS reaches 128 MB (134217728 bytes)
a4.sinks.k1.hdfs.rollSize = 134217728

# Roll a new log file on HDFS every 60 seconds
a4.sinks.k1.hdfs.rollInterval = 60

# Assemble the source, channel, and sink
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1

IV. Using Flume commands to collect data:

1. Create a directory to save the log:

mkdir /root/training/logs

2. Start Flume, ready to collect logs in real time:

bin/flume-ng agent -n a4 -f myagent/a4.conf -c conf -Dflume.root.logger=INFO,console
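
The flags, briefly (they correspond to the flume-ng options --name, --conf-file, and --conf):

# -n  name of the agent to run (must match the property prefix in a4.conf)
# -f  path to the agent configuration file
# -c  Flume's own configuration directory (flume-env.sh, log4j settings)
# -Dflume.root.logger=INFO,console  also print Flume's log output to the console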

3. Import the log into the directory:

cp * ~/training/logs
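
Once Flume picks the files up, it renames them in the spooling directory with a .COMPLETED suffix, and the events land on HDFS. Assuming HDFS is reachable at the address configured in a4.conf, you can verify the output with:

hdfs dfs -ls /flume

A date-named subdirectory should appear, containing files with the events- prefix.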

V. Similarities and differences between Sqoop and Flume:

Similarity: Sqoop and Flume each have only one installation mode; there is no local mode, cluster mode, and so on.

Difference: Sqoop collects data in batches, while Flume collects data in real time.
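
To make the contrast concrete, a typical Sqoop job is a one-shot batch transfer; a minimal sketch (the connection string, credentials, table, and target directory are illustrative):

sqoop import --connect jdbc:mysql://192.168.157.11:3306/testdb \
  --username root --password 123456 \
  --table orders --target-dir /sqoop/orders

Each run copies one snapshot of the table, whereas the Flume agent above keeps running and ships new log files as they appear.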

Li Jinze (Allenli), master's student at Tsinghua University. Research direction: big data and artificial intelligence.
