Flume: A First Look at Flume, Sources, and Sinks

Last Update:2018-08-13 Source: Internet

Author: User

Contents
Basic concepts
Common sources
Common sinks

Basic concepts

What is Flume?
A distributed, reliable tool for collecting, aggregating, and moving large amounts of log data.

• Events
An event is the basic unit of data that Flume transports. Its body is a byte array, typically the bytes of one line of input.

• Flume configuration file
Rename flume-env.sh.template to flume-env.sh and add the line: export JAVA_HOME=/soft/jdk

• Flume agent
source // Where the data is read from. Monitors and collects data; relative to the channel, it is the producer.
channel // The data pipe between source and sink; it acts as a buffer.
sink // Where the data is sent. Writes data to the specified destination; relative to the channel, it is the consumer.

How do you use Flume?
In Flume's conf directory, create a file with a .conf suffix, then start an agent with the flume-ng command.

• flume-ng command
Start: flume-ng agent -f /soft/flume/conf/example.conf -n a1
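
The snippets in the following sections each show only a source or a sink. A complete agent file also has to name its components, define a channel, and bind the three together. A minimal sketch of what example.conf might contain is shown here; the channel name c1, the memory channel, and its capacity values are assumptions chosen for illustration and do not come from the original article.

```properties
# Name the components of agent a1 (c1 is an assumed channel name)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: read lines produced by a Linux command
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /home/centos/1.txt

# Channel: in-memory buffer between source and sink (assumed sizes)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: print events to the log
a1.sinks.k1.type = logger

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Note that a source takes a list of channels (channels), while a sink reads from exactly one (channel). The agent name passed with -n must match the prefix used in the file, here a1.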

Common sources

• Exec source // Uses the output of a Linux command as the source. Drawback: data is lost if the agent fails, so data integrity is not guaranteed.
# Define the source: exec
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /home/centos/1.txt
• Spooling directory source // Monitors a directory; when new files appear in it, they are consumed.
# Define the source: spooldir
a1.sources.r1.type = spooldir
# Directory to monitor
a1.sources.r1.spoolDir = /home/centos/log
• Taildir source // Tails the files in a directory that match a given pattern and tracks the offset consumed in each one.
Offsets are recorded in real time in ~/.flume/taildir_position.json by default; the location can be changed with a1.sources.r1.positionFile.
# Define the source: TAILDIR
a1.sources.r1.type = TAILDIR
# File groups to monitor
a1.sources.r1.filegroups = g1
# Files included in group g1
a1.sources.r1.filegroups.g1 = /home/centos/log/.*log
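
As a sketch of the positionFile setting mentioned above, the fragment below moves the offset file out of the default location. The path /home/centos/flume/taildir_position.json is a hypothetical example, not a value from the original article.

```properties
a1.sources.r1.type = TAILDIR
# Hypothetical location for the offset file; the default is ~/.flume/taildir_position.json
a1.sources.r1.positionFile = /home/centos/flume/taildir_position.json
a1.sources.r1.filegroups = g1
a1.sources.r1.filegroups.g1 = /home/centos/log/.*log
```

Because the offsets are persisted, an agent that is restarted can resume from where it stopped instead of re-reading the files.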
• Sequence generator source // Produces consecutive numbers; used for testing.
# Define the source: seq
a1.sources.r1.type = seq
# Number of events sent per batch
a1.sources.r1.batchSize = 1024
• Stress source // Generates load to stress-test the cluster.
# Define the source: stress
a1.sources.r1.type = org.apache.flume.source.StressSource
# Payload size of each event, in bytes
a1.sources.r1.size = 1073741824
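
The size value is given in bytes per event, so 1073741824 asks for 1 GiB in every single event. For a first trial it may be safer to start smaller and to cap the run. The fragment below is a sketch under that assumption; the numbers are illustrative and are not taken from the original article.

```properties
a1.sources.r1.type = org.apache.flume.source.StressSource
# Illustrative payload of 10 KiB per event
a1.sources.r1.size = 10240
# Stop after this many events (illustrative); the default of -1 means no limit
a1.sources.r1.maxTotalEvents = 1000000
```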

Common sinks

• Logger sink // Writes events to the log and console.
a1.sinks.k1.type = logger
• File roll sink // Stores events in local files.
# Use the rolling file sink
a1.sinks.k1.type = file_roll
# Target directory; it must already exist, otherwise the sink reports an error
a1.sinks.k1.sink.directory = /home/centos/log2
# Roll interval in seconds; 0 disables rolling, default is 30
a1.sinks.k1.sink.rollInterval = 0
• HDFS sink // Writes to HDFS. The default file type is SequenceFile; set hdfs.fileType to SequenceFile, DataStream, or CompressedStream.
# Sink type
a1.sinks.k1.type = hdfs
# Target path; the directories do not need to be created in advance
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H
# The time escapes in the path need a timestamp; use the local time
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Prefix of the generated file names
a1.sinks.k1.hdfs.filePrefix = events-
# Roll interval in seconds; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval = 0
# Roll size in bytes; default is 1024
a1.sinks.k1.hdfs.rollSize = 1024
# File type; default is SequenceFile
a1.sinks.k1.hdfs.fileType = CompressedStream
# Compression codec
a1.sinks.k1.hdfs.codeC = gzip
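
The HDFS sink also rolls files by event count: hdfs.rollCount defaults to 10, so a new file is started every 10 events even when time-based rolling is disabled. If files should be rolled by size only, a sketch like the following might be used; the 128 MiB threshold is an illustrative assumption.

```properties
# Disable time-based rolling
a1.sinks.k1.hdfs.rollInterval = 0
# Roll when the file reaches 128 MiB (illustrative value)
a1.sinks.k1.hdfs.rollSize = 134217728
# Disable count-based rolling (default is 10 events)
a1.sinks.k1.hdfs.rollCount = 0
```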
• HBase sink // Writes to HBase. The table must be created beforehand; the row key and columns cannot be specified.
# Sink type
a1.sinks.k1.type = hbase
a1.sinks.k1.table = ns1:flume
a1.sinks.k1.columnFamily = f1
• HBase sink with RegexHbaseEventSerializer // Writes to HBase. The table must be created beforehand; the row key and columns can be specified manually.
# Sink type
a1.sinks.k1.type = hbase
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# Row key and column names; ROW_KEY must be present and in uppercase
a1.sinks.k1.serializer.colNames = ROW_KEY,name,age
# Regex whose capture groups correspond to colNames, one group per name
a1.sinks.k1.serializer.regex = (.*),(.*),(.*)
# Index of the row key among the capture groups
a1.sinks.k1.serializer.rowKeyIndex = 0
a1.sinks.k1.table = ns1:flume
a1.sinks.k1.columnFamily = f1
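
To make the mapping concrete, the annotated fragment below traces one input line through the serializer. It assumes a regex with exactly three capture groups, one per entry in colNames, and a column family named f1; the event body 1001,tom,18 is a made-up example.

```properties
# Assumed event body (one line of input):
#   1001,tom,18
a1.sinks.k1.serializer.regex = (.*),(.*),(.*)
a1.sinks.k1.serializer.colNames = ROW_KEY,name,age
a1.sinks.k1.serializer.rowKeyIndex = 0
# Expected result in the configured table:
#   row key = 1001   (group 1, the ROW_KEY position)
#   f1:name = tom    (group 2)
#   f1:age  = 18     (group 3)
```

The number of capture groups has to equal the number of names in colNames; if they differ, the serializer produces no writes for the event.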
• Hive sink // Writes to Hive. Hive transaction support must be enabled.
# Sink type
a1.sinks.k1.type = hive
# Metastore URI; the metastore service must be running (hive --service metastore)
a1.sinks.k1.hive.metastore = thrift://s101:9083
a1.sinks.k1.hive.database = default
# The target must be a transactional table
a1.sinks.k1.hive.table = tx2
# Serializer that maps input fields to table columns
a1.sinks.k1.serializer = DELIMITED
# Delimiter of the input data; must be in double quotes
a1.sinks.k1.serializer.delimiter = "\t"
# Separator used by the Hive SerDe in the stored files; must be in single quotes
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames = id,name,age
• Writing to Hive: additional setup
1. Copy all the JARs in /soft/hive/hcatalog/share/hcatalog to Hive's lib directory
cp /soft/hive/hcatalog/share/hcatalog/* /soft/hive/lib/
2. Start the Hive metastore
hive --service metastore
3. Start Hive and create a transactional table
SET hive.support.concurrency = true;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;
CREATE TABLE tx2 (id int, name string, age int) CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional' = 'true');
4. Start Flume with the configuration file above
flume-ng agent -f k_hive.conf -n a1
5. Enter data to verify, with fields separated by tabs (<TAB> stands for a tab character)
1<TAB>tom<TAB>18

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
