===========> Create the HBase table and column family first <================
Case 1: One row of source data maps to one HBase row (works with hbase-1.12)
================================================================================
#Note: In this case Flume watches the directory /home/hadoop/flume_hbase and writes what it captures to HBase; the table and column family must be created in HBase first.
Sample data file:
vi /home/hadoop/flume_hbase/word.txt
1001 pan nan
2200 lili nv
In the HBase shell:
create 'tb_words', 'cf_wd'
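To confirm the table and column family are in place before starting the agent, a quick check in the HBase shell (assuming the shell runs on the same node):
list
describe 'tb_words'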
vi flume-hbase.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# spooldir monitors a directory: any file placed under it is collected automatically; there is no need to point the source at a specific file
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_hbase
# Describe the sink
a1.sinks.k1.type = asynchbase
a1.sinks.k1.table = tb_words
a1.sinks.k1.columnFamily = cf_wd
# So far I have only got this serializer working with a single column name; multiple column names failed. For multiple columns, use the regex-based match shown in Case 2 below.
a1.sinks.k1.serializer.payloadColumn = wd
a1.sinks.k1.serializer.incrementColumn = last
a1.sinks.k1.serializer.rowPrefix = qm
a1.sinks.k1.serializer.suffix = timestamp
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
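A minimal way to start this agent, assuming flume-ng is on the PATH and flume-hbase.conf is in the current directory:
flume-ng agent --conf conf --conf-file flume-hbase.conf --name a1 -Dflume.root.logger=INFO,console
Once word.txt is dropped into /home/hadoop/flume_hbase, the spooldir source renames it to word.txt.COMPLETED and the captured rows can be checked in the HBase shell:
scan 'tb_words'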
Case 2: Use a regular expression to split each row into multiple column values
Environment: apache-flume-1.7.0-bin.tar.gz and hbase-1.12+
================================================================================
In the HBase shell:
create 'tb_words2', 'words'
Sample data file:
vi /home/hadoop/flume_hbase/data.txt
1001,panzong,nan
2200,lili,nv
Flume configuration file:
vi flume_2_hbase.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# cn.qm.flume.source.MySource is a custom source class; it can be replaced with spooldir
a1.sources.r1.type = cn.qm.flume.source.MySource
a1.sources.r1.spoolDir = /home/hadoop/flume_hbase
# Describe the sink
#a1.sinks.k1.type = org.apache.flume.sink.hbase.HBaseSink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = tb_words2
a1.sinks.k1.columnFamily = words
a1.sinks.k1.serializer.enableWal = true
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# Reading the RegexHbaseEventSerializer source makes the rowKeyIndex/colNames properties easy to understand
a1.sinks.k1.serializer.regex = ^([0-9]+),([a-z]+),([a-z]+)$
# Use a captured column as the row key instead of a randomly generated one; here the first column becomes the HBase rowkey
a1.sinks.k1.serializer.rowKeyIndex = 0
# ROW_KEY is the column name the serializer reserves for the row key
a1.sinks.k1.serializer.colNames = ROW_KEY,name,sex
a1.sinks.k1.zookeeperQuorum = hdp-qm-05:2181,hdp-qm-06:2181,hdp-qm-07:2181
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# Alternative: make the second column the HBase rowkey
#a1.sinks.k1.serializer.rowKeyIndex = 1
#a1.sinks.k1.serializer.regex = ^([0-9]+),([a-z]+),([a-z]+)$
#a1.sinks.k1.serializer.colNames = id,ROW_KEY,sex
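As in Case 1, a minimal launch, assuming flume-ng is on the PATH and flume_2_hbase.conf is in the current directory:
flume-ng agent --conf conf --conf-file flume_2_hbase.conf --name a1 -Dflume.root.logger=INFO,console
With rowKeyIndex = 0 and colNames = ROW_KEY,name,sex, a line such as 1001,panzong,nan should land with rowkey 1001 and columns words:name=panzong, words:sex=nan, which can be verified with:
scan 'tb_words2'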