1. Logger Sink
Logs events at the INFO level, typically used for testing and debugging. The sinks used in the earlier source examples are all of this type.
Properties that must be configured:
type logger
maxBytesToLog 16 Maximum number of bytes of the Event body to log
Note: A log4j configuration file must be present in the directory specified by the --conf parameter, or you can set the log4j output manually with -Dflume.root.logger=INFO,console when starting the agent.
Example: the previous examples all use this type of sink.
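A minimal startup sketch (the config file name logger.conf and the agent name a1 are illustrative assumptions, not taken from the original text):
bin/flume-ng agent --conf conf --conf-file conf/logger.conf --name a1 -Dflume.root.logger=INFO,console
With the logger sink configured, every event passing through the agent is then printed to the console at INFO level.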
2. File Roll Sink
Stores events in the local file system. A new file is created at a configurable interval, and the events collected during that interval are written to it.
Property Description:
type file_roll
sink.directory Required; the directory in which the files are stored
sink.rollInterval 30 Roll the file every 30 seconds (i.e. the collected data is cut to a new file every 30 seconds). If set to 0, rolling is disabled and all data is written to a single file.
sink.serializer TEXT Other possible options include avro_event or the FQCN of an implementation of the EventSerializer.Builder interface.
batchSize 100
Example:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/park/work/apache-flume-1.6.0-bin/mysink
a1.sinks.k1.channel = c1
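A quick way to exercise this configuration, assuming the agent runs locally (the localhost address and the JSON body are illustrative): post an event to the HTTP source and then list the sink directory.
curl -X POST -d '[{"headers":{},"body":"hello file_roll"}]' http://localhost:6666
ls /home/park/work/apache-flume-1.6.0-bin/mysink
A new file should appear in the directory every 30 seconds (the default sink.rollInterval), containing the events received during that window.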
3. Avro Sink
This sink is very important: it is the basis for multi-hop flows, fan-out flows (one to many), and fan-in flows (many to one). Flume uses Avro RPC to connect multiple Flume nodes.
Required attribute description:
type avro
hostname Required; the hostname or IP address to bind to.
port Required; the port # to listen on.
Example 1: fan-in, multiple nodes A1...An flow into B1
1) Configuration of A1...An:
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=http
a1.sources.r1.port=8888
a1.sources.r1.channels=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.138
a1.sinks.k1.port=9988
a1.sinks.k1.channel=c1
2) Configuration of B1:
b1.sources=r1
b1.sinks=k1
b1.channels=c1
b1.sources.r1.type=avro
b1.sources.r1.bind=0.0.0.0
b1.sources.r1.port=9988
b1.sources.r1.channels=c1
b1.channels.c1.type=memory
b1.channels.c1.capacity=1000
b1.channels.c1.transactionCapacity=1000
b1.sinks.k1.type=logger
b1.sinks.k1.channel=c1
After you start Flume, you will see the data aggregated on the B1 server.
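A startup-order sketch for this fan-in topology (the config file names are illustrative assumptions): start the collector B1 first so that its Avro source is already listening on port 9988, then start A1...An, and finally post a test event to any A node.
On the B1 host:
bin/flume-ng agent --conf conf --conf-file conf/b1.conf --name b1 -Dflume.root.logger=INFO,console
On each A host:
bin/flume-ng agent --conf conf --conf-file conf/a1.conf --name a1
curl -X POST -d '[{"headers":{},"body":"hello fan-in"}]' http://localhost:8888
The event posted to any A node should appear in B1's logger output.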
Example 2: fan-out, one node A1 flows out to multiple nodes B1...Bn
1) A1 node configuration:
a1.sources=r1
a1.sinks=k1 k2
a1.channels=c1 c2
a1.sources.r1.type=http
a1.sources.r1.port=8888
a1.sources.r1.channels=c1 c2
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=1000
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.138
a1.sinks.k1.port=9988
a1.sinks.k1.channel=c1
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.242.135
a1.sinks.k2.port=9988
a1.sinks.k2.channel=c2
Note: To fan out the event data stream you must configure multiple channels and multiple sinks. If there is only a single channel with several sinks attached, the sinks consume events from that channel competitively, so each event goes to only one sink rather than being replicated.
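The behavior described in this note is governed by the source's channel selector. The default selector is replicating, which copies every event to all channels listed on the source; it can be written out explicitly as a reminder (this line is a sketch and is not part of the original configuration):
a1.sources.r1.selector.type = replicating
Switching the selector type to multiplexing instead routes each event to a single channel based on a header value, which is how selective (non-replicated) routing is done.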
2) B1...Bn configuration:
b1.sources=r1
b1.sinks=k1
b1.channels=c1
# Describe/configure the source
b1.sources.r1.type=avro
b1.sources.r1.bind=0.0.0.0
b1.sources.r1.port=9988
b1.sources.r1.channels=c1
b1.channels.c1.type=memory
b1.channels.c1.capacity=1000
b1.channels.c1.transactionCapacity=1000
# Describe the sink
b1.sinks.k1.type=logger
b1.sinks.k1.channel=c1
After Flume starts, the event data leaving A1 is duplicated into two copies that flow through c1 and c2, and is then distributed to B1...Bn.
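To verify, post an event to A1's HTTP source; every B node should then print it through its logger sink (localhost assumes the command is run on the A1 host):
curl -X POST -d '[{"headers":{},"body":"hello fan-out"}]' http://localhost:8888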
4. HDFS Sink
This sink writes events to the Hadoop Distributed File System (HDFS). It currently supports creating text and sequence files, both with optional compression. Files can be rolled based on elapsed time, data size, or number of events, and the data can be bucketed/partitioned by attributes such as timestamp or originating machine. The HDFS directory path may contain escape sequences that the sink replaces in order to generate the directory/file names used to store the events.
Note: Using this sink requires that Hadoop is already installed, so that Flume can use the jars provided by Hadoop to communicate with HDFS, and the Hadoop version must support the sync() call.
Required attribute description:
type hdfs
hdfs.path Required; HDFS directory path (e.g. hdfs://namenode/flume/webdata/)
hdfs.filePrefix FlumeData Name prefix for the files Flume creates in the directory
hdfs.fileSuffix – Suffix appended to the file name (e.g. .avro; note: the period is not added automatically)
hdfs.inUsePrefix – Prefix for files that Flume is still actively writing
hdfs.inUseSuffix .tmp Suffix for files that Flume is still actively writing
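In practice the rolling behavior is usually tuned as well. A sketch of commonly used optional properties with their defaults (taken from the Flume documentation, not from the original list):
hdfs.rollInterval 30 Seconds to wait before rolling the current file (0 = never roll based on time)
hdfs.rollSize 1024 File size in bytes that triggers a roll (0 = never roll based on size)
hdfs.rollCount 10 Number of events that triggers a roll (0 = never roll based on count)
hdfs.fileType SequenceFile Set to DataStream to write plain text instead of a sequence file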
Example:
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=http
a1.sources.r1.port=8888
a1.sources.r1.channels=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://0.0.0.0:9000/ppp
a1.sinks.k1.channel=c1
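Assuming the agent and HDFS are running on the same machine (localhost is an assumption), post a test event and list the target directory:
curl -X POST -d '[{"headers":{},"body":"hello hdfs"}]' http://localhost:8888
hdfs dfs -ls /ppp
Files appear with the default FlumeData prefix; while a file is still being written it carries the .tmp in-use suffix, and by default its contents are in Hadoop SequenceFile format.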
Reference: http://www.cnblogs.com/itdyb/p/6270893.html