Common Flume Sinks

1. Logger Sink

Logs events at the INFO level; typically used for testing and debugging. The sinks used in the earlier introduction of sources were all of this type.

Properties that must be configured:
type – must be logger
maxBytesToLog – maximum number of bytes of the Event body to log


Note: a log4j configuration file must exist in the directory specified by the --conf parameter, or you can set the log4j parameters manually with -Dflume.root.logger=INFO,console when starting the command.
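
For example, a typical launch command looks like this (the agent name a1 and the file name conf/example.conf are placeholders for your own setup):

flume-ng agent --conf conf --conf-file conf/example.conf --name a1 -Dflume.root.logger=INFO,console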

Example: the examples in the earlier source sections all use this type of sink.
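
A minimal sketch of a logger sink configuration (the netcat source and port 44444 are assumptions chosen for easy testing, not part of the original examples):

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# netcat source: type a line into "telnet localhost 44444" to generate an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# logger sink: each event is logged at the INFO level
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1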

2. File Roll Sink

Stores events on the local file system. A new file is rolled at a fixed interval, and the events collected during that interval are written into it.

Property description:
type – must be file_roll
sink.directory – required; the directory where the files are stored
sink.rollInterval – 30; roll to a new file every 30 seconds (i.e., the data collected in each 30-second window is cut into its own file). Setting it to 0 disables rolling, so all data is written to a single file.
sink.serializer – TEXT; other possible options include avro_event or the FQCN of an implementation of the EventSerializer.Builder interface.
batchSize – 100

Example:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = http
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/park/work/apache-flume-1.6.0-bin/mysink
a1.sinks.k1.channel = c1
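
To verify, you can post a JSON-formatted event to the HTTP source; Flume's HTTP source accepts a JSON array of events by default (the body text here is only an illustration):

curl -X POST -d '[{"headers":{},"body":"hello file_roll"}]' http://localhost:6666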

3. Avro Sink

Very important: it is the basis for multi-hop flows, fan-out flows (one to many), and fan-in flows (many to one). Flume uses Avro RPC to connect multiple Flume nodes.

Required property description:
type – must be avro
hostname – required; the hostname or IP address of the node to send events to
port – required; the port to send events to

Example 1: fan-in, multiple nodes a1...an flow into b1

1) Configuration of a1...an:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = http
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.242.138
a1.sinks.k1.port = 9988
a1.sinks.k1.channel = c1

2) Configuration of b1:

b1.sources = r1
b1.sinks = k1
b1.channels = c1

b1.sources.r1.type = avro
b1.sources.r1.bind = 0.0.0.0
b1.sources.r1.port = 9988
b1.sources.r1.channels = c1

b1.channels.c1.type = memory
b1.channels.c1.capacity = 1000
b1.channels.c1.transactionCapacity = 1000

b1.sinks.k1.type = logger
b1.sinks.k1.channel = c1

After you start Flume, you will see the data from all the a-nodes aggregated on the b1 server.

Example 2: fan-out, one node a1 flows out to multiple nodes b1...bn

1) Configuration of a1:

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

a1.sources.r1.type = http
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1 c2

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 1000

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.242.138
a1.sinks.k1.port = 9988
a1.sinks.k1.channel = c1
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.242.135
a1.sinks.k2.port = 9988
a1.sinks.k2.channel = c2
Note: to fan out the event stream, you must configure multiple channels and multiple sinks. With only one channel feeding multiple sinks, the sinks consume events from that channel competitively, so each event is taken by exactly one sink rather than being copied to all of them; see the sketch below.
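
When a source lists multiple channels, Flume applies the replicating channel selector by default, copying every event to each channel; it can also be stated explicitly. A minimal sketch using Flume's documented selector property:

# copy every event from r1 into both c1 and c2 (this is already the default behavior)
a1.sources.r1.selector.type = replicating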

2) Configuration of b1...bn:

b1.sources = r1
b1.sinks = k1
b1.channels = c1

# Describe/configure the source
b1.sources.r1.type = avro
b1.sources.r1.bind = 0.0.0.0
b1.sources.r1.port = 9988
b1.sources.r1.channels = c1

b1.channels.c1.type = memory
b1.channels.c1.capacity = 1000
b1.channels.c1.transactionCapacity = 1000

# Describe the sink
b1.sinks.k1.type = logger
b1.sinks.k1.channel = c1
After Flume starts, the event data arriving at a1 is copied into two streams flowing through c1 and c2, and then distributed to b1...bn.


4. HDFS Sink

This sink writes events to the Hadoop Distributed File System (HDFS). It currently supports creating text and sequence files, both with optional compression. Files can be rolled based on elapsed time, data size, or number of events, and the data can be bucketed/partitioned by attributes such as timestamp or originating machine. The HDFS directory path may contain escape sequences that are replaced at write time to generate the directory/file name used to store the events.
Note: using this sink requires that Hadoop is already installed, so that Flume can communicate with HDFS through the jars Hadoop provides, and the Hadoop version must support the sync() call.

Required property description:
type – must be hdfs
hdfs.path – required; HDFS directory path (e.g. hdfs://namenode/flume/webdata/)
hdfs.filePrefix – FlumeData; name prefix for the files Flume creates in the directory
hdfs.fileSuffix – suffix appended to the file name (e.g. .avro; note: a period is not added automatically)
hdfs.inUsePrefix – prefix for temporary files that Flume is actively writing
hdfs.inUseSuffix – .tmp; suffix for temporary files that Flume is actively writing

Example:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = http
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://0.0.0.0:9000/ppp
a1.sinks.k1.channel = c1
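
In practice you will usually also control file rolling and partitioning. The following sketch uses Flume's documented HDFS sink properties; the path layout and roll thresholds are illustrative assumptions, not values from the original example:

# partition output by date and hour using escape sequences in the path
a1.sinks.k1.hdfs.path = hdfs://namenode:9000/flume/events/%Y-%m-%d/%H
# use the agent's local time to resolve the escape sequences
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# write plain text instead of the default SequenceFile format
a1.sinks.k1.hdfs.fileType = DataStream
# roll a new file every 10 minutes or every 128 MB, whichever comes first
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
# disable rolling by event count
a1.sinks.k1.hdfs.rollCount = 0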

Reference: http://www.cnblogs.com/itdyb/p/6270893.html
