1. Logger Sink
Logs events at the INFO level, typically used for testing and debugging. The sinks used in the earlier source examples are all of this type.
Properties that must be configured:
type logger
maxBytesToLog 16 Maximum number of bytes of the Event body to log
Note: A log4j configuration file must be present in the directory specified by the --conf parameter, or you can set the log4j output manually with -Dflume.root.logger=INFO,console when starting the agent.
Example: the previous examples all use this type of sink.
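A minimal startup sketch (the config file name logger.conf and the agent name a1 are illustrative assumptions, not taken from the original text):
bin/flume-ng agent --conf conf --conf-file conf/logger.conf --name a1 -Dflume.root.logger=INFO,console
With the logger sink configured, every event passing through the agent is then printed to the console at INFO level.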
2. File Roll Sink
Stores events in the local file system. A new file is created at a configurable interval, and the events collected during that interval are written to it.
Property Description:
type file_roll
sink.directory Required; the directory in which the files are stored
sink.rollInterval 30 Roll the file every 30 seconds (i.e. the collected data is cut to a new file every 30 seconds). If set to 0, rolling is disabled and all data is written to a single file.
sink.serializer TEXT Other possible options include avro_event or the FQCN of an implementation of the EventSerializer.Builder interface.
batchSize 100
Example:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/park/work/apache-flume-1.6.0-bin/mysink
a1.sinks.k1.channel = c1
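A quick way to exercise this configuration, assuming the agent runs locally (the localhost address and the JSON body are illustrative): post an event to the HTTP source and then list the sink directory.
curl -X POST -d '[{"headers":{},"body":"hello file_roll"}]' http://localhost:6666
ls /home/park/work/apache-flume-1.6.0-bin/mysink
A new file should appear in the directory every 30 seconds (the default sink.rollInterval), containing the events received during that window.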
3. Avro Sink
This sink is very important: it is the basis for multi-hop flows, fan-out flows (one to many), and fan-in flows (many to one). Flume uses Avro RPC to connect multiple Flume nodes.
Required attribute description:
type avro
hostname Required; the hostname or IP address to bind to.
port Required; the port # to listen on.
Example 1: fan-in, multiple nodes A1...An flow into B1
1) Configuration of A1...An:
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=http
a1.sources.r1.port=8888
a1.sources.r1.channels=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.138
a1.sinks.k1.port=9988
a1.sinks.k1.channel=c1
2) Configuration of B1:
b1.sources=r1
b1.sinks=k1
b1.channels=c1
b1.sources.r1.type=avro
b1.sources.r1.bind=0.0.0.0
b1.sources.r1.port=9988
b1.sources.r1.channels=c1
b1.channels.c1.type=memory
b1.channels.c1.capacity=1000
b1.channels.c1.transactionCapacity=1000
b1.sinks.k1.type=logger
b1.sinks.k1.channel=c1
After you start Flume, you will see the data aggregated on the B1 server.
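A startup-order sketch for this fan-in topology (the config file names are illustrative assumptions): start the collector B1 first so that its Avro source is already listening on port 9988, then start A1...An, and finally post a test event to any A node.
On the B1 host:
bin/flume-ng agent --conf conf --conf-file conf/b1.conf --name b1 -Dflume.root.logger=INFO,console
On each A host:
bin/flume-ng agent --conf conf --conf-file conf/a1.conf --name a1
curl -X POST -d '[{"headers":{},"body":"hello fan-in"}]' http://localhost:8888
The event posted to any A node should appear in B1's logger output.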
Example 2: fan-out, one node A1 flows out to multiple nodes B1...Bn
1) A1 node configuration:
a1.sources=r1
a1.sinks=k1 k2
a1.channels=c1 c2
a1.sources.r1.type=http
a1.sources.r1.port=8888
a1.sources.r1.channels=c1 c2
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=1000
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.242.138
a1.sinks.k1.port=9988
a1.sinks.k1.channel=c1
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.242.135
a1.sinks.k2.port=9988
a1.sinks.k2.channel=c2
Note: To fan out the event data stream you must configure multiple channels and multiple sinks. If there is only a single channel with several sinks attached, the sinks consume events from that channel competitively, so each event goes to only one sink rather than being replicated.
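The behavior described in this note is governed by the source's channel selector. The default selector is replicating, which copies every event to all channels listed on the source; it can be written out explicitly as a reminder (this line is a sketch and is not part of the original configuration):
a1.sources.r1.selector.type = replicating
Switching the selector type to multiplexing instead routes each event to a single channel based on a header value, which is how selective (non-replicated) routing is done.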
2) B1...Bn configuration:
b1.sources=r1
b1.sinks=k1
b1.channels=c1
# Describe/configure the source
b1.sources.r1.type=avro
b1.sources.r1.bind=0.0.0.0
b1.sources.r1.port=9988
b1.sources.r1.channels=c1
b1.channels.c1.type=memory
b1.channels.c1.capacity=1000
b1.channels.c1.transactionCapacity=1000
# Describe the sink
b1.sinks.k1.type=logger
b1.sinks.k1.channel=c1
After Flume starts, the event data leaving A1 is duplicated into two copies that flow through c1 and c2, and is then distributed to B1...Bn.
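To verify, post an event to A1's HTTP source; every B node should then print it through its logger sink (localhost assumes the command is run on the A1 host):
curl -X POST -d '[{"headers":{},"body":"hello fan-out"}]' http://localhost:8888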
4. HDFS Sink
This sink writes events to the Hadoop Distributed File System (HDFS). It currently supports creating text and sequence files, both with optional compression. Files can be rolled based on elapsed time, data size, or number of events, and the data can be bucketed/partitioned by attributes such as timestamp or originating machine. The HDFS directory path may contain escape sequences that the sink replaces in order to generate the directory/file names used to store the events.
Note: Using this sink requires that Hadoop is already installed, so that Flume can use the jars provided by Hadoop to communicate with HDFS, and the Hadoop version must support the sync() call.
Required attribute description:
type hdfs
hdfs.path Required; HDFS directory path (e.g. hdfs://namenode/flume/webdata/)
hdfs.filePrefix FlumeData Name prefix for the files Flume creates in the directory
hdfs.fileSuffix – Suffix appended to the file name (e.g. .avro; note: the period is not added automatically)
hdfs.inUsePrefix – Prefix for files that Flume is still actively writing
hdfs.inUseSuffix .tmp Suffix for files that Flume is still actively writing
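In practice the rolling behavior is usually tuned as well. A sketch of commonly used optional properties with their defaults (taken from the Flume documentation, not from the original list):
hdfs.rollInterval 30 Seconds to wait before rolling the current file (0 = never roll based on time)
hdfs.rollSize 1024 File size in bytes that triggers a roll (0 = never roll based on size)
hdfs.rollCount 10 Number of events that triggers a roll (0 = never roll based on count)
hdfs.fileType SequenceFile Set to DataStream to write plain text instead of a sequence file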
Example:
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=http
a1.sources.r1.port=8888
a1.sources.r1.channels=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://0.0.0.0:9000/ppp
a1.sinks.k1.channel=c1
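Assuming the agent and HDFS are running on the same machine (localhost is an assumption), post a test event and list the target directory:
curl -X POST -d '[{"headers":{},"body":"hello hdfs"}]' http://localhost:8888
hdfs dfs -ls /ppp
Files appear with the default FlumeData prefix; while a file is still being written it carries the .tmp in-use suffix, and by default its contents are in Hadoop SequenceFile format.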
Reference: http://www.cnblogs.com/itdyb/p/6270893.html