HDFS Sink Configuration in Flume


A description of the HDFS sink configuration parameters in Flume.

Official configuration reference: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

type: The component type name; must be hdfs.

hdfs.path: The HDFS directory path; it should include the filesystem URI, such as: hdfs://namenode/flume/flumedata/

hdfs.filePrefix: Default: FlumeData. Prefix for the file names the sink creates in HDFS.

hdfs.fileSuffix: Suffix for the file names written to HDFS, such as: .lzo, .log, etc.

hdfs.inUsePrefix: File name prefix for temporary files. The HDFS sink first writes a temporary file in the target directory, then renames it to the final target file according to the roll rules;

hdfs.inUseSuffix: Default: .tmp. File name suffix for temporary files.
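For example, to make in-progress files easy to exclude from downstream jobs, both markers can be overridden. A minimal sketch, assuming an agent named a1 with a sink named k1 (both names are illustrative):

# prefix in-progress files with an underscore so Hadoop input formats skip them
a1.sinks.k1.hdfs.inUsePrefix = _
a1.sinks.k1.hdfs.inUseSuffix = .tmp

A leading underscore is a common choice because Hadoop's FileInputFormat ignores paths starting with _ or . by default.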

hdfs.rollInterval: Default: 30. Number of seconds after which the HDFS sink rolls the current temporary file into the final target file;

If set to 0, the file is never rolled based on time;

Note: rolling (roll) means that the HDFS sink renames the temporary file to the final target file and opens a new temporary file to write subsequent data;

hdfs.rollSize: Default: 1024. File size, in bytes, at which the temporary file is rolled into the target file; if set to 0, the file is never rolled based on size;

hdfs.rollCount: Default: 10. Number of events after which the temporary file is rolled into the target file; if set to 0, the file is never rolled based on event count; see the combined sketch below.
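The three roll criteria above are independent, and whichever threshold is reached first triggers the roll; unused criteria are therefore usually disabled by setting them to 0. A minimal sketch, again assuming agent a1 with sink k1:

# roll purely on time: a new file every 10 minutes
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0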

hdfs.idleTimeout: Default: 0. If the currently open temporary file receives no writes for this many seconds, it is closed and renamed to the target file; 0 disables this check;

hdfs.batchSize: Default: 100. Number of events flushed to HDFS per batch;

hdfs.codeC: Compression codec for the files; one of: gzip, bzip2, lzo, lzop, snappy

hdfs.fileType: Default: SequenceFile. File format; one of: SequenceFile, DataStream, CompressedStream.

With DataStream, files are not compressed and hdfs.codeC does not need to be set;

With CompressedStream, a valid hdfs.codeC value must be set; see the sketch below.
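For instance, to write lzop-compressed output (a sketch, with the same illustrative a1/k1 names):

a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = lzop

Switching hdfs.fileType to DataStream and removing hdfs.codeC would produce the same records uncompressed.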

hdfs.maxOpenFiles: Default: 5000. Maximum number of HDFS files allowed to be open at once; when this limit is reached, the file opened earliest is closed;

hdfs.minBlockReplicas: Default: the HDFS replication factor from the Hadoop configuration. Minimum number of replicas required per HDFS file block.

This parameter affects file rolling; it typically needs to be set to 1 for files to roll exactly as configured (see the sketch below).
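If hdfs.minBlockReplicas is left at the cluster replication factor and a block temporarily falls below it, the sink closes the file and rolls a new one, producing far more (and smaller) files than the roll settings suggest; setting it to 1 is the usual workaround (illustrative a1/k1 names):

a1.sinks.k1.hdfs.minBlockReplicas = 1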

hdfs.writeFormat: Format for sequence file records; one of: Text, Writable (default)

hdfs.callTimeout: Default: 10000. Timeout, in milliseconds, for HDFS operations;

hdfs.threadsPoolSize: Default: 10. Number of threads the HDFS sink uses for HDFS I/O operations (open, write, etc.).

hdfs.rollTimerPoolSize: Default: 1. Number of threads the HDFS sink uses for scheduling time-based file rolls.

hdfs.kerberosPrincipal: Kerberos principal for secure HDFS access;

hdfs.kerberosKeytab: Kerberos keytab for secure HDFS access;

hdfs.proxyUser: Proxy user name; see the secure-cluster sketch below.
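On a kerberized cluster these three settings are typically used together. A minimal sketch, where the principal name, keytab path, and proxy user are assumptions for illustration only:

# principal, keytab path, and proxy user below are placeholders
a1.sinks.k1.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE.COM
a1.sinks.k1.hdfs.kerberosKeytab = /etc/security/keytabs/flume.keytab
a1.sinks.k1.hdfs.proxyUser = weblogs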

hdfs.round: Default: false. Whether the event timestamp should be rounded down when it is substituted into hdfs.path;

hdfs.roundValue: Default: 1. The timestamp is rounded down to the highest multiple of this value, in the unit configured by hdfs.roundUnit;

hdfs.roundUnit: Default: second. Unit of the round-down value; one of: second, minute, hour

Example:

a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

With this setting, a timestamp of 2015-10-16 17:38:59 resolves hdfs.path to /flume/events/15-10-16/1730/00,

because the timestamp is rounded down to the nearest 10 minutes, so a new directory is generated every 10 minutes.

hdfs.timeZone: Default: Local Time. Name of the time zone used when resolving the directory path, e.g. America/Los_Angeles.

hdfs.useLocalTimeStamp: Default: false. Whether to use the local time, instead of the timestamp from the event header, when substituting the escape sequences; see the sketch below.
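Time-based escape sequences in hdfs.path require a timestamp: either the events carry a timestamp header (for example, added by a timestamp interceptor) or the sink falls back to the agent's local clock. A minimal sketch of the latter, with an illustrative path:

a1.sinks.k1.hdfs.path = hdfs://namenode/flume/%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true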

hdfs.closeTries: Default: 0. Number of times the HDFS sink attempts to close a file after the first close attempt fails;

If set to 1, the sink makes no further attempts after a failed close, and the file is left open with the in-use extension.

If set to 0, the sink keeps retrying after a failed close until it succeeds.

hdfs.retryInterval: Default: 180 (seconds). Time between consecutive attempts to close a file; if set to 0, no retries are attempted, which is equivalent to setting hdfs.closeTries to 1. A sketch follows below.
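For example, to cap close retries at three attempts spaced one minute apart (a sketch, illustrative a1/k1 names):

a1.sinks.k1.hdfs.closeTries = 3
a1.sinks.k1.hdfs.retryInterval = 60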

serializer: Default: TEXT. Serializer used for the events written to the file.

Another example:

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://cdh5/tmp/data/%Y%m%d
agent1.sinks.sink1.hdfs.filePrefix = log_%Y%m%d_%H
agent1.sinks.sink1.hdfs.fileSuffix = .lzo
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = CompressedStream
agent1.sinks.sink1.hdfs.rollCount = 0
agent1.sinks.sink1.hdfs.rollSize = 0
agent1.sinks.sink1.hdfs.rollInterval = 600
agent1.sinks.sink1.hdfs.codeC = lzop
agent1.sinks.sink1.hdfs.batchSize = 100
agent1.sinks.sink1.hdfs.threadsPoolSize = 10
agent1.sinks.sink1.hdfs.idleTimeout = 0
agent1.sinks.sink1.hdfs.minBlockReplicas = 1

With the above configuration, a directory named like 20151016 is created daily under /tmp/data/ in HDFS, a new target file is rolled every 10 minutes, and the target files are named like log_20151016_13.1444973768543.lzo and are lzop-compressed.

Attached is the telnet.py script used for testing:

import time
import telnetlib

if __name__ == '__main__':
    host = 'localhost'
    port = 44444

    # Connect to the Flume agent listening on localhost:44444.
    tn = telnetlib.Telnet(host=host, port=port)

    # Send one numbered line per second as test events.
    for i in range(10000):
        print(i)
        tn.write((str(i) + '\n').encode('ascii'))
        time.sleep(1)
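The script assumes a Flume agent already listening on localhost:44444, for example with a netcat source feeding the HDFS sink above; a minimal sketch of that source side (the source and channel names are assumptions):

agent1.sources.source1.type = netcat
agent1.sources.source1.bind = localhost
agent1.sources.source1.port = 44444
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1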
