[Flume] Sink to HDFS: files are produced too frequently and the file-rolling configuration does not take effect?

Source: Internet
Author: User

While testing the HDFS sink, I found that the file-rolling configuration items on the sink side did not take effect. The configuration was as follows:

a1.sinks.k1.type=hdfs
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.path=hdfs://...
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.idleTimeout=0
This configures the file to roll once every 60 seconds, i.e., a new file should appear every 60 seconds (assuming the Flume source keeps delivering data).

But after starting Flume and writing data continuously, within just over ten seconds I found that HDFS was producing files very frequently: a new file appeared every few seconds.
The Flume log also frequently printed this line:

[WARN] Block Under-replication detected. Rotating file.

Every time this message appears, a new file is created.

It means the file is being rotated because its block is under-replicated. Looking at the source code:

  private boolean shouldRotate() {
    boolean doRotate = false;

    if (writer.isUnderReplicated()) {
      this.isUnderReplicated = true;
      doRotate = true;
    } else {
      this.isUnderReplicated = false;
    }

    if ((rollCount > 0) && (rollCount <= eventCounter)) {
      LOG.debug("rolling: rollCount: {}, events: {}", rollCount, eventCounter);
      doRotate = true;
    }

    if ((rollSize > 0) && (rollSize <= processSize)) {
      LOG.debug("rolling: rollSize: {}, bytes: {}", rollSize, processSize);
      doRotate = true;
    }

    return doRotate;
  }
This method decides whether to roll the file, and its first criterion is whether the block the current HDFSWriter is writing is under-replicated:
  public boolean isUnderReplicated() {
    try {
      int numBlocks = getNumCurrentReplicas();
      if (numBlocks == -1) {
        return false;
      }
      int desiredBlocks;
      if (configuredMinReplicas != null) {
        desiredBlocks = configuredMinReplicas;
      } else {
        desiredBlocks = getFsDesiredReplication();
      }
      return numBlocks < desiredBlocks;
    } catch (IllegalAccessException e) {
      logger.error("Unexpected error while checking replication factor", e);
    } catch (InvocationTargetException e) {
      logger.error("Unexpected error while checking replication factor", e);
    } catch (IllegalArgumentException e) {
      logger.error("Unexpected error while checking replication factor", e);
    }
    return false;
  }
It decides by reading the configured minimum replica count and comparing it against the number of replicas the current block actually has.

    if (shouldRotate()) {
      boolean doRotate = true;

      if (isUnderReplicated) {
        if (maxConsecUnderReplRotations > 0 &&
            consecutiveUnderReplRotateCount >= maxConsecUnderReplRotations) {
          doRotate = false;
          if (consecutiveUnderReplRotateCount == maxConsecUnderReplRotations) {
            LOG.error("Hit max consecutive under-replication rotations ({}); " +
                "will not continue rolling files under this path due to " +
                "under-replication", maxConsecUnderReplRotations);
          }
        } else {
          LOG.warn("Block Under-replication detected. Rotating file.");
        }
        consecutiveUnderReplRotateCount++;
      } else {
        consecutiveUnderReplRotateCount = 0;
      }
In the snippet above, the entry point is the shouldRotate() method: if you configure rollCount or rollSize greater than 0, rolling will follow your configuration, but on entry the method additionally checks whether the block currently being written is under-replicated.

Digging further, there is a fixed constant maxConsecUnderReplRotations = 30: while a block is under-replicated, at most 30 files can be rolled in a row. After 30 consecutive under-replication rotations, if the block is still under-replicated, doRotate is forced to false and rolling stops. That is why some people find exactly 30 files after the agent has been running for a while.
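The cap described above can be illustrated with a small, self-contained simulation. This is only a sketch: the class and method names are made up, and the assumption is that the block stays under-replicated for every write batch; only the constant 30 and the counter logic mirror what Flume's BucketWriter does.

```java
public class RotationCapDemo {
    // Fixed constant in Flume's BucketWriter; not configurable.
    static final int MAX_CONSEC_UNDER_REPL_ROTATIONS = 30;

    // Simulate `batches` write batches during which the block is
    // permanently under-replicated; return how many files get rolled.
    static int simulate(int batches) {
        int consecutiveUnderReplRotateCount = 0;
        int filesRolled = 0;
        for (int i = 0; i < batches; i++) {
            boolean doRotate = true; // shouldRotate() says yes (under-replicated)
            if (consecutiveUnderReplRotateCount >= MAX_CONSEC_UNDER_REPL_ROTATIONS) {
                doRotate = false;    // cap reached: stop rolling
            }
            consecutiveUnderReplRotateCount++;
            if (doRotate) {
                filesRolled++;       // a new HDFS file is created
            }
        }
        return filesRolled;
    }

    public static void main(String[] args) {
        System.out.println(simulate(100)); // prints 30
    }
}
```

However long the simulation runs, no more than 30 files are rolled, matching the behavior described above.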

Now, combining this with the source code above:

If you configure rolling every 10 seconds but, 2 seconds into writing, the file's block happens to be under-replicated, the file will be rolled even though 10 seconds have not passed. The same applies to rolling by file size and by event count.

To solve this, we just need to stop the program from noticing that the block being written is under-replicated. How?

Make the isUnderReplicated() method always return false.

The method compares the number of replicas the current block has against the configured minimum, so we only need to change that configured minimum. The official Flume documentation describes the hdfs.minBlockReplicas property:


Specify minimum number of replicas per HDFS block. If not specified, it comes from the default Hadoop config in the classpath
When it is not specified, the value comes from Hadoop's dfs.replication property, which defaults to 3.

We do not touch the Hadoop configuration; instead we add the property to the Flume sink configuration and set it to 1.
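To see why a minimum of 1 works, here is a minimal sketch of the comparison inside isUnderReplicated() (the class name and standalone method are ours; only the comparison mirrors the Flume source). While a file is open for writing, HDFS reports at least one current replica, so with a configured minimum of 1 the check can never fire:

```java
public class MinReplicasDemo {
    // Sketch of the core comparison in isUnderReplicated():
    // numCurrentReplicas is what HDFS reports for the open block
    // (-1 means "unknown"); minReplicas is hdfs.minBlockReplicas,
    // or dfs.replication (default 3) when not set.
    static boolean isUnderReplicated(int numCurrentReplicas, int minReplicas) {
        if (numCurrentReplicas == -1) {
            return false; // replica count unavailable: assume fine
        }
        return numCurrentReplicas < minReplicas;
    }

    public static void main(String[] args) {
        // Default: minimum 3, pipeline temporarily down to 1 replica.
        System.out.println(isUnderReplicated(1, 3)); // true  -> file gets rotated
        // With hdfs.minBlockReplicas=1 the same situation is not flagged.
        System.out.println(isUnderReplicated(1, 1)); // false -> no extra rotation
    }
}
```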

The configuration is as follows:

a1.sinks.k1.type=hdfs
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.path=hdfs://cmcc
a1.sinks.k1.hdfs.minBlockReplicas=1
#a1.sinks.k1.hdfs.fileType=DataStream
#a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.idleTimeout=0
With this configuration, the sink will never roll a file because of block under-replication; it rolls only according to the criteria you configured. Try it!

I hope this helps!

