HDFS: merging result files and copying files within HDFS

Source: Internet
Author: User
Tags: file copy

1. Problem: When the input of a MapReduce program is the output of many earlier MapReduce jobs, and the input defaults to a single path, those output files need to be merged into one file. Hadoop provides the copyMerge function in FileUtil for this.

It can be called as follows:

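	import java.io.IOException;
	import org.apache.hadoop.conf.Configuration;
	import org.apache.hadoop.fs.FileUtil;
	import org.apache.hadoop.fs.Path;

	// Merge every file under folder into the single file named by file.
	public void copyMerge(String folder, String file) {
		Path src = new Path(folder);
		Path dst = new Path(file);
		Configuration conf = new Configuration();
		try {
			// false: keep the source files; null: no extra string appended after each file
			FileUtil.copyMerge(src.getFileSystem(conf), src,
					dst.getFileSystem(conf), dst, false, conf, null);
		} catch (IOException e) {
			e.printStackTrace();
		}
	}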

This merges all the part-r-0000* files in the folder directory into a single file, with much the same effect as running one more MapReduce pass. The output file name is whatever you pass in as dst.
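
To make the merge behaviour concrete, here is a minimal sketch of the same idea on the local filesystem using only java.nio. It is not the Hadoop API: the class name LocalPartMerger, the method mergeParts, and the choice to pick up only files whose names start with "part-" are assumptions made for illustration. A stand-in like this can also be handy because FileUtil.copyMerge exists in Hadoop 2.x but was removed in Hadoop 3.0.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem illustration only; this is not the Hadoop API.
class LocalPartMerger {

    // Concatenates every regular file in srcDir whose name starts with "part-"
    // into dst, in file-name order, and returns how many files were merged.
    static int mergeParts(Path srcDir, Path dst) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(srcDir)) {
            parts = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the output
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo data written to a temporary directory.
        Path dir = Files.createTempDirectory("job-output");
        Files.write(dir.resolve("part-r-00001"), "b\n".getBytes());
        Files.write(dir.resolve("part-r-00000"), "a\n".getBytes());
        Path merged = Files.createTempFile("merged", ".txt");
        int count = mergeParts(dir, merged);
        System.out.println(count + " parts merged into " + merged);
    }
}
```

Sorting by name keeps the parts in reducer order, so part-r-00000 always lands before part-r-00001 in the merged file. Marker files such as _SUCCESS are skipped by the name filter.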


2. Projects sometimes need to copy a file from one HDFS location to another, and the Hadoop FileSystem class has no method for this. The API was painful to track down at the time; it turned out to live in FileUtil.

It can be called as follows:

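	import java.io.IOException;
	import org.apache.hadoop.conf.Configuration;
	import org.apache.hadoop.fs.FileUtil;
	import org.apache.hadoop.fs.Path;

	// Copy srcFile to desFile; both may be HDFS paths.
	public void copy(String srcFile, String desFile) {
		Path src = new Path(srcFile);
		Path dst = new Path(desFile);
		Configuration conf = new Configuration();
		try {
			// false: keep the source file after copying
			FileUtil.copy(src.getFileSystem(conf), src,
					dst.getFileSystem(conf), dst, false, conf);
		} catch (IOException e) {
			e.printStackTrace();
		}
	}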
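
The boolean argument passed as false in the call above is the delete-source flag. As a rough illustration of what that flag means, here is a local-filesystem sketch using java.nio. It is not the Hadoop API: the class name LocalCopy and the decision to overwrite an existing destination are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem illustration only; this is not the Hadoop API.
class LocalCopy {

    // Copies src to dst, replacing dst if it already exists. When deleteSource
    // is true the source is removed afterwards, turning the copy into a move.
    static void copy(Path src, Path dst, boolean deleteSource) throws IOException {
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        if (deleteSource) {
            Files.delete(src);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo file written to a temporary location.
        Path src = Files.createTempFile("source", ".txt");
        Files.write(src, "hello\n".getBytes());
        Path dst = src.resolveSibling(src.getFileName() + ".copy");
        copy(src, dst, false);
        System.out.println("source still exists: " + Files.exists(src));
    }
}
```

Passing true for deleteSource gives move semantics, which is useful when the source is only a staging file.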

