1. Problem: When the input of a MapReduce program is the output of many earlier MapReduce jobs, and the job is configured with only one input path, those part files first need to be merged into a single file. Hadoop provides this function as FileUtil.copyMerge.
The function is implemented as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public void copyMerge(String folder, String file) {
    Path src = new Path(folder);
    Path dst = new Path(file);
    Configuration conf = new Configuration();
    try {
        // Merge every file under src into the single file dst.
        // deleteSource = false keeps the originals; addString = null
        // inserts nothing between the merged files.
        FileUtil.copyMerge(src.getFileSystem(conf), src,
                dst.getFileSystem(conf), dst, false, conf, null);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
This merges all the part-r-0000* files under folder into a single file, which is equivalent to running another MapReduce pass, and the output file name is whatever you set dst to.
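For example, chaining two jobs this way might look like the minimal sketch below; the paths are hypothetical:

// Hypothetical paths: merge job 1's part files into one file,
// then feed that single file to job 2 as its only input path.
copyMerge("/user/hadoop/job1/output", "/user/hadoop/job2/input/merged");

Note that FileUtil.copyMerge exists in Hadoop 1.x and 2.x but was removed in Hadoop 3, where you would have to merge the files yourself.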
2. In the project, there are also times when a file needs to be copied within HDFS itself, and this operation is not on Hadoop's FileSystem class. It was painful to hunt for the API at the time; it eventually turned up in FileUtil.
The function is implemented as follows:
public void copy(String srcFile, String desFile) {
    Path src = new Path(srcFile);
    Path dst = new Path(desFile);
    Configuration conf = new Configuration();
    try {
        // Copy src to dst entirely inside HDFS (same imports as above).
        // deleteSource = false keeps the original file in place.
        FileUtil.copy(src.getFileSystem(conf), src,
                dst.getFileSystem(conf), dst, false, conf);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
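As a minimal usage sketch, copying between two hypothetical HDFS paths looks like this:

// Hypothetical paths: duplicate a file inside HDFS without
// routing the data through the local filesystem.
copy("/user/hadoop/data/part-r-00000", "/user/hadoop/backup/part-r-00000");

Because both FileSystem arguments come from getFileSystem(conf), the same call also works for copying between two different filesystems, not just within one HDFS instance.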