While writing a MapReduce program recently, I wanted to read input from multiple folders in a single job. I also wanted to send the results to different output paths and change the names of the output files. After checking the relevant documentation, I got both working.
To feed a MapReduce job with data from multiple folders in HDFS, configure the inputs in the main function. The sample code is as follows:
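    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TestMain {
        public static void main(String[] args) throws Exception {
            // Two input folders followed by the output folder.
            String[] ioPath = {
                "hdfs://10.1.2.3:8020/user/me/input/folder1",
                "hdfs://10.1.2.3:8020/user/me/input/folder2",
                "hdfs://10.1.2.3:8020/user/me/output"
            };
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://10.1.2.3:8020");
            conf.set("mapreduce.jobtracker.address", "10.1.2.3:8021");

            Job job = Job.getInstance(conf, "Job-Name");
            job.setJarByClass(TestMain.class);
            job.setReducerClass(TestReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            // Remove any previous output so the job does not fail on an existing folder.
            FileSystem fs = FileSystem.get(conf);
            fs.delete(new Path(ioPath[2]), true);

            // Register each input folder with its input format and mapper.
            MultipleInputs.addInputPath(job, new Path(ioPath[0]), TextInputFormat.class, TagUrlMapper.class);
            MultipleInputs.addInputPath(job, new Path(ioPath[1]), TextInputFormat.class, TagUrlMapper.class);
            FileOutputFormat.setOutputPath(job, new Path(ioPath[2]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }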
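Use the MultipleInputs.addInputPath() method to register each input path together with its input format and mapper class. Because the mapper is registered per path, no separate job.setMapperClass() call is needed.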
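To make the per-path registration concrete, here is a minimal plain-Java model of the idea, with no Hadoop dependency. The class PathMapperRegistry and its lambda "mappers" are hypothetical and exist only for illustration. Hadoop's real mechanism differs in detail, but the principle is the same: each input folder gets its own mapper, and a record is handled by the mapper of the folder it came from.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java model of per-path mapper registration (hypothetical, not Hadoop code). */
final class PathMapperRegistry {
    // Input folder -> function that turns one input line into an output value.
    private final Map<String, Function<String, String>> mappers = new LinkedHashMap<>();

    /** Mirrors the shape of addInputPath: one folder, one mapper. */
    public void addInputPath(String folder, Function<String, String> mapper) {
        mappers.put(folder, mapper);
    }

    /** Applies the mapper registered for the folder that contains the file. */
    public String map(String filePath, String line) {
        for (Map.Entry<String, Function<String, String>> e : mappers.entrySet()) {
            if (filePath.startsWith(e.getKey() + "/")) {
                return e.getValue().apply(line);
            }
        }
        throw new IllegalArgumentException("No mapper registered for " + filePath);
    }

    public static void main(String[] args) {
        PathMapperRegistry registry = new PathMapperRegistry();
        registry.addInputPath("/user/me/input/folder1", line -> "folder1:" + line);
        registry.addInputPath("/user/me/input/folder2", line -> "folder2:" + line);
        // The file lives under folder2, so the second mapper is applied.
        System.out.println(registry.map("/user/me/input/folder2/part-00000", "http://example.com"));
    }
}
```

Registering a different function per folder is what makes it possible to treat records differently depending on their source. The sample job simply registers TagUrlMapper for both folders.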
Multi-path output and the prefix of the output file names are configured in the reducer. Sample code:
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class TestReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            super.cleanup(context);
            mos.close(); // flush and close every output opened through mos
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text tag : values) {
                if (......) {
                    // Third argument is the base output path: subfolder/file-name prefix.
                    mos.write(key, new Text("taged"), "taged/taged");
                } else {
                    mos.write(key, new Text("untaged"), "untaged/untaged");
                }
            }
        }
    }
Use the MultipleOutputs class to control the output path. Override the reducer's setup() and cleanup() methods to create and close the MultipleOutputs instance, as shown in the sample code.
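With the base paths used in the reducer, the results are written to subfolders of the job output directory, in files such as hdfs://10.1.2.3:8020/user/me/output/taged/taged-r-00000 and hdfs://10.1.2.3:8020/user/me/output/untaged/untaged-r-00000.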
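The naming rule behind the output paths can be sketched in a few lines of plain Java. This assumes the standard FileOutputFormat pattern for reducer output (the base path, then -r-, then a five-digit partition number) and an output format that adds no file extension. OutputNames and reducerFile are hypothetical names used only for this illustration.

```java
import java.util.Locale;

/** Sketch of how a MultipleOutputs base path becomes a reducer output file name. */
final class OutputNames {
    /**
     * Builds outputDir/basePath-r-NNNNN, where NNNNN is the zero-padded
     * partition number. No file extension is assumed.
     */
    static String reducerFile(String outputDir, String basePath, int partition) {
        return String.format(Locale.ROOT, "%s/%s-r-%05d", outputDir, basePath, partition);
    }

    public static void main(String[] args) {
        String out = "hdfs://10.1.2.3:8020/user/me/output";
        // prints hdfs://10.1.2.3:8020/user/me/output/taged/taged-r-00000
        System.out.println(reducerFile(out, "taged/taged", 0));
        // prints hdfs://10.1.2.3:8020/user/me/output/untaged/untaged-r-00000
        System.out.println(reducerFile(out, "untaged/untaged", 0));
    }
}
```

The part of the base path before the slash becomes a subfolder and the part after it becomes the file name prefix, which is why "taged/taged" yields a taged folder containing taged-r-00000.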