Setting Multi-Path Input and Output in MapReduce

Source: Internet
Author: User

While writing a MapReduce program recently, I wanted to read input from multiple folders in a single job. I also wanted to send the output to different paths and change the names of the output files. After checking the relevant documentation, I implemented both as described below.

  • Multi-path Input

To make a MapReduce job read data from multiple folders in HDFS, you only need to configure the job in the main function. The sample code is as follows:

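    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TestMain {
        public static void main(String[] args) throws Exception {
            // The first two entries are input folders; the last one is the output folder.
            String[] ioPath = {
                "hdfs://10.1.2.3:8020/user/me/input/folder1",
                "hdfs://10.1.2.3:8020/user/me/input/folder2",
                "hdfs://10.1.2.3:8020/user/me/output"
            };

            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://10.1.2.3:8020");
            conf.set("mapreduce.jobtracker.address", "10.1.2.3:8021");

            Job job = Job.getInstance(conf, "Job-Name");
            job.setJarByClass(TestMain.class);
            job.setReducerClass(TestReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            // Delete the output folder if it already exists; otherwise the job fails at startup.
            FileSystem fs = FileSystem.get(conf);
            fs.delete(new Path(ioPath[2]), true);

            // Register each input folder together with its input format and mapper class.
            MultipleInputs.addInputPath(job, new Path(ioPath[0]), TextInputFormat.class, TagUrlMapper.class);
            MultipleInputs.addInputPath(job, new Path(ioPath[1]), TextInputFormat.class, TagUrlMapper.class);
            FileOutputFormat.setOutputPath(job, new Path(ioPath[2]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }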

Use the MultipleInputs.addInputPath() method to add each input path together with its input format and mapper class.
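Because every call to MultipleInputs.addInputPath() names its own mapper class, each folder can be handled by a different mapper, for example to tag records with the folder they came from. The following is a Hadoop-free toy model of that idea only, not Hadoop's implementation; MultiInputSketch, its addInputPath() and run() methods, and the folder names are hypothetical and exist purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model: every name here is hypothetical and not part of the Hadoop API.
public class MultiInputSketch {
    // Insertion-ordered registry: input folder -> mapper applied to its records.
    private final Map<String, Function<String, String>> mappers = new LinkedHashMap<>();

    // Plays the role of MultipleInputs.addInputPath(job, path, format, mapper).
    public void addInputPath(String folder, Function<String, String> mapper) {
        mappers.put(folder, mapper);
    }

    // Runs each record through the mapper registered for the folder it came from.
    public List<String> run(Map<String, List<String>> recordsByFolder) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Function<String, String>> entry : mappers.entrySet()) {
            List<String> records =
                recordsByFolder.getOrDefault(entry.getKey(), Collections.<String>emptyList());
            for (String record : records) {
                out.add(entry.getValue().apply(record));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MultiInputSketch job = new MultiInputSketch();
        // Each folder gets its own mapper, which prefixes records with their source.
        job.addInputPath("input/folder1", line -> "folder1\t" + line);
        job.addInputPath("input/folder2", line -> "folder2\t" + line);

        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("input/folder1", Arrays.asList("a", "b"));
        data.put("input/folder2", Arrays.asList("c"));

        for (String line : job.run(data)) {
            System.out.println(line);
        }
    }
}
```

In a real job the same effect comes from passing a different mapper class in each addInputPath() call; the reducer can then tell the sources apart by the tag each mapper emits.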

  • Multi-path output

Configure the multi-path output and the prefix of the output file names in the reducer. Sample code:

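    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class TestReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
            // Create the MultipleOutputs helper once per reduce task.
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            super.cleanup(context);
            // Close the helper so that all output files are flushed.
            mos.close();
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text tag : values) {
                if (......) {  // condition omitted in the original
                    // The third argument is the base output path: subfolder/file-name prefix.
                    mos.write(key, new Text("taged"), "taged/taged");
                } else {
                    mos.write(key, new Text("untaged"), "untaged/untaged");
                }
            }
        }
    }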

Use the MultipleOutputs class to control the output path. Override the reducer's setup() and cleanup() methods to create and close the MultipleOutputs instance, as shown in the sample code.

With the base output paths in the sample code, the job output directory contains two subfolders, taged and untaged, whose files have names like taged-r-00000 and untaged-r-00000.
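The third argument of mos.write() is a base output path relative to the job output directory, and the framework appends a task suffix to it. The sketch below reproduces that naming pattern in plain Java, assuming the files are written by a reducer with an uncompressed text output format; OutputNameSketch and reducerOutputFile() are hypothetical helpers, not Hadoop classes.

```java
import java.util.Locale;

// Hadoop-free sketch of how output file names are formed for a reducer that
// calls MultipleOutputs.write(key, value, baseOutputPath).
public class OutputNameSketch {
    // Hypothetical helper: joins the job output directory, the base output
    // path, and the "-r-" suffix followed by the five-digit reduce partition.
    public static String reducerOutputFile(String outputDir, String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s/%s-r-%05d", outputDir, baseOutputPath, partition);
    }

    public static void main(String[] args) {
        String outputDir = "/user/me/output";
        // prints /user/me/output/taged/taged-r-00000
        System.out.println(reducerOutputFile(outputDir, "taged/taged", 0));
        // prints /user/me/output/untaged/untaged-r-00000
        System.out.println(reducerOutputFile(outputDir, "untaged/untaged", 0));
    }
}
```

The part before the slash in the base output path becomes a subfolder and the part after it becomes the file-name prefix, which is why the sample reducer uses "taged/taged" rather than just "taged".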
