Well, I admit Hadoop is cool for handling big data, but I sometimes get frustrated on data-marshalling projects. A MapReduce job often implements a join, so the job's input may span more than one file (in other words, the mappers must process two or more files).
How to handle multiple inputs in the mapper:
Multiple mappers: each mapper processes its corresponding input file. See https://github.com/zhouhao/Hadoop_Project1/blob/master/MapReduceQueries/Query3/query3.java
    MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, CustomerMap.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, TransactionMap.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
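The two-mapper setup above is the usual reduce-side join pattern: CustomerMap tags each customer record, TransactionMap tags each transaction, and the reducer joins the tagged values that arrive under the same key. The reducer-side join logic can be sketched in plain Java, outside Hadoop; the tag format ("C|" / "T|") and record shapes below are my own assumptions for illustration, not from the original post:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReduceSideJoinSketch {

    // Join the values that reach one reducer call for a single customer id.
    // Each value carries a tag added by its mapper: "C|name" from a
    // customer-side mapper, "T|amount" from a transaction-side mapper
    // (hypothetical tag format).
    static List<String> joinForKey(String key, List<String> values) {
        String customerName = null;
        List<String> amounts = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("C|")) {
                customerName = v.substring(2);   // the (single) customer record
            } else if (v.startsWith("T|")) {
                amounts.add(v.substring(2));     // one or more transactions
            }
        }
        List<String> out = new ArrayList<>();
        if (customerName != null) {
            for (String amount : amounts) {
                out.add(key + "\t" + customerName + "\t" + amount);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values arrive in no guaranteed order, just as in a real reducer.
        List<String> values = Arrays.asList("T|250", "C|Alice", "T|75");
        for (String line : joinForKey("42", values)) {
            System.out.println(line);
        }
    }
}
```

In a real job this loop would live in the reducer's reduce() method, with the key and values supplied by the framework.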
A single mapper: one mapper processes all the input files. In the code snippet below, the mapper determines which file each record came from and can then process it accordingly.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            // Get the name of the file this split came from via the Reporter
            FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
            String fileName = fileSplit.getPath().getName();
            output.collect(new Text(fileName), value);
        }
    }
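Once the mapper knows the file name, it typically branches on it to treat each record according to its source. That dispatch step can be sketched in plain Java; the file names and tag prefixes here are hypothetical placeholders, not from the original post:

```java
public class FileDispatchSketch {

    // Route a record depending on which input file it came from, mirroring
    // the filename check inside the single-mapper approach above.
    // The names "customers.txt" / "transactions.txt" are hypothetical.
    static String tagRecord(String fileName, String line) {
        if (fileName.equals("customers.txt")) {
            return "C|" + line;   // treat as a customer record
        } else if (fileName.equals("transactions.txt")) {
            return "T|" + line;   // treat as a transaction record
        }
        return "?|" + line;       // unknown source: pass through, tagged
    }

    public static void main(String[] args) {
        System.out.println(tagRecord("customers.txt", "42,Alice"));
        System.out.println(tagRecord("transactions.txt", "42,250"));
    }
}
```

In the Hadoop mapper, the returned tag-plus-line would become the output value, so a downstream reducer can tell the two record types apart.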
PS: the mapper input can also be a folder: FileInputFormat.setInputPaths(conf, new Path("/tmp/"));