Hadoop: getting the name of the input file inside a mapper

Source: Internet
Author: User

Well, I admit it's cool to use Hadoop to handle big data, but some projects still leave me frustrated.
We often perform a join in a map-reduce job, so the job's input may consist of two or more files (in other words, the mapper has to process more than one file).

How to handle multiple inputs in the map phase:

Multiple mappers: each mapper class processes its corresponding input file. See https://github.com/zhouhao/Hadoop_Project1/blob/master/MapReduceQueries/Query3/query3.java

// Bind each input path to its own mapper class, then set the output path
MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, CustomerMap.class);
MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, TransactionMap.class);
FileOutputFormat.setOutputPath(conf, new Path(args[2]));
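When each input has its own mapper, a common pattern for a join is for each mapper to emit the join key together with a value tagged by its source, so the reducer can tell the two inputs apart. The sketch below illustrates that idea in plain Java, without Hadoop types, so it can run on its own. The class name TaggingMappers, the "C:" and "T:" prefixes, and the comma-separated record layouts are all assumptions made for illustration; they are not taken from the linked project.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class TaggingMappers {
    // Hypothetical stand-in for CustomerMap.
    // Assumes the customer ID is the first comma-separated field.
    static Map.Entry<String, String> customerMap(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[0], "C:" + line);
    }

    // Hypothetical stand-in for TransactionMap.
    // Assumes the customer ID is the second comma-separated field.
    static Map.Entry<String, String> transactionMap(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[1], "T:" + line);
    }
}
```

Both methods produce the same key for records that belong to the same customer, which is what lets the reduce phase bring them together.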

A single mapper: one mapper processes all the different files. In the following code snippet, the mapper finds out which file the current record came from, so the record can then be processed accordingly.

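// Uses the old org.apache.hadoop.mapred API
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // Get the file name from the reporter
        FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
        String filename = fileSplit.getPath().getName();
        String line = value.toString();
        // Emit the file name as the key so each record carries its source
        output.collect(new Text(filename), value);
    }
}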
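Once the file name is known, the mapper can branch on it. The following is a minimal sketch of that branching in plain Java, kept free of Hadoop types so it runs standalone. The class name FileNameDispatch, the file-name prefixes "customers" and "transactions", the tags, and the column positions are hypothetical and chosen only to illustrate the idea.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class FileNameDispatch {
    // Chooses the join key and a source tag based on the input file name.
    // File-name prefixes and field positions are assumptions for this sketch.
    static Map.Entry<String, String> mapRecord(String filename, String line) {
        String[] fields = line.split(",");
        if (filename.startsWith("customers")) {
            // Customer record: ID assumed to be the first field
            return new SimpleEntry<>(fields[0], "C:" + line);
        } else if (filename.startsWith("transactions")) {
            // Transaction record: customer ID assumed to be the second field
            return new SimpleEntry<>(fields[1], "T:" + line);
        }
        // Fail loudly on a file the job was not written to handle
        throw new IllegalArgumentException("Unexpected input file: " + filename);
    }
}
```

In a real mapper, the filename argument would be the value obtained from the input split, and the returned pair would be passed to output.collect. Throwing on an unknown file is one design choice; silently skipping the record is another.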

PS: The mapper input can also be a folder:

FileInputFormat.setInputPaths(conf, new Path("/tmp/"));
