Making one map read files in different formats has always been a problem. The previous approach was to get the name of the input file inside the map and pick a parsing method according to that name, for example:
Fetch the file name:

    InputSplit inputSplit = context.getInputSplit();
    String fileName = ((FileSplit) inputSplit).getPath().toString();
    if (fileName.contains("track")) {
        // parse as the "track" format
    } else if (fileName.contains("complain3")) {
        // parse as the "complain3" format
    }
There are two problems with this approach: first, the file name has to be fetched every time a record is read; second, dispatching on the file name to decide which format to parse is very ugly. In fact, Hadoop provides a built-in way to solve this problem.
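The filename check above boils down to plain string matching on the split's path. A minimal standalone sketch of that dispatch logic (plain Java, no Hadoop dependency; the paths are hypothetical examples) makes the fragility obvious: every new input format means another string-matching branch inside the mapper.

```java
public class FilenameDispatch {
    // Decide which parser to use based on the input file's path,
    // mirroring the fileName.contains(...) checks in the mapper above.
    static String parserFor(String path) {
        if (path.contains("track")) {
            return "track";      // would parse as the "track" format
        } else if (path.contains("complain3")) {
            return "complain";   // would parse as the "complain3" format
        }
        return "unknown";        // no branch matches: silently unhandled
    }

    public static void main(String[] args) {
        System.out.println(parserFor("/logs/track/part-00000"));
        System.out.println(parserFor("/dsap/rawdata/operate/complain3/20141217"));
    }
}
```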
Use MultipleInputs to solve it:
    public class MultipleInputsTest {
        private static String complain = "/dsap/rawdata/operate/complain3/";
        private static String csoperate = "/dsap/rawdata/creditsystemsearchlog/";
        private static String output = "/dsap/rawdata/multipleinputstest/result1";

        public static class Mapper1 extends Mapper<Object, Text, Text, Text> {
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                Counter counter = context.getCounter("MyCounter", "counter1");
                counter.increment(1L);
            }
        }

        public static class Mapper2 extends Mapper<Object, Text, Text, Text> {
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                Counter counter = context.getCounter("MyCounter", "counter2");
                counter.increment(1L);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "multipleinputstest");
            job.setJarByClass(MultipleInputsTest.class);
            // Each input path is bound to its own InputFormat and Mapper.
            MultipleInputs.addInputPath(job, new Path(complain + "20141217"),
                    TextInputFormat.class, Mapper1.class);
            MultipleInputs.addInputPath(job, new Path(csoperate + "20141217"),
                    TextInputFormat.class, Mapper2.class);
            FileOutputFormat.setOutputPath(job, new Path(output));
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.waitForCompletion(true);
            /* Read back the custom counters to see how many records each mapper processed. */
            long counter1 = job.getCounters().getGroup("MyCounter").findCounter("counter1").getValue();
            long counter2 = job.getCounters().getGroup("MyCounter").findCounter("counter2").getValue();
            System.out.println("counter: " + counter1 + "\t" + counter2);
        }
    }
Looking at the results of the run:
You can see that the two files with different formats were processed by two different mappers, so each mapper only has to parse a single format.
This is how Hadoop's MultipleInputs lets a single MapReduce job read files in different formats.