Hadoop: using MultipleInputs to read files in different formats with different mappers


Reading files in different formats in a single MapReduce job is a long-standing problem. The old approach was to fetch the name of the input file inside the map method and branch on it to decide how to parse each record, like this:

// Fetch the file name of the current input split
InputSplit inputSplit = context.getInputSplit();
String fileName = ((FileSplit) inputSplit).getPath().toString();
if (fileName.contains("track")) {
    // parse the track format
} else if (fileName.contains("complain3")) {
    // parse the complain3 format
}

There are two problems with this approach: first, the file name is looked up for every single record that is read; second, branching on the file name to decide which format to parse is ugly. In fact, Hadoop provides a built-in way to solve this.

Use MultipleInputs to solve it

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultipleInputsTest {
    private static String complain = "/dsap/rawdata/operate/complain3/";
    private static String csoperate = "/dsap/rawdata/creditsystemsearchlog/";
    private static String output = "/dsap/rawdata/multipleinputstest/result1";

    // Handles records from the complain3 input; here it only counts them.
    public static class Mapper1 extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            Counter counter = context.getCounter("MyCounter", "counter1");
            counter.increment(1L);
        }
    }

    // Handles records from the search-log input; here it only counts them.
    public static class Mapper2 extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            Counter counter = context.getCounter("MyCounter", "counter2");
            counter.increment(1L);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "MultipleInputsTest");
        job.setJarByClass(MultipleInputsTest.class);
        // Bind each input path to its own InputFormat and Mapper.
        MultipleInputs.addInputPath(job, new Path(complain + "20141217"),
                TextInputFormat.class, Mapper1.class);
        MultipleInputs.addInputPath(job, new Path(csoperate + "20141217"),
                TextInputFormat.class, Mapper2.class);
        FileOutputFormat.setOutputPath(job, new Path(output));
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.waitForCompletion(true);
        // Read back the custom counters to see how many records each mapper processed.
        long counter1 = job.getCounters().getGroup("MyCounter").findCounter("counter1").getValue();
        long counter2 = job.getCounters().getGroup("MyCounter").findCounter("counter2").getValue();
        System.out.println("counter: " + counter1 + "\t" + counter2);
    }
}
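Note that the third argument to MultipleInputs.addInputPath is what makes this suitable for genuinely different file formats: each input path can be bound to its own InputFormat, not just its own mapper. The example above happens to use TextInputFormat for both paths, but a driver can mix formats freely. Here is a minimal sketch of that idea; the second path, its record types, and the SeqMapper class are assumptions for illustration, not part of the original example:

    // Hypothetical: a second input stored as SequenceFiles with LongWritable/Text records.
    public static class SeqMapper extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString()), value);
        }
    }

    // In main(), each path gets its own InputFormat as well as its own Mapper:
    MultipleInputs.addInputPath(job, new Path(complain + "20141217"),
            TextInputFormat.class, Mapper1.class);           // line-oriented text
    MultipleInputs.addInputPath(job, new Path("/dsap/rawdata/somelog/20141217"),
            SequenceFileInputFormat.class, SeqMapper.class); // binary SequenceFile
    // requires: import org.apache.hadoop.io.LongWritable;
    //           import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;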

Look at the results of the run:


You can see that the two files in different formats were processed by two different mappers, so each mapper only has to parse a single format.
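To make that concrete, here is a minimal sketch of what the two mappers might look like once they actually parse their own format instead of just counting records. The field layouts (comma-separated complain records, tab-separated search-log records) are assumptions for illustration, not the real /dsap schemas; both classes drop into MultipleInputsTest above and use the same imports:

    // Hypothetical layout: complain3 records are comma-separated as "userId,content,..."
    public static class ComplainMapper extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length >= 2) {
                // emit userId -> complaint content
                context.write(new Text(fields[0]), new Text(fields[1]));
            }
        }
    }

    // Hypothetical layout: search-log records are tab-separated as "userId<TAB>query..."
    public static class SearchLogMapper extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length >= 2) {
                // emit userId -> search query
                context.write(new Text(fields[0]), new Text(fields[1]));
            }
        }
    }

Because each mapper sees only one input format, the per-record file-name check from the beginning of the article disappears entirely.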

