The input data looks like this, with the two fields on each line separated by \t:
0-3 years old parenting encyclopedia book-5 V Liquid Level Sensor 50-5 bearings 20-6 months milk powder-6 months C2C Report-6 months online shopping rankings-6 months milk powder market prospects-6 months formula milk powder 230.001g E tianping 50.01t aluminum furnace 20.01 tons of melting Aluminum Alloy Furnace 20.03 tons of magnesium furnace 250.03 tons of Induction Cooker 11
Here the left column is the search term and the right column is its category, so the input can be viewed as a vertical (long) table in a database. The goal is to convert it into a horizontal (wide) table, with one line per category in the format: category \t term 1 \t term 2 ...
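To make the target format concrete, here is a minimal sketch of the same pivot in plain Java, with no Hadoop involved. The class name PivotSketch, the method pivot, and the sample rows are made up for illustration and are not part of the original job. The map stands in for the shuffle phase, which groups all terms that share a category.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotSketch {

    // Hypothetical helper: turns "term\tcategory" rows into
    // "category\tterm1\tterm2..." rows, one row per category.
    static List<String> pivot(List<String> rows) {
        // category -> terms, kept in first-seen order
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String row : rows) {
            String[] parts = row.split("\t");
            if (parts.length != 2) {
                continue; // skip malformed rows
            }
            // Same swap the mapper performs: the category becomes the key.
            grouped.computeIfAbsent(parts[1], k -> new ArrayList<>()).add(parts[0]);
        }
        // Same join the reducer performs: terms concatenated with tabs.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.add(e.getKey() + "\t" + String.join("\t", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the real input.
        List<String> rows = Arrays.asList("milk powder\t6", "bearings\t2", "formula\t6");
        for (String line : pivot(rows)) {
            System.out.println(line);
        }
    }
}
```

This sketch keeps categories in first-seen order and terms in input order. A real MapReduce job hands keys to the reducer in sorted order and gives no guarantee about the order of values within a key.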
MapReduce is well suited to this job. Since I need it often, I am recording it here. When data in a Hive table has to be converted into a horizontal table, it is convenient to write a separate MapReduce job to do the processing.
package seg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author zhf
 * @email [email protected]
 * @version Created August 24, 2014, 9:56:45
 */
public class Vertical2Horizontal extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Vertical2Horizontal(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        String[] args = new GenericOptionsParser(arg0).getRemainingArgs();
        if (args.length != 2) {
            System.out.println("Usage: seg.Vertical2Horizontal <input> <output>");
            System.exit(1);
        }
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Delete a leftover output directory; the job fails if it already exists.
        if (fs.exists(new Path(args[1])))
            fs.delete(new Path(args[1]), true);
        // Deprecated in Hadoop 2, where Job.getInstance(conf) is the replacement.
        Job job = new Job(conf);
        job.setJarByClass(getClass());
        job.setMapperClass(HVMapper.class);
        job.setReducerClass(HVReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class HVMapper extends Mapper<LongWritable, Text, Text, Text> {
        private Text text = new Text();
        private Text clazz = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] params = line.split("\t");
            // Skip lines that do not have both a search term and a category.
            if (params.length < 2) {
                return;
            }
            text.set(params[0]);
            clazz.set(params[1]);
            // Emit the category as the key so all of its terms reach one reduce call.
            context.write(clazz, text);
        }
    }

    public static class HVReducer extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Join every search term of this category with tabs.
            StringBuilder tmp = new StringBuilder();
            for (Text val : values) {
                tmp.append(val.toString()).append("\t");
            }
            result.set(tmp.toString().trim());
            context.write(key, result);
        }
    }
}
Output:
1 leshervan clothing aesthetics Laiwu Labor Insurance clothing secret brand of Nanjing down jacket tops crab lingerie store of crab
Hadoop MapReduce: vertical table to horizontal table