Hadoop MapReduce: converting a vertical table to a horizontal table

Source: Internet
Author: User
Tags: hadoop, mapreduce

The input data is as follows, with the two fields on each line separated by a tab (\t):

0-3 years old parenting encyclopedia book	0
0-5 V Liquid Level Sensor	5
0-5 bearings	2
0-6 months milk powder	0
0-6 months C2C Report	0
0-6 months online shopping rankings	0
0-6 months milk powder market prospects	0
0-6 months formula milk powder	23
0.001g electronic balance	5
0.01t aluminum furnace	2
0.01 tons of melting Aluminum Alloy Furnace	2
0.03 tons of magnesium furnace	25
0.03 tons of Induction Cooker	11
Here the left column is the search term and the right column is its category, so the data can be viewed as a vertical table in a database. The goal is to convert this input into a horizontal table, with one line per category in the format: category \t term 1 \t term 2 ...
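To make the target format concrete, the conversion can be sketched in plain Java with no Hadoop involved. This is only an illustration under assumptions: the class name PivotSketch and the method pivot are hypothetical, the sample rows in main are made up for the sketch, and every row is assumed to contain exactly one tab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it is not part of the MapReduce job.
class PivotSketch {

    // Groups "term\tcategory" rows by category and returns one
    // "category\tterm1\tterm2..." line per category.
    // Assumes every row is well formed (exactly one tab).
    static List<String> pivot(List<String> rows) {
        // TreeMap keeps the categories in sorted string order.
        Map<String, List<String>> byCategory = new TreeMap<>();
        for (String row : rows) {
            String[] parts = row.split("\t", 2);
            String term = parts[0];
            String category = parts[1];
            byCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(term);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byCategory.entrySet()) {
            lines.add(entry.getKey() + "\t" + String.join("\t", entry.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample rows: two terms in category 1, one in category 2.
        List<String> rows = Arrays.asList("apple\t1", "hammer\t2", "pear\t1");
        for (String line : pivot(rows)) {
            System.out.println(line);
        }
    }
}
```

The in-memory map plays the role that the shuffle phase plays in MapReduce: it brings all terms of one category together before they are joined into a single line.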

MapReduce is well suited to this task. Since I need it often, I am recording it here. When the data in a Hive table has to be converted into a horizontal table, it is very convenient to write a separate MapReduce (MR) program for the processing.

package seg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author zhf
 * @email [email protected]
 * @version Creation time: August 24, 2014, 9:56:45
 */
public class Vertical2Horizontal extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Vertical2Horizontal(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        String[] args = new GenericOptionsParser(arg0).getRemainingArgs();
        if (args.length != 2) {
            System.out.println("Usage: seg.Vertical2Horizontal <input> <output>");
            return 1;
        }
        // Use the configuration that ToolRunner has already populated.
        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);
        // Delete the output directory if it already exists, otherwise the job fails.
        if (fs.exists(new Path(args[1]))) {
            fs.delete(new Path(args[1]), true);
        }
        // Newer Hadoop versions deprecate this constructor in favor of Job.getInstance(conf).
        Job job = new Job(conf);
        job.setJarByClass(getClass());
        job.setMapperClass(HVMapper.class);
        job.setReducerClass(HVReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Emits (category, search term) so that all terms of a category reach the same reducer.
    public static class HVMapper extends Mapper<LongWritable, Text, Text, Text> {
        private Text text = new Text();
        private Text clazz = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] params = line.split("\t");
            text.set(params[0]);   // search term
            clazz.set(params[1]);  // category
            context.write(clazz, text);
        }
    }

    // Joins all search terms of one category into a single tab-separated line.
    public static class HVReducer extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder tmp = new StringBuilder();
            for (Text val : values) {
                tmp.append(val).append("\t");
            }
            // trim() removes the trailing tab.
            result.set(tmp.toString().trim());
            context.write(key, result);
        }
    }
}
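The mapper above reads the second field of each split line without checking that it exists, so a line without a tab would raise an ArrayIndexOutOfBoundsException and fail the task. One possible guard is sketched below. The class name LineParser and the method parse are hypothetical and not part of the original job, and silently skipping bad lines is an assumption about the desired behavior.

```java
// Hypothetical helper; not part of the original job.
class LineParser {

    // Returns {term, category} for a well-formed line, or null when the line
    // does not consist of exactly two non-empty tab-separated fields.
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        // The -1 limit keeps trailing empty fields, so "term\t" is detected as malformed.
        String[] parts = line.split("\t", -1);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            return null;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("milk powder\t0") != null); // well-formed line
        System.out.println(parse("no tab here") != null);    // malformed line
    }
}
```

Inside map, the result of parse could be checked first, returning early when it is null, so that malformed lines are dropped instead of stopping the job.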

Output:

1 leshervan clothing aesthetics Laiwu Labor Insurance clothing secret brand of Nanjing down jacket tops crab lingerie store of crab



