"Basic Hadoop Tutorial" 7: One-to-many join queries in Hadoop


A single address can host many companies. This case takes two types of input file, an address file (addresses) and a company file (companies), and performs a one-to-many join query that pairs each address name (for example, Beijing) with the names of the companies located there (for example, Beijing JD and Beijing Red Star).

Development environment

Hardware environment: four CentOS 6.5 servers (one master node, three slave nodes)
Software environment: Java 1.7.0_45, hadoop-1.2.1

1. Map process

First, the input files are read with the default TextInputFormat class, which yields the byte offset of each line and the line's content as a <key,value> pair, such as <0, "1:Beijing">. The map process then handles each record according to the type of input file it came from. For an address file, the value "1:Beijing" becomes <"1", "address:Beijing">; for a company file, the value "Beijing Red Star:1" becomes <"1", "company:Beijing Red Star">. Both record types are therefore keyed by the address ID.
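
To make the <offset, line> pairs concrete, here is a minimal plain-Java sketch that does not use Hadoop. It assumes UTF-8 text with "\n" line endings; the class name LineOffsetSketch and the sample file content are hypothetical, and the real TextInputFormat also deals with input splits, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class LineOffsetSketch {
    // Returns each line keyed by the byte offset at which it starts,
    // assuming UTF-8 text and "\n" line endings.
    static Map<Long, String> readLines(String fileContent) {
        Map<Long, String> records = new LinkedHashMap<Long, String>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            records.put(offset, line);
            // +1 accounts for the newline byte that terminated this line
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical address file content
        Map<Long, String> records = readLines("1:Beijing\n2:Guangzhou\n3:Shenzhen\n");
        for (Map.Entry<Long, String> e : records.entrySet()) {
            System.out.println("<" + e.getKey() + ", \"" + e.getValue() + "\">");
        }
    }
}
```

The first record has key 0, matching the <0, "1:Beijing"> example; later keys grow by the byte length of each preceding line plus its newline.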

The core map code is shown below. For the full source, see CompanyJoinAddress\src\com\zonesion\tablejoin\CompanyJoinAddress.java.

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Text addressId = new Text();
            Text info = new Text();
            // Get each line of the file and split it on ":"
            String[] line = value.toString().split(":");
            // The path of the input file tells us which type of record this is
            String path = ((FileSplit) context.getInputSplit()).getPath().toString();
            if (line.length < 2) {
                return;
            }
            if (path.indexOf("company") >= 0) {
                // Company file value: "Beijing Red Star:1"
                addressId.set(line[1]);               // "1"
                info.set("company" + ":" + line[0]);  // "company:Beijing Red Star"
                context.write(addressId, info);       // <key,value> --> <"1", "company:Beijing Red Star">
            } else if (path.indexOf("address") >= 0) {
                // Address file value: "1:Beijing"
                addressId.set(line[0]);               // "1"
                info.set("address" + ":" + line[1]);  // "address:Beijing"
                context.write(addressId, info);       // <key,value> --> <"1", "address:Beijing">
            }
        }
    }

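The tagging rule can be checked without a cluster. The following is a minimal sketch in plain Java, not the tutorial's source: the class name MapTagSketch, the method tag, and the file paths are hypothetical, and a Map.Entry stands in for the <key,value> pair that context.write would emit.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class MapTagSketch {
    // Mirrors the mapper's tagging rule: returns <addressId, "type:name">,
    // or null when the line is malformed or the path matches neither input type.
    static Map.Entry<String, String> tag(String path, String value) {
        String[] line = value.split(":");
        if (line.length < 2) {
            return null;
        }
        if (path.indexOf("company") >= 0) {
            // Company record "Beijing Red Star:1" -> <"1", "company:Beijing Red Star">
            return new SimpleEntry<String, String>(line[1], "company:" + line[0]);
        } else if (path.indexOf("address") >= 0) {
            // Address record "1:Beijing" -> <"1", "address:Beijing">
            return new SimpleEntry<String, String>(line[0], "address:" + line[1]);
        }
        return null;
    }

    public static void main(String[] args) {
        // The paths below are hypothetical
        System.out.println(tag("input/company/company.txt", "Beijing Red Star:1"));
        System.out.println(tag("input/address/address.txt", "1:Beijing"));
    }
}
```

Note that the path test is case-sensitive: the lowercase "company" matches the input/company directory but not the "CompanyJoinAddress" part of the path.
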
2. Reduce process

The reduce process receives <key,values> input such as <"1", ["company:Beijing Red Star", "company:Beijing JD", "address:Beijing"]>. It iterates over values to get each item (for example, "company:Beijing Red Star") and, based on the identifier before the colon, adds the name to either the company collection or the address collection. Finally, it computes the Cartesian product of the company set and the address set to obtain the company-to-address relationships, and writes them to the output.

The core reduce code is shown below. For the full source, see CompanyJoinAddress\src\com\zonesion\tablejoin\CompanyJoinAddress.java.

    public static class ReduceClass extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> companys = new ArrayList<String>();
            List<String> addresses = new ArrayList<String>();
            // values: ["company:Beijing Red Star", "company:Beijing JD", "address:Beijing"]
            Iterator<Text> it = values.iterator();
            while (it.hasNext()) {
                String value = it.next().toString(); // "company:Beijing Red Star"
                String[] result = value.split(":");
                if (result.length >= 2) {
                    // The tag before the colon decides which list the name goes into
                    if (result[0].equals("company")) {
                        companys.add(result[1]);
                    } else if (result[0].equals("address")) {
                        addresses.add(result[1]);
                    }
                }
            }
            // Compute the Cartesian product
            if (0 != companys.size() && 0 != addresses.size()) {
                for (int i = 0; i < companys.size(); i++) {
                    for (int j = 0; j < addresses.size(); j++) {
                        // <key,value> --> <"Beijing JD", "Beijing">
                        context.write(new Text(companys.get(i)), new Text(addresses.get(j)));
                    }
                }
            }
        }
    }

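The same split-and-pair logic can be tried on an ordinary list. This is a minimal sketch under stated assumptions, not the tutorial's source: ReduceJoinSketch and join are hypothetical names, and each output row is returned as a "company<TAB>address" string instead of being written through a Hadoop Context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReduceJoinSketch {
    // Splits the tagged values for one key into companies and addresses,
    // then pairs every company with every address (Cartesian product).
    static List<String> join(List<String> values) {
        List<String> companies = new ArrayList<String>();
        List<String> addresses = new ArrayList<String>();
        for (String value : values) {
            String[] result = value.split(":");
            if (result.length < 2) {
                continue; // skip malformed values
            }
            if (result[0].equals("company")) {
                companies.add(result[1]);
            } else if (result[0].equals("address")) {
                addresses.add(result[1]);
            }
        }
        List<String> output = new ArrayList<String>();
        for (String company : companies) {
            for (String address : addresses) {
                output.add(company + "\t" + address);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "company:Beijing Red Star", "company:Beijing JD", "address:Beijing");
        for (String row : join(values)) {
            System.out.println(row);
        }
    }
}
```

If either list is empty, the nested loops produce nothing, which has the same effect as the explicit size check in the reducer: a key with companies but no address, or the reverse, yields no output rows.
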
3. Driver implementation

The core driver code is shown below. For the full source, see CompanyJoinAddress\src\com\zonesion\tablejoin\CompanyJoinAddress.java.

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: company Join address <companyTableDir> <addressTableDir> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "company Join address");
        // Set the job entry class
        job.setJarByClass(CompanyJoinAddress.class);
        // Set the map and reduce classes
        job.setMapperClass(MapClass.class);
        job.setReducerClass(ReduceClass.class);
        // Set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // companyTableDir
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));   // addressTableDir
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2])); // out
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

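Between the map and reduce classes that the driver wires together, the framework groups all values that share a key. The following minimal in-memory sketch chains the three steps so the data flow can be followed end to end. It is only an illustration: LocalJoinSketch, emit, and run are hypothetical names, the sample records are made up, and a sorted TreeMap stands in for Hadoop's shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalJoinSketch {
    // Stands in for the shuffle: collects every emitted value under its key.
    static void emit(Map<String, List<String>> grouped, String key, String value) {
        List<String> values = grouped.get(key);
        if (values == null) {
            values = new ArrayList<String>();
            grouped.put(key, values);
        }
        values.add(value);
    }

    static List<String> run(List<String> companyLines, List<String> addressLines) {
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        // Map phase: tag each record and key it by address ID
        for (String line : companyLines) {
            String[] parts = line.split(":");
            if (parts.length >= 2) emit(grouped, parts[1], "company:" + parts[0]);
        }
        for (String line : addressLines) {
            String[] parts = line.split(":");
            if (parts.length >= 2) emit(grouped, parts[0], "address:" + parts[1]);
        }
        // Reduce phase: one Cartesian product per address ID
        List<String> output = new ArrayList<String>();
        for (List<String> values : grouped.values()) {
            List<String> companies = new ArrayList<String>();
            List<String> addresses = new ArrayList<String>();
            for (String value : values) {
                String[] parts = value.split(":");
                if (parts.length < 2) continue;
                if (parts[0].equals("company")) companies.add(parts[1]);
                else if (parts[0].equals("address")) addresses.add(parts[1]);
            }
            for (String company : companies)
                for (String address : addresses)
                    output.add(company + "\t" + address);
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical sample records in the two input formats
        List<String> companies = Arrays.asList("Beijing JD:1", "Guangzhou Honda:2", "Tencent:3");
        List<String> addresses = Arrays.asList("1:Beijing", "2:Guangzhou", "3:Shenzhen");
        for (String row : run(companies, addresses)) System.out.println(row);
    }
}
```

A company whose address ID has no matching address record ends up in a group without addresses and is dropped, which is the inner-join behavior of the reducer.
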
4. Deployment and running

1) Start the Hadoop cluster

    $ start-dfs.sh
    $ start-mapred.sh
    $ jps
    5283 SecondaryNameNode
    5445 JobTracker
    5578 Jps
    5109 NameNode

2) Deploy the source code

    # Set up the working environment
    $ mkdir -p /usr/hadoop/workspace/MapReduce
    # Deploy the source code: copy the CompanyJoinAddress folder to /usr/hadoop/workspace/MapReduce/

... You can also download CompanyJoinAddress directly.

3) Compile the files

    # Switch to the working directory
    $ cd /usr/hadoop/workspace/MapReduce/CompanyJoinAddress
    # Compile the source file
    $ javac -classpath /usr/hadoop/hadoop-core-1.2.1.jar:/usr/hadoop/lib/commons-cli-1.2.jar -d bin src/com/zonesion/tablejoin/CompanyJoinAddress.java
    $ ls bin/com/zonesion/tablejoin/* -la
    -rw-rw-r-- 1 hadoop hadoop 1909 Aug  1 10:29 bin/com/zonesion/tablejoin/CompanyJoinAddress.class
    -rw-rw-r-- 1 hadoop hadoop 2199 Aug  1 10:29 bin/com/zonesion/tablejoin/CompanyJoinAddress$MapClass.class
    -rw-rw-r-- 1 hadoop hadoop 2242 Aug  1 10:29 bin/com/zonesion/tablejoin/CompanyJoinAddress$ReduceClass.class

4) Package the jar file

    $ jar -cvf CompanyJoinAddress.jar -C bin/ .
    added manifest
    adding: com/(in = 0) (out= 0)(stored 0%)
    adding: com/zonesion/(in = 0) (out= 0)(stored 0%)
    adding: com/zonesion/tablejoin/(in = 0) (out= 0)(stored 0%)
    adding: com/zonesion/tablejoin/CompanyJoinAddress$MapClass.class(in = 2273) (out= 951)(deflated 58%)
    adding: com/zonesion/tablejoin/CompanyJoinAddress$ReduceClass.class(in = 2242) (out= 1029)(deflated 54%)
    adding: com/zonesion/tablejoin/CompanyJoinAddress.class(in = 1909) (out= 983)(deflated 48%)

5) Upload the input files

    # Create the company input folder
    $ hadoop fs -mkdir CompanyJoinAddress/input/company/
    # Create the address input folder
    $ hadoop fs -mkdir CompanyJoinAddress/input/address/
    # Upload files to the company input folder
    $ hadoop fs -put input/company* CompanyJoinAddress/input/company/
    # Upload files to the address input folder
    $ hadoop fs -put input/address* CompanyJoinAddress/input/address/

6) Run the jar file

    $ hadoop jar CompanyJoinAddress.jar com.zonesion.tablejoin.CompanyJoinAddress CompanyJoinAddress/input/company/ CompanyJoinAddress/input/address/ CompanyJoinAddress/output
    14/08/01 10:50:05 INFO input.FileInputFormat: Total input paths to process : 4
    14/08/01 10:50:05 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/08/01 10:50:05 WARN snappy.LoadSnappy: Snappy native library not loaded
    14/08/01 10:50:05 INFO mapred.JobClient: Running job: job_201408010921_0008
    14/08/01 10:50:06 INFO mapred.JobClient:  map 0% reduce 0%
    14/08/01 10:50:09 INFO mapred.JobClient:  map 50% reduce 0%
    14/08/01 10:50:10 INFO mapred.JobClient:  map 100% reduce 0%
    14/08/01 10:50:17 INFO mapred.JobClient:  map 100% reduce 100%
    14/08/01 10:50:17 INFO mapred.JobClient: Job complete: job_201408010921_0008
    14/08/01 10:50:17 INFO mapred.JobClient: Counters: 29
    ......

7) View the output

    $ hadoop fs -ls CompanyJoinAddress/output
    Found 3 items
    -rw-r--r--   1 hadoop supergroup   0 2014-08-01 10:50 /user/hadoop/CompanyJoinAddress/output/_SUCCESS
    drwxr-xr-x   - hadoop supergroup   0 2014-08-01 10:50 /user/hadoop/CompanyJoinAddress/output/_logs
    -rw-r--r--   1 hadoop supergroup 241 2014-08-01 10:50 /user/hadoop/CompanyJoinAddress/output/part-r-00000
    $ hadoop fs -cat CompanyJoinAddress/output/part-r-00000
    Beijing Red Star            Beijing
    Beijing Rising              Beijing
    Back of Beijing             Beijing
    Beijing JD                  Beijing
    xiaomi                      Beijing
    Guangzhou Honda             Guangzhou
    Guangzhou Development Bank  Guangzhou
    Shenzhen Thunder            Shenzhen
    Tencent                     Shenzhen
    aiplay                      hangzhou
    huawei                      wuhan