Packaging a Hadoop project with Maven (including third-party jars)

Source: Internet
Author: User
Tags: cassandra

Background:

1 When a MapReduce program uses third-party jars, how do you package the project and submit it to the server for execution?

2 Mahout's item-based algorithm requires the UID to be mapped from a string to a long.

What I implemented here is the following.

The input format for Mahout's item-based algorithm is uid,vid,score, where uid and vid must be numeric (long) and score may be a decimal or an integer.

Each row of my data has the fields uid,vid,score, but the uid contains letters, so it has to be mapped from a string to a long.

For speed, the conversion is done with a distributed program.

The program calls a Mahout class directly:

org.apache.mahout.cf.taste.impl.model.MemoryIDMigrator
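
To make the string-to-long mapping concrete, here is a minimal, self-contained sketch of one way such a mapping can work: hash the string with MD5 and fold the first 8 bytes of the digest into a long. The class name StringIdHasher and the method toLongId are hypothetical names used only for illustration. As far as I understand, Mahout's ID migrators follow a similar hashing idea, but this is not Mahout's code and may differ from it in detail.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Mahout.
final class StringIdHasher {

    /** Folds the first 8 bytes of the MD5 digest of value into a long. */
    static long toLongId(String value) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(value.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                // Shift in one byte at a time, treating each byte as unsigned.
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name on the Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The same input always yields the same long; the value may be negative.
        System.out.println(toLongId("user_ab12"));
    }
}
```

Because the mapping depends only on the input string, every mapper computes the same long for the same uid without any shared state, which is what makes a distributed conversion possible. Going back from the long to the original string needs a separately stored lookup table.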


Create a standard Java project with Maven

mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.linger.mahout -DartifactId=Mahoutproject -DpackageName=org.linger.mahout -Dversion=1.0 -DinteractiveMode=false

Run mvn clean install to initialize the project. Note that a pom.xml file has been generated automatically.

Modify pom.xml:

1 First remove the JUnit dependency.

2 Add the Mahout dependencies to pom.xml (how these dependencies were worked out is not covered here).

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <mahout.version>0.8</mahout.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-core</artifactId>
        <version>${mahout.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-integration</artifactId>
        <version>${mahout.version}</version>
        <exclusions>
            <exclusion>
                <groupId>org.mortbay.jetty</groupId>
                <artifactId>jetty</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.cassandra</groupId>
                <artifactId>cassandra-all</artifactId>
            </exclusion>
            <exclusion>
                <groupId>me.prettyprint</groupId>
                <artifactId>hector-core</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>



3 Configure the jar packaging options in pom.xml

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>org.linger.mahout.mapreducer.UserVideoFormat</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>


The MapReduce code I wrote:

package org.linger.mahout.mapreducer;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.mahout.cf.taste.impl.model.MemoryIDMigrator;

public class UserVideoFormat {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private Text userId = new Text();
        private Text lefts = new Text();
        private MemoryIDMigrator thing2long = new MemoryIDMigrator();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            // Split at the first comma: uid on the left, "vid,score" on the right.
            int spliter = line.indexOf(',');
            String userStr = line.substring(0, spliter);
            String leftsStr = line.substring(spliter + 1);
            // Replace the string uid with its long ID.
            userId.set(Long.toString(thing2long.toLongID(userStr)));
            lefts.set(leftsStr);
            output.collect(userId, lefts);
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(UserVideoFormat.class);
        conf.setJobName("UserVideoFormat");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);

        // Write key and value separated by a comma instead of a tab.
        conf.set("mapred.textoutputformat.separator", ",");
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Strip generic Hadoop options; what remains is the input and output path.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        FileInputFormat.setInputPaths(conf, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(conf, new Path(otherArgs[1]));

        JobClient.runJob(conf);
    }
}
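
The per-line transformation done in the mapper can be tried without a cluster. The sketch below is a plain-Java approximation of that step: it splits a uid,vid,score line at the first comma and replaces the uid with a long. The class UidLineRewriter and the pluggable idMapper parameter are hypothetical and exist only for this illustration; in the real job the mapping comes from MemoryIDMigrator, and the stand-in mapper used in main here is not equivalent to it.

```java
import java.util.function.ToLongFunction;

// Hypothetical helper for illustration; mirrors the split-and-replace step of the mapper.
final class UidLineRewriter {

    /** Replaces the uid before the first comma with the long produced by idMapper. */
    static String rewrite(String line, ToLongFunction<String> idMapper) {
        int split = line.indexOf(',');
        if (split < 0) {
            // The job above assumes well-formed rows; this check is an addition.
            throw new IllegalArgumentException("expected uid,vid,score but got: " + line);
        }
        String uid = line.substring(0, split);
        String rest = line.substring(split + 1);
        return idMapper.applyAsLong(uid) + "," + rest;
    }

    public static void main(String[] args) {
        // Stand-in mapper (assumption): String.hashCode() widened to long.
        ToLongFunction<String> standIn = s -> (long) s.hashCode();
        System.out.println(rewrite("u1a,1001,4.5", standIn));
    }
}
```

Splitting at the first comma only keeps the vid and score untouched, so the output row has the same shape as the input with just the uid replaced.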


Run mvn package to build the jar.

Mahoutproject-1.0-jar-with-dependencies.jar is generated automatically in the target directory.

Run the job, passing the input and output paths:

hadoop jar Mahoutproject-1.0-jar-with-dependencies.jar input output

Note that because the main class is declared in the pom.xml configuration, it does not need to be specified on the command line.

Otherwise, the main class is normally given right after the jar file name.
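
Whether the main class may be omitted depends on the Main-Class attribute in the jar's manifest. As a hedged sketch, the following reads that attribute with the standard java.util.jar API so you can check the packaged jar before submitting it. The class name MainClassCheck is hypothetical, and the jar path is whatever you pass on the command line.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for illustration.
final class MainClassCheck {

    /** Returns the Main-Class attribute, or null if the manifest is missing or has none. */
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    /** Opens the jar file and reads Main-Class from its manifest. */
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return mainClassOf(jarFile.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: MainClassCheck <path-to-jar>");
            return;
        }
        System.out.println(mainClassOf(new File(args[0])));
    }
}
```

If this prints null for your jar, the main class has to be given on the hadoop jar command line after the jar file name.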

Resources:

Building a Mahout project with Maven

http://blog.fens.me/hadoop-mahout-maven-eclipse/

Using third-party dependency jars in a Hadoop job

http://shiyanjun.cn/archives/373.html

Mahout recommendations when the UID and PID are strings

http://blog.csdn.net/pan12jian/article/details/38703569



Link to this article: http://blog.csdn.net/lingerlanlan/article/details/42086623

Author of this article: linger

