Packaging a Hadoop Project with Maven (Including Third-Party Jars)
Background:
1. When a MapReduce program uses third-party jars, how do you package the project and submit it to the server for execution?
2. In Mahout's item-based algorithm, the UID has to be mapped from a string to a long.
What I implemented here:
The input format for Mahout's item-based algorithm is uid,vid,score, where uid and vid must be numeric (long) and score may be either an integer or a decimal.
Each of my records has the fields uid,vid,score,
but the uid contains letters, so it has to be mapped from a string to a long.
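As a rough illustration of the format requirement above, the following sketch checks whether a line already has the three-field uid,vid,score shape with long IDs. The class name, method name, and sample lines are made up for this example, and the check only reflects the rule as stated in this post, not Mahout's own parser.

```java
class RatingLineCheck {

    /** Returns true if the line looks like "uid,vid,score" with numeric (long) IDs. */
    static boolean isMahoutReady(String line) {
        String[] parts = line.split(",", -1);
        if (parts.length != 3) {
            return false;
        }
        try {
            Long.parseLong(parts[0].trim());    // uid must be a long
            Long.parseLong(parts[1].trim());    // vid must be a long
            Float.parseFloat(parts[2].trim());  // score may be integer or decimal
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample records: the second one has letters in the uid.
        String[] samples = {"1001,42,3.5", "u8a3f,42,3.5"};
        for (String line : samples) {
            System.out.println(line + " -> " + isMahoutReady(line));
        }
    }
}
```

A record like the second sample is exactly the case that needs the string-to-long conversion.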
For speed, the conversion is done with a distributed program.
The program directly calls a class that ships with Mahout:
org.apache.mahout.cf.taste.impl.model.MemoryIDMigrator
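To make the idea of a string-to-long mapping concrete without pulling in Mahout, here is a minimal stand-alone sketch that hashes the string with MD5 and packs the first 8 bytes into a long. To my understanding Mahout's migrator works along similar lines, but treat that as an assumption; the class and method names below are hypothetical and are not Mahout's API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class StringIdMapperSketch {

    // In-memory reverse lookup (long -> original string), filled by remember().
    private final Map<Long, String> reverse = new HashMap<>();

    /** Deterministically maps a string ID to a long via the first 8 bytes of its MD5 hash. */
    static long toLongId(String stringId) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(stringId.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (md5[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Stores the mapping so the original string can be recovered later. */
    void remember(String stringId) {
        reverse.put(toLongId(stringId), stringId);
    }

    /** Returns the original string, or null if it was never remembered. */
    String toStringId(long longId) {
        return reverse.get(longId);
    }

    public static void main(String[] args) {
        StringIdMapperSketch mapper = new StringIdMapperSketch();
        mapper.remember("u8a3f"); // hypothetical uid
        long id = toLongId("u8a3f");
        System.out.println(id + " <- " + mapper.toStringId(id));
    }
}
```

Because the mapping is a pure function of the string, every mapper task computes the same long for the same uid without any shared state, which is what makes this kind of conversion easy to distribute. The value may be negative, and hash collisions are possible in principle.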
Create a standard Java project with Maven
mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.linger.mahout -DartifactId=Mahoutproject -DpackageName=org.linger.mahout -Dversion=1.0 -DinteractiveMode=false
Run mvn clean install to initialize the project. Note that a pom.xml file was generated automatically.
Modify pom.xml:
1. Remove the JUnit dependency first.
2. Add the Mahout dependencies to pom.xml (how this set of jar dependencies was worked out is not covered here).
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <mahout.version>0.8</mahout.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>${mahout.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-integration</artifactId>
    <version>${mahout.version}</version>
    <exclusions>
      <exclusion>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>jetty</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.cassandra</groupId>
        <artifactId>cassandra-all</artifactId>
      </exclusion>
      <exclusion>
        <groupId>me.prettyprint</groupId>
        <artifactId>hector-core</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>
3. Configure the jar packaging options in pom.xml
<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>org.linger.mahout.mapreducer.UserVideoFormat</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
The MapReduce code I wrote:
package org.linger.mahout.mapreducer;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.mahout.cf.taste.impl.model.MemoryIDMigrator;

public class UserVideoFormat {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        private Text userId = new Text();
        private Text lefts = new Text();
        private MemoryIDMigrator thing2long = new MemoryIDMigrator();

        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            String line = value.toString();
            // Split at the first comma: the uid comes before it, "vid,score" after it.
            int spliter = line.indexOf(',');
            String userStr = line.substring(0, spliter);
            String leftsStr = line.substring(spliter + 1);
            // Replace the string uid with its long ID and leave the rest unchanged.
            userId.set(Long.toString(thing2long.toLongID(userStr)));
            lefts.set(leftsStr);
            output.collect(userId, lefts);
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(UserVideoFormat.class);
        conf.setJobName("UserVideoFormat");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);
        // Write key and value separated by a comma instead of the default tab.
        conf.set("mapred.textoutputformat.separator", ",");
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        FileInputFormat.setInputPaths(conf, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(conf, new Path(otherArgs[1]));
        JobClient.runJob(conf);
    }
}
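The per-line transformation in the mapper can be tried locally without Hadoop. The sketch below is a hypothetical stand-alone version of that step: it takes the ID mapping as a parameter (a stand-in lambda in the demo, not Mahout's migrator) and, unlike the mapper above, returns null for a line with no comma instead of letting substring throw.

```java
import java.util.function.ToLongFunction;

class UidLineConverter {

    /**
     * Replaces the uid (text before the first comma) with its long ID.
     * Returns null when the line has no comma or an empty uid.
     */
    static String convert(String line, ToLongFunction<String> idMapper) {
        int split = line.indexOf(',');
        if (split <= 0) {
            return null; // malformed record: nothing to convert
        }
        String user = line.substring(0, split);
        String rest = line.substring(split + 1);
        return idMapper.applyAsLong(user) + "," + rest;
    }

    public static void main(String[] args) {
        // Stand-in mapping for the demo only; a real job would use a stable hash.
        ToLongFunction<String> demoMapper = s -> (long) s.hashCode();
        System.out.println(convert("u8a3f,42,3.5", demoMapper));
        System.out.println(convert("line-without-comma", demoMapper));
    }
}
```

Whether malformed lines should be skipped, counted, or fail the job is a design choice; the mapper above fails on them.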
Run mvn package to build the jar.
Mahoutproject-1.0-jar-with-dependencies.jar is generated automatically in the target directory.
hadoop jar Mahoutproject-1.0-jar-with-dependencies.jar input output
Note that because the main class of the jar is specified in the pom.xml configuration, it does not need to be given on the command line here.
Otherwise, the main class is normally given right after the jar file name.
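The main class configured in pom.xml ends up as the Main-Class attribute of the jar's manifest. If you are unsure whether a jar carries it, a small check along these lines can read it back. This is only a sketch: the class and helper names are hypothetical, and the demo builds a throwaway jar in the temp directory rather than using the real build output.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class MainClassCheck {

    /** Returns the Main-Class attribute of the jar's manifest, or null if there is none. */
    static String mainClassOf(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: writes an otherwise empty jar whose manifest names the given main class. */
    static Path writeDemoJar(String mainClass) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try {
            Path jar = Files.createTempFile("demo", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // No entries needed; closing the stream writes the manifest.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a jar path as the first argument, or fall back to a generated demo jar.
        Path jar = args.length > 0
                ? Paths.get(args[0])
                : writeDemoJar("org.linger.mahout.mapreducer.UserVideoFormat");
        System.out.println("Main-Class: " + mainClassOf(jar));
    }
}
```

If the check prints null for your jar, pass the main class on the hadoop jar command line after the jar name.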
Resources:
Building a Mahout project with Maven
http://blog.fens.me/hadoop-mahout-maven-eclipse/
Hadoop jobs that use third-party dependency jar files
http://shiyanjun.cn/archives/373.html
Handling string-typed UID and PID when making recommendations with Mahout
http://blog.csdn.net/pan12jian/article/details/38703569
Link to this article: http://blog.csdn.net/lingerlanlan/article/details/42086623
Author: linger