Compile, package, and run WordCount from the command line -- without Eclipse

Source: Internet
Author: User

1) First create the WordCount1023 folder, then use an editor such as vim in that directory to write the WordCount source, saving it as WordCount.java:

/**
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // JobConf conf = new JobConf();
    // conf.setJar("org.apache.hadoop.examples.WordCount.jar");
    // conf.set("fs.default.name", "hdfs://master:9000/");
    // conf.set("hadoop.job.user", "hadoop");
    // Specify the IP and port of the JobTracker; "master" can be configured in /etc/hosts
    // conf.set("mapred.job.tracker", "master:9001");
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }

    // Delete the output path if it already exists, so the job does not fail
    FileSystem hdfs = FileSystem.get(conf);
    Path findf = new Path(otherArgs[1]);
    boolean isExists = hdfs.exists(findf);
    System.out.println("Exists? " + isExists);
    if (isExists) {
      hdfs.delete(findf, true);
      System.out.println("Delete output");
    }

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

2) Then compile the Java source file in the WordCount1023 directory using javac.

Use the classpath option to add the two Hadoop jar packages that the source program needs to compile, followed by the file name of the source file to be compiled.
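The compile step might look like the following; the jar names, versions, and paths are examples for a Hadoop 1.x installation and must be adjusted to your own setup:

```shell
# Compile WordCount.java against the Hadoop jars.
# The jar paths and versions below are assumptions, not from the original post.
javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar \
    -d . WordCount.java
```

The -d . flag writes the generated class files into the current directory.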

Three class files are generated after successful compilation: WordCount.class plus the two inner classes, WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class.

A jar file is a compressed archive that bundles a number of Java class files together. Here we simply pack the WordCount.class file into a jar file.
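This first packaging attempt can be sketched as follows (the jar file name is an example):

```shell
# Pack only the top-level class file into the jar -- this turns out to be incomplete
jar -cvf wordcount.jar WordCount.class
```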

Then submit this jar package to the Hadoop cluster and run it, which produces an error:
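The submission might look like this; the input and output HDFS paths are placeholders:

```shell
# Run the job on the cluster; "input" and "output" are example HDFS paths
hadoop jar wordcount.jar WordCount input output
```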

Error message: a class-not-found error for the inner class TokenizerMapper of WordCount -- because that class was never put into the jar package.

So the jar package has to be rebuilt:

Use *.class to include every file with the .class suffix in the jar package (in fact, the three class files).
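Repackaging with a wildcard so that all three class files are included:

```shell
# Include every generated .class file (WordCount plus its two inner classes)
jar -cvf wordcount.jar *.class
```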

From the manifest output printed during packaging, you can see that the class files were all added to the jar package.
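As a sanity check, the jar contents can also be listed after the fact:

```shell
# List the files inside the jar to confirm the inner classes are present
jar -tf wordcount.jar
```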

Running it again succeeds:
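The word counts can then be inspected on HDFS; part-r-00000 is the default output file name for a single-reducer job, and "output" is again an example path:

```shell
# Print the word counts produced by the reducer
hadoop fs -cat output/part-r-00000
```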

3)

A Hadoop program reports an error if the output directory already exists, so this program checks internally whether the output path exists and deletes it if it does. Three lines of code deserve detailed explanation.

String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

This reads parameters from the command line used to submit the Hadoop job. args holds the data at the end of the command line -- the input path and the output path where the result is stored -- which is then saved in the string array otherArgs.
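As a sketch of how the arguments split, assuming a command line like the one below: GenericOptionsParser consumes generic options such as -D, and the two remaining arguments become otherArgs[0] and otherArgs[1]:

```shell
# -D is consumed by GenericOptionsParser;
# "input" -> otherArgs[0], "output" -> otherArgs[1]
hadoop jar wordcount.jar WordCount -D mapred.reduce.tasks=2 input output
```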

FileInputFormat.addInputPath(job, new Path(otherArgs[0]));

otherArgs[0] is the string representing the input path of the dataset.

FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

otherArgs[1] is the string representing the output path of the result.

Recalling how the input and output paths are set in the run-configuration parameters in Eclipse, it becomes clear why Eclipse can run a Hadoop program without using the command line.

