Word Count WordCountApp.class

Source: Internet
Author: User
Tags: map, class, shuffle

import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountApp {

    // The input directory can be specified; only files at the first level are processed,
    // a second-level subdirectory is not descended into.
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/abd";  // input path

    // Output path. The reduce job writes its result into this directory:
    //   _SUCCESS     : files starting with an underscore are generally ignored on Linux;
    //                  this one indicates the job finished successfully.
    //   _logs        : the generated log files.
    //   part-r-00000 : the file holding our output. It starts with "part"; "r" means reduce
    //                  output ("m" would mean map output); 00000 is the ordinal.
    private static final String OUT_PATH = "hdfs://hadoop1:9000/out";

    public static void main(String[] args) {
        Configuration conf = new Configuration();  // configuration object
        try {
            // Delete the output directory if it already exists, otherwise the job fails.
            FileSystem fileSystem = FileSystem.get(new URI(OUT_PATH), conf);
            fileSystem.delete(new Path(OUT_PATH), true);

            Job job = new Job(conf, WordCountApp.class.getSimpleName());  // second argument is the job name
            job.setJarByClass(WordCountApp.class);

            FileInputFormat.setInputPaths(job, INPUT_PATH);            // specify the input data
            job.setMapperClass(MyMapper.class);                        // specify the custom map class
            job.setMapOutputKeyClass(Text.class);                      // type of the map output key
            job.setMapOutputValueClass(LongWritable.class);            // type of the map output value
            job.setReducerClass(MyReducer.class);                      // specify the custom reduce class
            job.setOutputKeyClass(Text.class);                         // type of the reduce output key
            job.setOutputValueClass(LongWritable.class);               // type of the reduce output value
            FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));   // where the final reduce output is written

            job.waitForCompletion(true);  // submit to the JobTracker and wait for it to finish
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * The input key is the offset of the line, i.e. the byte at which the line begins;
     * the input value is the content of the current line.
     *
     * MapReduce execution flow: the input comes from the original file and the final output
     * is written back to HDFS; in between, the map output is a temporary result stored on
     * the local Linux disk of the node running the map task. During shuffle, reduce fetches
     * the corresponding data from the map side over HTTP. According to mapred-default.xml
     * (mapreduce.jobtracker.*.root.dir, mapred.tmp.dir), the directory holding the map
     * results is created when the job runs and deleted after the job finishes.
     */
    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        // The source file has two lines, so parsing it produces two key-value pairs,
        // <0, hello you> and <10, hello me>, and the map function is called twice
        // (the file is stored as a one-dimensional byte sequence, hence the byte offsets).
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Convert the Hadoop type (Text) to a Java type (String) so the line can be split.
            String line = value.toString();
            String[] splited = line.split("\t");
            // An alternative is to first accumulate counts per word in a HashMap and emit once
            // per distinct word, which reduces the number of key-value pairs written out; see
            // the optional MyCombiningMapper sketch below. Here each word is emitted directly.
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));  // emit each word with a count of 1
            }
        }
    }
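    // --- Optional optimization (not part of the original job wiring) -----------------------
    // The comment in MyMapper mentions that accumulating counts in a HashMap before writing
    // them out reduces the number of key-value pairs the map task emits. This is a minimal
    // sketch of that in-mapper combining idea; the class name MyCombiningMapper is illustrative,
    // and the driver above does not use it unless you call job.setMapperClass(MyCombiningMapper.class).
    public static class MyCombiningMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Count each word of the line locally first ...
            Map<String, Long> counts = new HashMap<String, Long>();
            for (String word : value.toString().split("\t")) {
                Long current = counts.get(word);
                counts.put(word, current == null ? 1L : current + 1L);
            }
            // ... then emit one <word, localCount> pair per distinct word in the line.
            for (Map.Entry<String, Long> entry : counts.entrySet()) {
                context.write(new Text(entry.getKey()), new LongWritable(entry.getValue()));
            }
        }
    }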
    // After the map function has run, the map output contains 4 <k,v> pairs in total:
    //   <hello,1>, <you,1>, <hello,1>, <me,1>
    // Once map has finished, the data moves on to reduce. Before shuffle the data is
    // partitioned (by default there is only one partition), and the data in each
    // partition is sorted and grouped:
    //   sorted : <hello,1>, <hello,1>, <me,1>, <you,1>
    //   grouped: <hello,{1,1}>, <me,{1}>, <you,{1}>   (values with the same key go into one collection)
    // An optional combine step can also run on the map side (see the combiner note below).
    // The whole process of distributing the map output to reduce is called shuffle.
    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        // The reduce function is called once per group, so with the sample data it is called three times.
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // count is the number of occurrences of the word `key` in the whole file.
            // The number of reduce calls equals the number of groups; it has nothing to do
            // with the number of <k,v> pairs the map produced.
            long count = 0L;
            for (LongWritable times : values) {
                count += times.get();
            }
            context.write(key, new LongWritable(count));
        }
    }
}
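The "combine (optional)" step mentioned in the comments above can be switched on with a single extra driver line. Because the word-count reduction is just a commutative, associative sum, the existing MyReducer can double as the combiner. The line below is a hedged sketch of that change; the original driver does not set a combiner.

            // Optional map-side combine: partial sums are merged on the map side before shuffle,
            // shrinking the data transferred to reduce. Add this in main(), next to setReducerClass().
            // (Not in the original driver; only valid because summing is commutative and associative.)
            job.setCombinerClass(MyReducer.class);

With or without the combiner, the final result written to /out/part-r-00000 for the two sample lines is: hello 2, me 1, you 1.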
