Word Count WordCountApp.class

Source: Internet
Author: User
Tags: map, class, shuffle

import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountApp {

    // The input directory can be specified; only files at the first level are processed,
    // a second-level subdirectory is not descended into.
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/abd";  // input path

    // Output path. The reduce job writes its result into this directory:
    //   _SUCCESS     : files starting with an underscore are generally ignored on Linux;
    //                  this one indicates the job finished successfully.
    //   _logs        : the generated log files.
    //   part-r-00000 : the file holding our output. It starts with "part"; "r" means reduce
    //                  output ("m" would mean map output); 00000 is the ordinal.
    private static final String OUT_PATH = "hdfs://hadoop1:9000/out";

    public static void main(String[] args) {
        Configuration conf = new Configuration();  // configuration object
        try {
            // Delete the output directory if it already exists, otherwise the job fails.
            FileSystem fileSystem = FileSystem.get(new URI(OUT_PATH), conf);
            fileSystem.delete(new Path(OUT_PATH), true);

            Job job = new Job(conf, WordCountApp.class.getSimpleName());  // second argument is the job name
            job.setJarByClass(WordCountApp.class);

            FileInputFormat.setInputPaths(job, INPUT_PATH);            // specify the input data
            job.setMapperClass(MyMapper.class);                        // specify the custom map class
            job.setMapOutputKeyClass(Text.class);                      // type of the map output key
            job.setMapOutputValueClass(LongWritable.class);            // type of the map output value
            job.setReducerClass(MyReducer.class);                      // specify the custom reduce class
            job.setOutputKeyClass(Text.class);                         // type of the reduce output key
            job.setOutputValueClass(LongWritable.class);               // type of the reduce output value
            FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));   // where the final reduce output is written

            job.waitForCompletion(true);  // submit to the JobTracker and wait for it to finish
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * The input key is the offset of the line, i.e. the byte at which the line begins;
     * the input value is the content of the current line.
     *
     * MapReduce execution flow: the input comes from the original file and the final output
     * is written back to HDFS; in between, the map output is a temporary result stored on
     * the local Linux disk of the node running the map task. During shuffle, reduce fetches
     * the corresponding data from the map side over HTTP. According to mapred-default.xml
     * (mapreduce.jobtracker.*.root.dir, mapred.tmp.dir), the directory holding the map
     * results is created when the job runs and deleted after the job finishes.
     */
    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        // The source file has two lines, so parsing it produces two key-value pairs,
        // <0, hello you> and <10, hello me>, and the map function is called twice
        // (the file is stored as a one-dimensional byte sequence, hence the byte offsets).
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Convert the Hadoop type (Text) to a Java type (String) so the line can be split.
            String line = value.toString();
            String[] splited = line.split("\t");
            // An alternative is to first accumulate counts per word in a HashMap and emit once
            // per distinct word, which reduces the number of key-value pairs written out; see
            // the optional MyCombiningMapper sketch below. Here each word is emitted directly.
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));  // emit each word with a count of 1
            }
        }
    }
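    // --- Optional optimization (not part of the original job wiring) -----------------------
    // The comment in MyMapper mentions that accumulating counts in a HashMap before writing
    // them out reduces the number of key-value pairs the map task emits. This is a minimal
    // sketch of that in-mapper combining idea; the class name MyCombiningMapper is illustrative,
    // and the driver above does not use it unless you call job.setMapperClass(MyCombiningMapper.class).
    public static class MyCombiningMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Count each word of the line locally first ...
            Map<String, Long> counts = new HashMap<String, Long>();
            for (String word : value.toString().split("\t")) {
                Long current = counts.get(word);
                counts.put(word, current == null ? 1L : current + 1L);
            }
            // ... then emit one <word, localCount> pair per distinct word in the line.
            for (Map.Entry<String, Long> entry : counts.entrySet()) {
                context.write(new Text(entry.getKey()), new LongWritable(entry.getValue()));
            }
        }
    }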
    // After the map function has run, the map output contains 4 <k,v> pairs in total:
    //   <hello,1>, <you,1>, <hello,1>, <me,1>
    // Once map has finished, the data moves on to reduce. Before shuffle the data is
    // partitioned (by default there is only one partition), and the data in each
    // partition is sorted and grouped:
    //   sorted : <hello,1>, <hello,1>, <me,1>, <you,1>
    //   grouped: <hello,{1,1}>, <me,{1}>, <you,{1}>   (values with the same key go into one collection)
    // An optional combine step can also run on the map side (see the combiner note below).
    // The whole process of distributing the map output to reduce is called shuffle.
    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        // The reduce function is called once per group, so with the sample data it is called three times.
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // count is the number of occurrences of the word `key` in the whole file.
            // The number of reduce calls equals the number of groups; it has nothing to do
            // with the number of <k,v> pairs the map produced.
            long count = 0L;
            for (LongWritable times : values) {
                count += times.get();
            }
            context.write(key, new LongWritable(count));
        }
    }
}
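The "combine (optional)" step mentioned in the comments above can be switched on with a single extra driver line. Because the word-count reduction is just a commutative, associative sum, the existing MyReducer can double as the combiner. The line below is a hedged sketch of that change; the original driver does not set a combiner.

            // Optional map-side combine: partial sums are merged on the map side before shuffle,
            // shrinking the data transferred to reduce. Add this in main(), next to setReducerClass().
            // (Not in the original driver; only valid because summing is commutative and associative.)
            job.setCombinerClass(MyReducer.class);

With or without the combiner, the final result written to /out/part-r-00000 for the two sample lines is: hello 2, me 1, you 1.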
