12 MapReduce Exercises – 1 – Sorting

Source: Internet
Author: User

Problem:

A file of roughly 100 GB contains one number per line. Sort all the numbers in the file.

Anyone who has learned Hadoop will find this problem almost trivial, and it is just as easy to solve with Spark.

Let's start with Hadoop. There is really not much to say: the map task reads the numbers line by line and emits them, and the reduce task simply writes them out. It is almost embarrassingly simple.

Here is the code:

```java
package com.zhyea.dev;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NumberSort {

    // Parses each input line into an int and emits it as both key and value.
    public static class SplitterMapper extends Mapper<Object, Text, IntWritable, IntWritable> {

        private final IntWritable number = new IntWritable();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                number.set(Integer.parseInt(value.toString().trim()));
            } catch (NumberFormatException e) {
                return; // skip lines that are not valid integers
            }
            context.write(number, number);
        }
    }

    // Keys arrive already sorted by the shuffle. Write one record per value
    // so that duplicate numbers are kept instead of collapsed into one line.
    public static class IntegrateReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

        @Override
        protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable value : values) {
                context.write(key, value);
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Number-sort");
        job.setJarByClass(NumberSort.class);
        job.setMapperClass(SplitterMapper.class);
        job.setReducerClass(IntegrateReducer.class);
        job.setNumReduceTasks(1); // a single reducer keeps the output globally ordered
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
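The data flow of this job can be replayed locally to see why the output comes out ordered without any explicit sort call. The sketch below is a hypothetical, Hadoop-free simulation (the class `NumberSortSimulation` and its `run` method are illustrative names, not part of the original job): a `TreeMap` stands in for the framework's sort-and-group step, and the "reduce" loop emits one number per grouped value. Emitting once per value keeps duplicates, whereas writing only the key once per group would drop them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the map -> shuffle -> reduce flow; no Hadoop involved.
public class NumberSortSimulation {

    public static List<Integer> run(List<String> lines) {
        // "Map": parse each line into a (num, num) pair.
        // "Shuffle": the TreeMap sorts by key and groups values sharing a key.
        TreeMap<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            try {
                int num = Integer.parseInt(line.trim());
                grouped.computeIfAbsent(num, k -> new ArrayList<>()).add(num);
            } catch (NumberFormatException e) {
                // malformed lines are skipped here
            }
        }
        // "Reduce": keys are visited in ascending order; emit one record per value.
        List<Integer> output = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> group : grouped.entrySet()) {
            for (int value : group.getValue()) {
                output.add(value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("42", "7", "19", "7", "-3");
        System.out.println(run(input)); // prints [-3, 7, 7, 19, 42]
    }
}
```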

For the map output value I chose IntWritable. The value type could also be NullWritable, but that makes the map tasks run slower; the reduce tasks do run faster, but overall the trade is not worth it.

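The program contains no explicit sort step, yet the output is ordered, because the shuffle phase has already sorted the map output by key (one quick sort when the map output is spilled, then one merge sort when the sorted pieces are merged). This gives a global order because the job runs a single reducer, which is the Hadoop default; with several reducers each output file is sorted only within itself, unless a partitioner such as TotalOrderPartitioner assigns disjoint key ranges to the reducers.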
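The "one quick sort, one merge sort" pattern is an instance of external sorting. Below is a simplified in-memory model, not Hadoop's actual spill and merge code; the class `SpillSortMerge` and the run size are hypothetical. Each fixed-size run plays the role of a spill and is sorted with `Arrays.sort` (for an `int[]` the JDK uses a dual-pivot quicksort), and the sorted runs are then combined with a k-way merge driven by a `PriorityQueue`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Simplified model of "sort each spill, then merge the spills" (hypothetical names).
public class SpillSortMerge {

    // Step 1: cut the input into fixed-size runs ("spills") and sort each one.
    public static List<int[]> sortedRuns(int[] data, int runSize) {
        List<int[]> runs = new ArrayList<>();
        for (int start = 0; start < data.length; start += runSize) {
            int[] run = Arrays.copyOfRange(data, start, Math.min(start + runSize, data.length));
            Arrays.sort(run); // dual-pivot quicksort for primitive arrays
            runs.add(run);
        }
        return runs;
    }

    // Step 2: k-way merge with a min-heap; each entry is {value, runIndex, positionInRun}.
    public static int[] merge(List<int[]> runs) {
        int total = runs.stream().mapToInt(r -> r.length).sum();
        int[] out = new int[total];
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (runs.get(i).length > 0) {
                heap.add(new int[] {runs.get(i)[0], i, 0});
            }
        }
        int k = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[k++] = top[0];
            int[] run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.length) {
                heap.add(new int[] {run[next], top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 7, -3, 100, 0, 55, 8};
        int[] sorted = merge(sortedRuns(data, 4));
        System.out.println(Arrays.toString(sorted)); // prints [-3, 0, 7, 7, 8, 19, 42, 55, 100]
    }
}
```

Only one element per run sits in the heap at a time, which is why this approach scales to inputs far larger than memory once the runs live on disk.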

Now let's see how to do it with Spark:

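```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object NumSortJob {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    val conf = new SparkConf().setAppName("Num Sort")
    val sc = new SparkContext(conf)
    val data = sc.hadoopFile[LongWritable, Text, TextInputFormat](inputPath)
    // Parse each line into an Int, then sort explicitly in ascending order.
    data.map(_._2.toString.trim.toInt)
      .sortBy(num => num, true)
      .saveAsTextFile(outputPath)
    sc.stop()
  }
}
```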

With Spark, the sort has to be requested explicitly. Even with the sort-based shuffle, the data is sorted only on the map side, so the final result set is not necessarily ordered.
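To obtain a total order, Spark's `sortBy` and `sortByKey` range-partition the data with a `RangePartitioner` (which samples the keys to choose split points) and sort within each partition, so reading the partitions in order yields a globally sorted result. The sketch below models that idea in plain Java; `RangePartitionSort` and the fixed split points are hypothetical, whereas Spark derives its bounds by sampling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: range partitioning followed by a per-partition sort.
public class RangePartitionSort {

    // Bounds must be distinct and ascending. Partition i holds keys <= bounds[i]
    // (and > bounds[i - 1]); the last partition holds keys above every bound.
    public static int partitionFor(int key, int[] bounds) {
        int idx = Arrays.binarySearch(bounds, key);
        return idx >= 0 ? idx : -(idx + 1); // insertion point when not found
    }

    public static List<List<Integer>> sortByRange(int[] data, int[] bounds) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i <= bounds.length; i++) {
            partitions.add(new ArrayList<>());
        }
        // "Shuffle": route every key to the partition covering its range.
        for (int key : data) {
            partitions.get(partitionFor(key, bounds)).add(key);
        }
        // Each partition is sorted on its own (in Spark, inside its own task).
        for (List<Integer> p : partitions) {
            p.sort(null); // natural ordering
        }
        return partitions;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 7, -3, 100, 0, 55, 8};
        int[] bounds = {10, 50}; // hypothetical split points
        System.out.println(sortByRange(data, bounds)); // prints [[-3, 0, 7, 7, 8], [19, 42], [55, 100]]
    }
}
```

Because every key in partition i is smaller than every key in partition i + 1, concatenating the sorted partitions (or the output files in order) gives the fully sorted data without funnelling everything through a single task.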
