Sorting two columns of data in Hadoop

Last Update:2014-05-08 Source: Internet

Author: User

Original data form

1 2
2 4
2 3
2 1
3 1
3 4
4 1
4
4 3
1 1

Sort by the first column. If the first column is equal, sort by the second column.
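
The intended ordering can be checked locally before involving Hadoop. The following is a minimal sketch in plain Java, not part of the original job: the class name TwoColumnOrder and the method sortRows are hypothetical, and the incomplete row "4" from the sample data is left out because its second column is unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TwoColumnOrder {
    // Returns a copy of the rows ordered by column 0, with ties broken by column 1.
    static List<long[]> sortRows(List<long[]> rows) {
        Comparator<long[]> byBothColumns =
                Comparator.comparingLong((long[] r) -> r[0])
                          .thenComparingLong(r -> r[1]);
        List<long[]> sorted = new ArrayList<>(rows);
        sorted.sort(byBothColumns);
        return sorted;
    }

    public static void main(String[] args) {
        // The nine complete rows from the sample data above.
        List<long[]> rows = Arrays.asList(
                new long[]{1, 2}, new long[]{2, 4}, new long[]{2, 3},
                new long[]{2, 1}, new long[]{3, 1}, new long[]{3, 4},
                new long[]{4, 1}, new long[]{4, 3}, new long[]{1, 1});
        for (long[] row : sortRows(rows)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

For these rows the sorted order is 1 1, 1 2, 2 1, 2 3, 2 4, 3 1, 3 4, 4 1, 4 3, which is the order the MapReduce job is expected to produce for the same input.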

MapReduce sorts map output by key automatically during the shuffle, so if only the first column is used as the key, the output is ordered by the first column alone. To sort on both columns, define a custom class that implements the WritableComparable interface, holds both values, and is used as the map output key. The framework's built-in sort then orders records with that class's compareTo method. The code is as follows:

package mapReduce;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SortApp {
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/data";
    private static final String OUT_PATH = "hdfs://hadoop1:9000/dataOut";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Delete the output directory if it already exists; otherwise the job fails at startup.
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        Path outputDir = new Path(OUT_PATH);
        if (fileSystem.exists(outputDir)) {
            fileSystem.delete(outputDir, true);
        }

        Job job = new Job(conf, "data");
        // Lets Hadoop locate the jar that contains the nested classes when run on a cluster.
        job.setJarByClass(SortApp.class);

        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setInputFormatClass(TextInputFormat.class);

        // The composite key makes the shuffle sort on both columns.
        job.setMapOutputKeyClass(KeyValue.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);

        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, KeyValue, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line holds two tab-separated numbers.
            final String[] splited = value.toString().split("\t");
            final KeyValue k2 = new KeyValue(Long.parseLong(splited[0]), Long.parseLong(splited[1]));
            final LongWritable v2 = new LongWritable(Long.parseLong(splited[1]));
            context.write(k2, v2);
        }
    }

    static class MyReducer extends Reducer<KeyValue, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(KeyValue k2, Iterable<LongWritable> v2s, Context context)
                throws IOException, InterruptedException {
            // Keys arrive already sorted; identical rows share one key and are written once.
            context.write(new LongWritable(k2.first), new LongWritable(k2.second));
        }
    }

    static class KeyValue implements WritableComparable<KeyValue> {
        Long first;
        Long second;

        // Hadoop needs a no-argument constructor to create keys during deserialization.
        public KeyValue() {}

        public KeyValue(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.first = in.readLong();
            this.second = in.readLong();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(first);
            out.writeLong(second);
        }

        @Override
        public int compareTo(KeyValue o) {
            // Compare directly instead of casting a long difference to int, which can overflow.
            final int byFirst = this.first.compareTo(o.first);
            if (byFirst != 0) {
                return byFirst;
            }
            return this.second.compareTo(o.second);
        }

        @Override
        public int hashCode() {
            return this.first.hashCode() + this.second.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof KeyValue)) {
                return false;
            }
            KeyValue kv = (KeyValue) obj;
            // Long is an object type, so compare with equals() rather than ==.
            return this.first.equals(kv.first) && this.second.equals(kv.second);
        }
    }
}

The first and second fields of KeyValue are declared as the wrapper type Long rather than the primitive type long because hashCode() calls this.first.hashCode(), and a primitive value has no methods to call. Any class that implements WritableComparable can be used as a sortable key, so arbitrarily complex data can be sorted by wrapping it in such a class and using that class as the key.
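
As an alternative to the Long wrapper fields, a key can keep primitive long fields and use the static helpers Long.hashCode(long) and Long.compare(long, long). The following is a minimal sketch under assumptions, not the article's code: it requires Java 8 or later, the names PairKeyDemo, PairKey, and roundTrip are hypothetical, and PairKey deliberately does not implement Hadoop's WritableComparable so that it runs without Hadoop on the classpath. Its write and readFields methods only mirror the shape of that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PairKeyDemo {
    // Hypothetical stand-in for the article's KeyValue, using primitive fields.
    static final class PairKey implements Comparable<PairKey> {
        long first;
        long second;

        PairKey() {}

        PairKey(long first, long second) {
            this.first = first;
            this.second = second;
        }

        void write(DataOutput out) throws IOException {
            out.writeLong(first);
            out.writeLong(second);
        }

        void readFields(DataInput in) throws IOException {
            first = in.readLong();
            second = in.readLong();
        }

        @Override
        public int compareTo(PairKey o) {
            // Long.compare avoids the overflow of (int) (first - o.first).
            int byFirst = Long.compare(first, o.first);
            return byFirst != 0 ? byFirst : Long.compare(second, o.second);
        }

        @Override
        public int hashCode() {
            // Static helper available since Java 8; no boxing needed.
            return 31 * Long.hashCode(first) + Long.hashCode(second);
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof PairKey)) {
                return false;
            }
            PairKey other = (PairKey) obj;
            return first == other.first && second == other.second;
        }
    }

    // Serializes a key to bytes and reads it back, as the shuffle would.
    static PairKey roundTrip(PairKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            PairKey copy = new PairKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PairKey original = new PairKey(2, 3);
        System.out.println(original.equals(roundTrip(original)));
        // The difference here is 2^32, which truncates to 0 when cast to int.
        System.out.println(new PairKey(1L << 32, 0).compareTo(new PairKey(0, 0)) > 0);
    }
}
```

The second check in main is the case where a subtract-and-cast comparison goes wrong: the two keys differ, yet the truncated difference is 0, so they would be reported as equal.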

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
