Sorting two columns of data in Hadoop

Source: Internet
Author: User

The original data looks like this:

1 2
2 4
2 3
2 1
3 1
3 4
4 1
4
4 3
1 1


Sort by the first column. If the first column is equal, sort by the second column.
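The ordering rule can be tried locally before involving MapReduce. The following is a minimal sketch in plain Java, not part of the original job: the class name TwoColumnSort and its helper are hypothetical, and only the complete rows of the sample data are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only: sorts {first, second} rows in memory.
class TwoColumnSort {
    // Compare by the first column; fall back to the second column on a tie.
    static final Comparator<long[]> BY_FIRST_THEN_SECOND = (a, b) -> {
        int byFirst = Long.compare(a[0], b[0]);
        return byFirst != 0 ? byFirst : Long.compare(a[1], b[1]);
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<long[]> sortRows(List<long[]> rows) {
        List<long[]> sorted = new ArrayList<>(rows);
        sorted.sort(BY_FIRST_THEN_SECOND);
        return sorted;
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
                new long[]{1, 2}, new long[]{2, 4}, new long[]{2, 3},
                new long[]{2, 1}, new long[]{3, 1}, new long[]{1, 1});
        for (long[] row : sortRows(rows)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

For these six rows the order is 1 1, 1 2, 2 1, 2 3, 2 4, 3 1. The composite key in the MapReduce job needs to apply the same rule in its compareTo method.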

MapReduce sorts map output by key automatically, so with the first column alone as the key you can only sort by the first column. To sort on both columns, define a class that implements the WritableComparable interface, store both columns in it, and use it as the key. The automatic sorting of the MapReduce process then orders records according to that class's compareTo method. The code is as follows:

package mapReduce;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class SortApp {
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/data";
    private static final String OUT_PATH = "hdfs://hadoop1:9000/dataOut";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Delete the output directory if it already exists; otherwise the job refuses to start.
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        Path outputDir = new Path(OUT_PATH);
        if (fileSystem.exists(outputDir)) {
            fileSystem.delete(outputDir, true);
        }

        // On Hadoop 2 and later, Job.getInstance(conf, "data") replaces this deprecated constructor.
        Job job = new Job(conf, "data");
        // Needed when the job is submitted to a cluster from a jar.
        job.setJarByClass(SortApp.class);

        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setInputFormatClass(TextInputFormat.class);

        // The composite key is what the shuffle phase sorts on.
        job.setMapOutputKeyClass(KeyValue.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);

        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.waitForCompletion(true);
    }


    static class MyMapper extends Mapper<LongWritable, Text, KeyValue, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is expected to hold two tab-separated integers.
            final String[] splited = value.toString().split("\t");
            final KeyValue k2 = new KeyValue(Long.parseLong(splited[0]), Long.parseLong(splited[1]));
            final LongWritable v2 = new LongWritable(Long.parseLong(splited[1]));
            context.write(k2, v2);
        }
    }


    static class MyReducer extends Reducer<KeyValue, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(KeyValue k2, Iterable<LongWritable> v2s, Context context)
                throws IOException, InterruptedException {
            // Keys arrive already sorted by (first, second). Identical rows share one key,
            // so each distinct pair is written once.
            context.write(new LongWritable(k2.first), new LongWritable(k2.second));
        }
    }

    static class KeyValue implements WritableComparable<KeyValue> {
        Long first;
        Long second;

        // Hadoop needs a no-argument constructor to create the key during deserialization.
        public KeyValue() {}

        public KeyValue(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.first = in.readLong();
            this.second = in.readLong();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(first);
            out.writeLong(second);
        }

        @Override
        public int compareTo(KeyValue o) {
            // Compare by the first column, then by the second. Long.compareTo avoids the
            // overflow that casting a long difference to int can cause.
            final int byFirst = this.first.compareTo(o.first);
            if (byFirst != 0) {
                return byFirst;
            }
            return this.second.compareTo(o.second);
        }

        @Override
        public int hashCode() {
            return this.first.hashCode() + this.second.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof KeyValue)) {
                return false;
            }
            KeyValue kv = (KeyValue) obj;
            // Long is an object type, so compare values with equals() rather than ==.
            return this.first.equals(kv.first) && this.second.equals(kv.second);
        }
    }
}
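A detail worth checking in any compareTo over long values: subtracting the two values and casting the result to int keeps only the low 32 bits, so the sign can come out wrong for large differences. The sketch below illustrates this with a hypothetical class, CompareOverflowDemo, that is not part of the job; Long.compare is the standard overflow-safe alternative.

```java
// Hypothetical demo class, for illustration only.
class CompareOverflowDemo {
    // Subtract-and-cast comparison: the narrowing cast discards the high 32 bits.
    static int compareBySubtraction(long a, long b) {
        return (int) (a - b);
    }

    // Overflow-safe comparison from the standard library.
    static int compareSafely(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long big = 1L << 32;          // 4294967296, whose low 32 bits are all zero
        long large = 3_000_000_000L;  // does not fit in an int

        // Reports "equal" although big > 0.
        System.out.println(compareBySubtraction(big, 0L));
        // Reports "less than" although large > 0.
        System.out.println(compareBySubtraction(large, 0L));
        // Both safe comparisons report "greater than".
        System.out.println(compareSafely(big, 0L));
        System.out.println(compareSafely(large, 0L));
    }
}
```

With small values such as the sample data, both approaches agree; the difference only shows up once the gap between two keys exceeds the int range.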

The first and second fields in KeyValue are declared as Long rather than long, because this.first.hashCode() does not compile on a primitive long, which has no methods. Any class that implements WritableComparable can be used as a sortable key, so the key can hold complex data: encapsulate the fields in a class that implements WritableComparable and use that class as the key.
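As a rough illustration of a key holding more complex data, the sketch below combines a string with a number. It is an assumption-laden stand-in, not Hadoop code: the class NameCountKey is hypothetical, and it mirrors the write, readFields and compareTo methods of WritableComparable using only java.io so that it runs without Hadoop on the classpath. In a real job the class would declare implements WritableComparable<NameCountKey> instead of Comparable. It also keeps count as a primitive long and hashes it with Long.hashCode(long), which is available from Java 8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical composite key: sorted by name, then by count.
class NameCountKey implements Comparable<NameCountKey> {
    String name = "";
    long count;

    NameCountKey() {}

    NameCountKey(String name, long count) {
        this.name = name;
        this.count = count;
    }

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeLong(count);
    }

    // Deserialize the fields in the same order they were written.
    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        count = in.readLong();
    }

    @Override
    public int compareTo(NameCountKey o) {
        int byName = name.compareTo(o.name);
        return byName != 0 ? byName : Long.compare(count, o.count);
    }

    @Override
    public int hashCode() {
        return 31 * name.hashCode() + Long.hashCode(count);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof NameCountKey)) {
            return false;
        }
        NameCountKey other = (NameCountKey) obj;
        return name.equals(other.name) && count == other.count;
    }

    // Writes the key to a byte buffer and reads it back into a fresh instance.
    static NameCountKey roundTrip(NameCountKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            NameCountKey copy = new NameCountKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<NameCountKey> keys = new ArrayList<>();
        keys.add(roundTrip(new NameCountKey("b", 1)));
        keys.add(roundTrip(new NameCountKey("a", 2)));
        keys.add(roundTrip(new NameCountKey("a", 1)));
        Collections.sort(keys);
        for (NameCountKey key : keys) {
            System.out.println(key.name + "\t" + key.count);
        }
    }
}
```

The point of the round trip is that readFields must read exactly what write wrote, in the same order; if the two drift apart, keys are corrupted during the shuffle and the sort order becomes meaningless.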
