Use MultipleOutputs in MapReduce to output multiple files

Source: Internet
Author: User

When you run a MapReduce job, the output files are named part-* by default. MultipleOutputs lets you write different key-value pairs to different, custom-named files.

To do this, call output.write(key, new IntWritable(total), key.toString()); in the reducer.

The method signature is public void write(KEYOUT key, VALUEOUT value, String baseOutputPath). The third parameter, baseOutputPath, specifies the name prefix of the output file. By passing a different baseOutputPath for each key, you can write the values of different keys to different files. For example, you can write all records from the same day to a file named after that date.
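To make the effect of baseOutputPath concrete, here is a minimal Hadoop-free sketch of how the prefix turns into a file name. It assumes the usual "<baseOutputPath>-<m|r>-<5-digit partition>" naming of task output files. The class name OutputNameSketch, the helper fileName, and the sample date are hypothetical and exist only for illustration.

```java
import java.util.Locale;

// Hadoop-free sketch: shows how a baseOutputPath is assumed to become a file name.
class OutputNameSketch {

    // Builds "<base>-r-<partition>" for reduce tasks and "<base>-m-<partition>" for map tasks,
    // with the partition number zero-padded to five digits.
    static String fileName(String baseOutputPath, boolean reduceTask, int partition) {
        String taskType = reduceTask ? "r" : "m";
        return String.format(Locale.ROOT, "%s-%s-%05d", baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        // A key-based prefix, as in output.write(key, value, key.toString()).
        System.out.println(fileName("China", true, 0));
        // A date-based prefix (hypothetical date); a '/' in the prefix yields a subdirectory.
        System.out.println(fileName("2014-01-01/part", true, 3));
    }
}
```

With a single reducer, every key would therefore end up in a file such as China-r-00000 under the job's output directory.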

Test data: ip-to-hosts.txt

18.217.167.70 United States
206.96.54.107 United States
196.109.151.139 Mauritius
174.52.58.113 United States
142.111.216.8 Canada
162.100.49.185 United States
146.38.26.54 United States
36.35.107.36 China
95.214.95.13 Spain
2.96.191.111 United Kingdom
62.177.119.177 Czech Republic
21.165.189.3 United States
46.190.32.115 Greece
113.173.113.29 Vietnam
42.65.172.142 Taiwan
197.91.198.199 South Africa
68.165.71.27 United States
110.119.165.104 China
171.50.76.89 India
171.207.52.113 Singapore
40.174.30.170 United States
191.170.95.175 United States
17.81.129.101 United States
91.212.157.202 France
173.83.82.99 United States
129.75.56.220 United States
149.25.104.198 United States
103.110.22.19 Indonesia
204.188.117.122 United States
138.23.10.72 United States
172.50.15.32 United States
85.88.38.58 Belgium
49.15.14.6 India
19.84.175.5 United States
50.158.140.215 United States
161.114.120.34 United States
118.211.174.52 Australia
220.98.113.71 Japan
182.101.16.171 China
25.45.75.194 United Kingdom
168.16.162.99 United States
155.60.219.154 Australia
26.216.17.198 United States
68.34.157.157 United States
89.176.196.28 Czech Republic
173.11.51.134 United States
116.207.191.159 China
164.210.124.152 United States
168.17.158.38 United States
174.24.173.11 United States
143.64.173.176 United States
160.164.158.125 Italy
15.111.128.4 United States
22.71.176.163 United States
105.57.100.182 Morocco
111.147.83.42 China
137.157.65.89 Australia

Each line in this file has two fields, the IP address and the country it belongs to, separated by a tab character (\t).
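The field layout described above can be checked with a small standalone sketch that splits one line on the tab character and picks the country field. The class name LineParseSketch and the null return for malformed lines are assumptions added for illustration. The mapper later in this article does not guard against malformed lines.

```java
import java.util.regex.Pattern;

// Hadoop-free sketch: extracts the country field from one line of ip-to-hosts.txt.
class LineParseSketch {

    private static final Pattern TAB = Pattern.compile("\t");
    private static final int COUNTRY_POS = 1; // field 0 is the IP, field 1 is the country

    // Returns the country, or null when the line has no second field (an added safeguard).
    static String country(String line) {
        String[] fields = TAB.split(line);
        return fields.length > COUNTRY_POS ? fields[COUNTRY_POS].trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(country("18.217.167.70\tUnited States"));
        System.out.println(country("62.177.119.177\tCzech Republic"));
    }
}
```

Country names that contain spaces, such as "Czech Republic", stay intact because only the tab is treated as a separator.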

The code is as follows:

    public static class IPCountryReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Raw type, as in the original: the header record below is written with a NullWritable value.
        private MultipleOutputs output;

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            output = new MultipleOutputs(context);
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable value : values) {
                total += value.get();
            }
            // Both records go to a file whose name starts with the key (the country name).
            output.write(new Text("Output by MultipleOutputs"), NullWritable.get(), key.toString());
            output.write(key, new IntWritable(total), key.toString());
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Close MultipleOutputs so that every output file is flushed.
            output.close();
        }
    }

The reducer's setup method creates the MultipleOutputs instance:

    output = new MultipleOutputs(context);

The reduce method then uses output to write each key's records to that key's own file.
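The per-key routing can be simulated without a cluster. The following sketch counts records per country and builds a map from output file name to the lines the reducer would write. It is an illustration under stated assumptions and not the job itself: the class PerKeyOutputSketch and the method route are hypothetical, a single reducer (partition 00000) is assumed, and the lines follow the default tab-separated text output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free simulation of "one output file per key".
class PerKeyOutputSketch {

    // Returns fileName -> lines, mimicking what the reducer writes for each country.
    static Map<String, List<String>> route(List<String> inputLines) {
        // Map + reduce phases collapsed: count one per line, grouped by country.
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : inputLines) {
            String country = line.split("\t")[1];
            totals.merge(country, 1, Integer::sum);
        }
        // Output phase: the key doubles as the file name prefix (baseOutputPath).
        Map<String, List<String>> files = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            List<String> lines = new ArrayList<>();
            lines.add("Output by MultipleOutputs");              // header record with no value
            lines.add(entry.getKey() + "\t" + entry.getValue()); // country and its total
            files.put(entry.getKey() + "-r-00000", lines);       // assumes a single reducer
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "18.217.167.70\tUnited States",
                "142.111.216.8\tCanada",
                "174.52.58.113\tUnited States");
        route(sample).forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

For the three sample lines, the sketch produces two entries, one for Canada and one for United States, each holding the header line followed by the count.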

    // Members of the driver class NamedCountryOutputJob, which implements Tool.
    private Configuration conf;
    public static final String NAME = "named_output";

    public static void main(String[] args) throws Exception {
        // The input and output paths are hard-coded here and override the command-line arguments.
        args = new String[] {"hdfs://caozw:9100/user/hadoop/hadooprealword", "hdfs://caozw:9100/user/hadoop/hadooprealword/output"};
        System.exit(ToolRunner.run(new Configuration(), new NamedCountryOutputJob(), args));
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: named_output <input> <output>");
            System.exit(1);
        }

        Job job = new Job(conf, "IP count by country to named files");
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(IPCountryMapper.class);
        job.setReducerClass(IPCountryReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setJarByClass(NamedCountryOutputJob.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit code convention: 0 on success, non-zero on failure.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Configuration getConf() {
        return conf;
    }

    public static class IPCountryMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int country_pos = 1;
        private static final Pattern pattern = Pattern.compile("\t");

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (country, 1) for each input line.
            String country = pattern.split(value.toString())[country_pos];
            context.write(new Text(country), new IntWritable(1));
        }
    }
