MapReduce Programming Example (2): Finding the Maximum and Minimum Values

Source: Internet
Author: User

A common task in website statistics is to find, for each user, the number of comments the user has made together with the times of the user's first and last comments. The following code solves this problem for the StackOverflow Comments.xml data:

package mrdp.ch2;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import mrdp.utils.MRDPUtils;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MinMaxCountDriver {

    public static class SOMinMaxCountMapper extends
            Mapper<Object, Text, Text, MinMaxCountTuple> {
        // Our output key and value Writables
        private Text outUserId = new Text();
        private MinMaxCountTuple outTuple = new MinMaxCountTuple();

        // This object will format the creation date string into a Date object
        private final static SimpleDateFormat frmt = new SimpleDateFormat(
                "yyyy-MM-dd'T'HH:mm:ss.SSS");

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Parse the input string into a nice map
            Map<String, String> parsed = MRDPUtils.transformXmlToMap(value.toString());

            // Grab the "CreationDate" field, since it is what we are finding
            // the min and max value of
            String strDate = parsed.get("CreationDate");

            // Grab the "UserId", since it is what we are grouping by
            String userId = parsed.get("UserId");

            // get() returns null if the key was not there
            if (strDate == null || userId == null) {
                // Skip this record
                return;
            }

            try {
                // Parse the string into a Date object
                Date creationDate = frmt.parse(strDate);

                // Set the minimum and maximum date values to the creationDate
                outTuple.setMin(creationDate);
                outTuple.setMax(creationDate);

                // Set the comment count to 1
                outTuple.setCount(1);

                // Set our user ID as the output key
                outUserId.set(userId);

                // Write out the user ID with min and max dates and count
                context.write(outUserId, outTuple);
            } catch (ParseException e) {
                // An error occurred parsing the creation date string; skip this record
            }
        }
    }

    public static class SOMinMaxCountReducer extends
            Reducer<Text, MinMaxCountTuple, Text, MinMaxCountTuple> {
        // Our output value Writable
        private MinMaxCountTuple result = new MinMaxCountTuple();

        @Override
        public void reduce(Text key, Iterable<MinMaxCountTuple> values,
                Context context) throws IOException, InterruptedException {
            // Initialize our result
            result.setMin(null);
            result.setMax(null);
            long sum = 0;

            // Iterate through all input values for this key
            for (MinMaxCountTuple val : values) {
                // If the value's min is less than the result's min,
                // set the result's min to the value's
                if (result.getMin() == null
                        || val.getMin().compareTo(result.getMin()) < 0) {
                    result.setMin(val.getMin());
                }

                // If the value's max is greater than the result's max,
                // set the result's max to the value's
                if (result.getMax() == null
                        || val.getMax().compareTo(result.getMax()) > 0) {
                    result.setMax(val.getMax());
                }

                // Add the count carried by val to our sum
                sum += val.getCount();
            }

            // Set our count to the total number of comments for this user
            result.setCount(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: MinMaxCountDriver <in> <out>");
            System.exit(2);
        }

        // Job.getInstance replaces the new Job(conf, name) constructor deprecated in Hadoop 2.x
        Job job = Job.getInstance(conf, "StackOverflow Comment Date Min Max Count");
        job.setJarByClass(MinMaxCountDriver.class);
        job.setMapperClass(SOMinMaxCountMapper.class);
        job.setCombinerClass(SOMinMaxCountReducer.class);
        job.setReducerClass(SOMinMaxCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(MinMaxCountTuple.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class MinMaxCountTuple implements Writable {
        private Date min = new Date();
        private Date max = new Date();
        private long count = 0;

        private final static SimpleDateFormat frmt = new SimpleDateFormat(
                "yyyy-MM-dd'T'HH:mm:ss.SSS");

        public Date getMin() { return min; }
        public void setMin(Date min) { this.min = min; }
        public Date getMax() { return max; }
        public void setMax(Date max) { this.max = max; }
        public long getCount() { return count; }
        public void setCount(long count) { this.count = count; }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Read the fields in exactly the order write() wrote them
            min = new Date(in.readLong());
            max = new Date(in.readLong());
            count = in.readLong();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Dates are serialized as milliseconds since the epoch
            out.writeLong(min.getTime());
            out.writeLong(max.getTime());
            out.writeLong(count);
        }

        @Override
        public String toString() {
            return frmt.format(min) + "\t" + frmt.format(max) + "\t" + count;
        }
    }
}
The code for the mrdp.utils.MRDPUtils class used here was given in the first article of this series.
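MRDPUtils itself is not reproduced in this post. To try the mapper's parsing step in isolation, here is a minimal, hypothetical stand-in for MRDPUtils.transformXmlToMap. It is an assumption, not the helper from the first article: it simply pulls every name="value" attribute out of one row line with a regular expression, and it does not decode XML entities such as &quot;.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// HYPOTHETICAL stand-in for MRDPUtils.transformXmlToMap (the real helper is in
// the first article). Turns one <row ... /> line into attribute -> value pairs.
public class XmlRowParser {
    // Matches name="value" pairs; assumes values contain no raw double quotes
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    public static Map<String, String> transformXmlToMap(String xml) {
        Map<String, String> map = new HashMap<>();
        Matcher m = ATTR.matcher(xml);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        String row = "<row Id=\"1784\" PostId=\"883\" CreationDate=\"2012-02-08T21:51:05.223\" />";
        Map<String, String> parsed = transformXmlToMap(row);
        System.out.println(parsed.get("CreationDate")); // 2012-02-08T21:51:05.223
        System.out.println(parsed.get("UserId"));       // null: attribute absent, so the mapper would skip it
    }
}
```

A missing attribute comes back as null, which is exactly what the mapper's null check relies on to skip incomplete records.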

The most important point here is that we implemented the Writable interface ourselves to define a custom value type, MinMaxCountTuple. When I have time, I will write another post introducing Writable in more detail.
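To see what the custom Writable actually puts on the wire, the sketch below copies the write and readFields bodies of MinMaxCountTuple into a plain class (without implements Writable, so it runs without Hadoop) and round-trips one value through a byte array. Each value is three longs, so 24 bytes, and readFields must consume the fields in the same order write produced them. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

public class TupleRoundTrip {
    // Same fields and same write/readFields bodies as MinMaxCountTuple,
    // minus the Hadoop Writable interface
    public static class Tuple {
        public Date min = new Date();
        public Date max = new Date();
        public long count = 0;

        void write(DataOutput out) throws IOException {
            out.writeLong(min.getTime());
            out.writeLong(max.getTime());
            out.writeLong(count);
        }

        void readFields(DataInput in) throws IOException {
            // Must read in exactly the order write() wrote
            min = new Date(in.readLong());
            max = new Date(in.readLong());
            count = in.readLong();
        }
    }

    public static byte[] serialize(Tuple t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            t.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Tuple deserialize(byte[] data) {
        try {
            Tuple copy = new Tuple();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Tuple t = new Tuple();
        t.min = new Date(1000L);
        t.max = new Date(5000L);
        t.count = 3;
        byte[] data = serialize(t);
        Tuple copy = deserialize(data);
        System.out.println(data.length);                                        // 24
        System.out.println(copy.min.getTime() + " " + copy.max.getTime() + " " + copy.count); // 1000 5000 3
    }
}
```

Because readFields allocates new Date objects, a reducer that keeps a reference to val.getMin() is not affected when Hadoop reuses the val object for the next record.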

The map phase performs no comparisons or calculations. It simply parses each record of Comments.xml, extracts the creation time of the comment, uses that time as both the minimum and the maximum, and sets the count to 1. For example, suppose the following row is parsed:

<row Id="1784" PostId="883" Text="Perfect distinction. I've made a note and agree entirely." CreationDate="2012-02-08T21:51:05.223" UserId="..." />

The mapper emits the UserId as the key and outTuple as the value, in the form (min, max, count): (2012-02-08T21:51:05.223, 2012-02-08T21:51:05.223, 1).
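The per-record logic of the mapper can be tried without Hadoop. This is a minimal sketch: mapRecord is a hypothetical helper that returns a readable "key -> (min, max, count)" string instead of calling context.write, returns null wherever the mapper would skip the record, and pins the formatter to UTC (an assumption made only so the output is repeatable; the original mapper uses the JVM's default time zone). The user ID "42" is made up, since the example row above does not show one.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class MapStepDemo {
    // Same pattern string the mapper uses
    private static final SimpleDateFormat FRMT = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
    static {
        FRMT.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone for repeatable output
    }

    // Hypothetical helper: what the mapper would emit for one record,
    // or null when it would skip the record
    static String mapRecord(String userId, String strDate) {
        if (strDate == null || userId == null) {
            return null; // missing field
        }
        try {
            Date creationDate = FRMT.parse(strDate);
            // A single comment is both the earliest and the latest one, with count 1
            return userId + " -> (" + FRMT.format(creationDate) + ", "
                    + FRMT.format(creationDate) + ", 1)";
        } catch (ParseException e) {
            return null; // unparseable date
        }
    }

    public static void main(String[] args) {
        System.out.println(mapRecord("42", "2012-02-08T21:51:05.223"));
        // 42 -> (2012-02-08T21:51:05.223, 2012-02-08T21:51:05.223, 1)
        System.out.println(mapRecord(null, "2012-02-08T21:51:05.223")); // null
    }
}
```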

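The combiner phase reuses the reducer class directly (job.setCombinerClass(SOMinMaxCountReducer.class)), performing local intermediate aggregation on the map side before the data is sent to the reducers.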
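Reusing the reducer as the combiner is safe here because min, max and sum can be computed over partial results in any grouping, and the reducer's input and output types are the same. One consequence: after a combiner has run, a value may carry a count greater than 1, which is why the reducer adds up val.getCount() rather than counting how many values it sees. The sketch below checks this with plain long arrays {min, max, count} (dates as epoch milliseconds); merge is a hypothetical helper, not Hadoop code.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerDemo {
    // Merges partial results {min, max, count}; dates are epoch milliseconds.
    // Assumes a non-empty list, as every reduce group has at least one value.
    static long[] merge(List<long[]> values) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE, count = 0;
        for (long[] v : values) {
            min = Math.min(min, v[0]);
            max = Math.max(max, v[1]);
            count += v[2]; // sum the carried counts, not the number of values
        }
        return new long[] {min, max, count};
    }

    public static void main(String[] args) {
        long[] a = {300, 300, 1}, b = {100, 100, 1}, c = {200, 200, 1};
        // Without a combiner: the reducer sees all three map outputs
        long[] direct = merge(Arrays.asList(a, b, c));
        // With a combiner: a and b are pre-merged on the map side first
        long[] combined = merge(Arrays.asList(merge(Arrays.asList(a, b)), c));
        System.out.println(Arrays.toString(direct));          // [100, 300, 3]
        System.out.println(Arrays.equals(direct, combined));  // true
    }
}
```

An average, by contrast, could not be combined this naively, because an average of averages is generally not the overall average.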

The reducer phase calculates the data we need: the minimum, the maximum, and the total count. The reducer is also simple: it loops over all values for each user ID, compares each value's min and max against the current result, and adds up the counts.
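The reducer loop can be isolated in plain Java as well. This sketch mirrors SOMinMaxCountReducer.reduce: the result starts with null min and max, each value can only widen the range, and the counts are summed. Tuple here is a simplified, hypothetical copy of MinMaxCountTuple's fields; the middle input value stands for one that a combiner has already merged (count 2).

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class ReduceStepDemo {
    // Simplified copy of MinMaxCountTuple's fields (no Hadoop Writable)
    public static class Tuple {
        public Date min;
        public Date max;
        public long count;

        public Tuple(Date min, Date max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }
    }

    // Mirrors the reducer: start from null, widen min/max, sum the counts
    public static Tuple reduce(Iterable<Tuple> values) {
        Tuple result = new Tuple(null, null, 0);
        long sum = 0;
        for (Tuple val : values) {
            if (result.min == null || val.min.compareTo(result.min) < 0) {
                result.min = val.min;
            }
            if (result.max == null || val.max.compareTo(result.max) > 0) {
                result.max = val.max;
            }
            sum += val.count;
        }
        result.count = sum;
        return result;
    }

    public static void main(String[] args) {
        List<Tuple> values = Arrays.asList(
                new Tuple(new Date(5000L), new Date(5000L), 1),
                new Tuple(new Date(1000L), new Date(2000L), 2), // already combined
                new Tuple(new Date(3000L), new Date(3000L), 1));
        Tuple r = reduce(values);
        System.out.println(r.min.getTime() + " " + r.max.getTime() + " " + r.count); // 1000 5000 4
    }
}
```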

The whole process is shown as follows:
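As a rough, Hadoop-free picture of the whole flow (map, shuffle and sort, reduce), the sketch below runs it in memory over a few made-up (UserId, CreationDate) pairs. A TreeMap stands in for the shuffle; it sorts String keys lexicographically, which for these ASCII IDs matches the byte-by-byte ordering of Hadoop's Text keys and explains why, in the output further down, 10119 comes before 1057. Unparseable dates are skipped as in the mapper, and the UTC time zone is an assumption for repeatable output.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TimeZone;
import java.util.TreeMap;

public class MinMaxCountPipeline {
    // records[i] = {userId, creationDate}; returns part-r-00000 style lines
    public static List<String> run(String[][] records) {
        SimpleDateFormat frmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
        frmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone

        // Map + shuffle: one {min, max, 1} per record, grouped and sorted by user ID
        Map<String, List<long[]>> groups = new TreeMap<>();
        for (String[] rec : records) {
            try {
                long t = frmt.parse(rec[1]).getTime();
                groups.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(new long[] {t, t, 1});
            } catch (ParseException e) {
                // Skip unparseable dates, as the mapper does
            }
        }

        // Reduce: widen min/max and sum the counts for each user
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<long[]>> e : groups.entrySet()) {
            long min = Long.MAX_VALUE, max = Long.MIN_VALUE, count = 0;
            for (long[] v : e.getValue()) {
                min = Math.min(min, v[0]);
                max = Math.max(max, v[1]);
                count += v[2];
            }
            out.add(e.getKey() + "\t" + frmt.format(new Date(min)) + "\t"
                    + frmt.format(new Date(max)) + "\t" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] records = { // made-up sample data
            {"9", "2012-01-01T10:00:00.000"},
            {"10", "2012-03-01T08:30:00.000"},
            {"10", "2012-02-01T09:15:00.500"},
            {"100", "2011-12-31T23:59:59.999"},
        };
        for (String line : run(records)) {
            System.out.println(line);
        }
        // 10   2012-02-01T09:15:00.500  2012-03-01T08:30:00.000  2
        // 100  2011-12-31T23:59:59.999  2011-12-31T23:59:59.999  1
        // 9    2012-01-01T10:00:00.000  2012-01-01T10:00:00.000  1
    }
}
```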


Some of the results obtained are as follows:

jpan@jpan-beijing:~/mywork/mapreducepatterns/testdata$ hadoop fs -cat output2/part-r-00000
2011-02-14T18:04:38.763	2012-07-10T22:57:00.757	8
2011-04-01T03:02:45.083	2011-04-01T06:02:33.307	2
10119	2012-02-08T13:54:38.623	2012-04-12T23:43:14.810	8
1057	2011-06-17T19:59:33.013	2011-06-17T19:59:33.013	1
10691	2012-04-19T01:15:44.573	2012-05-11T05:47:36.517	2
10872	2012-06-14T15:36:26.527	2012-06-14T15:45:43.347	4
10921	2011-12-07T18:08:04.583	2011-12-07T18:08:04.583	1
2011-05-06T02:51:50.370	2011-05-06T14:46:31.483	3
2010-08-12T14:52:09.830	2010-08-12T14:52:09.830	1
1118	2011-02-17T10:27:48.623	2011-02-25T09:25:09.597	2
11498	2011-12-30T11:09:58.057	2011-12-30T11:09:58.057	1
11682	2012-01-04T21:48:39.267	2012-01-04T21:48:39.267	1

