MapReduce for mobile Internet log analysis (sort)

Source: Internet
Author: User

I. Background

1.1 Process

This article implements sorting. Grouping was covered in the previous article, where it was done with a Partitioner.

The steps are: implement the interface and let the IDE generate the interface methods; declare the fields; generate the getters and setters; serialize and deserialize the fields; write the comparison method; override toString. A constructor that takes all the fields would make assignment convenient, but then map would have to create a new object for every record. LongWritable has a set method, and so does Text, so the bean gets a set method as well, alongside the generated default constructor.

    public void set(String account, double income, double expense) {
        this.account = account;
        this.income = income;
        this.expense = expense;
        // surplus is derived from the other two fields, so it is not passed in
        this.surplus = income - expense;
    }
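The reuse pattern described above can be tried without Hadoop. The following is only a sketch under stated assumptions: `Bean` and `emitAll` are hypothetical names, the sample accounts are made up, and the `List<String>` stands in for the framework's output, which serializes each record as soon as it is written.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical stand-in for the article's bean: mutable and refilled through set(). */
    static final class Bean {
        private String account;
        private double income;
        private double expense;
        private double surplus;

        void set(String account, double income, double expense) {
            this.account = account;
            this.income = income;
            this.expense = expense;
            this.surplus = income - expense;
        }

        @Override
        public String toString() {
            return account + "\t" + income + "\t" + expense + "\t" + surplus;
        }
    }

    /** Emits one line per input row while allocating a single Bean for the whole loop. */
    public static List<String> emitAll(String[] accounts, double[] incomes, double[] expenses) {
        Bean bean = new Bean();              // created once, outside the loop
        List<String> out = new ArrayList<>();
        for (int i = 0; i < accounts.length; i++) {
            bean.set(accounts[i], incomes[i], expenses[i]);
            out.add(bean.toString());        // copy the data out before the next set()
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = emitAll(
                new String[] {"a@example.com", "b@example.com"},
                new double[] {100.0, 50.0},
                new double[] {30.0, 20.0});
        lines.forEach(System.out::println);
    }
}
```

Reuse is only safe because each record is copied out (here by toString) before the next set() call. Storing the bean reference itself in the list would leave every entry pointing at the last record.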
1.2 Data sets

To build step by step on the previous article, the data has been changed, but the file name is the same.

In the output, MapReduce has in fact already sorted the keys automatically, but strings are sorted in dictionary order.
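Dictionary order can be surprising when the strings hold numbers. The sketch below uses plain Java strings in place of MapReduce keys, and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderDemo {
    /** Sorts the way strings compare by default: character by character. */
    public static List<String> lexicographic(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Sorts by the numeric value each string represents. */
    public static List<String> numeric(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort((a, b) -> Double.compare(Double.parseDouble(a), Double.parseDouble(b)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("9", "10", "100", "25");
        System.out.println(lexicographic(values)); // [10, 100, 25, 9]
        System.out.println(numeric(values));       // [9, 10, 25, 100]
    }
}
```

This is why sorting by income needs a key type with its own comparison method rather than a string key.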

II. Theoretical knowledge

String concatenation: I wrote about this before, and it is worth another look now: http://www.cnblogs.com/hxsyl/archive/2012/10/18/2729112.html

A short summary, with some additions: the String class is final, so it cannot be subclassed, and its instances cannot be modified. Every change to a String therefore creates a new String object and repoints the reference to it. For content that changes often it is best not to use String, because each new object costs performance, and once enough unreferenced objects pile up in memory the JVM's garbage collector starts working and things slow down considerably.

Suppose a for loop runs 10,000 times. Each string += "Hello" takes the contents of the object the variable currently refers to, concatenates "Hello" to them, stores the result in a new String object, and points the variable at that new object. Decompiling the bytecode shows this clearly: every iteration creates a new StringBuilder, calls append, and finally calls toString to return the String. In other words, by the time the loop finishes, 10,000 StringBuilder objects have been created. If they are not reclaimed, the wasted memory aside, the system may well freeze. The decompiled code also shows that string += "Hello" is rewritten automatically, by the compiler rather than by the JVM at run time (javac up to Java 8 emits this form), into:

    StringBuilder str = new StringBuilder(string);
    str.append("Hello");
    string = str.toString();

If the for loop uses a StringBuilder directly, only one object is created, which is far more efficient.
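The two loop styles can be put side by side. This is a minimal sketch: the class and method names are made up for illustration, and it checks only that both approaches build the same string, without measuring time.

```java
public class ConcatDemo {
    /** Builds the result with +=, which creates a new String on every iteration. */
    public static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "Hello";
        }
        return s;
    }

    /** Builds the result with one StringBuilder that is reused for every append. */
    public static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("Hello");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10_000;
        String a = withPlus(n);
        String b = withBuilder(n);
        System.out.println(a.equals(b));   // true
        System.out.println(b.length());    // 50000
    }
}
```

The results are identical. The difference lies in how many intermediate objects are allocated and how many characters are copied along the way.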

StringBuffer, by contrast, is thread-safe: many of its methods carry the synchronized keyword, so in a multi-threaded program reads and modifications of the string happen one after another.
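A small experiment shows what the synchronization buys. In this sketch (the class and method names are made up), two threads append to one shared StringBuffer, and because append is synchronized no append is lost.

```java
public class BufferDemo {
    /** Appends perThread characters from each of two threads and returns the final length. */
    public static int appendFromTwoThreads(int perThread) {
        StringBuffer buffer = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                buffer.append('x');   // synchronized, so concurrent appends do not collide
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the workers", e);
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads(10_000)); // 20000
    }
}
```

StringBuilder offers no such guarantee when shared between threads, which is the price of its lower overhead. Inside a single map or reduce call, where only one thread touches the object, StringBuilder is the usual choice.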

For more detail, see http://blog.csdn.net/loveyaozu/article/details/47037957.

III. The entity class

Records are sorted by income from high to low. When incomes are equal, they are sorted by expense from low to high.
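The ordering rule can be checked in isolation before it goes into the bean. In this sketch each row is a plain {income, expense} pair, and the names `compare` and `sortRows` and the sample numbers are made up for illustration.

```java
import java.util.Arrays;

public class OrderDemo {
    /** Income from high to low; on equal income, expense from low to high. */
    public static int compare(double income1, double expense1, double income2, double expense2) {
        int byIncome = Double.compare(income2, income1);   // arguments swapped: descending
        if (byIncome != 0) {
            return byIncome;
        }
        return Double.compare(expense1, expense2);         // ascending
    }

    /** Returns a sorted copy of rows, where each row is {income, expense}. */
    public static double[][] sortRows(double[][] rows) {
        double[][] copy = rows.clone();
        Arrays.sort(copy, (a, b) -> compare(a[0], a[1], b[0], b[1]));
        return copy;
    }

    public static void main(String[] args) {
        double[][] rows = {{100, 30}, {200, 50}, {100, 10}};
        System.out.println(Arrays.deepToString(sortRows(rows)));
        // [[200.0, 50.0], [100.0, 10.0], [100.0, 30.0]]
    }
}
```

Double.compare returns 0 for equal values, which keeps the comparison symmetric. A hand-written `a > b ? 1 : -1` never returns 0 and so breaks the compareTo contract for ties.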

    package cn.app.hadoop.mr.sort;

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableComparable;

    // Writable is the serialization interface. WritableComparable already
    // covers serialization and deserialization and adds comparison. Its type
    // parameter is InfoBean itself, just as student details (grades, gender
    // and so on) would be wrapped in one bean and compared as a unit.
    public class InfoBean implements WritableComparable<InfoBean> {

        private String account;
        // Money should really be BigDecimal, because double loses precision.
        // It is unclear which write method would serialize it (writeUTF would
        // probably do), so double is used for now.
        private double income;
        private double expense;
        private double surplus;

        // Hadoop creates the bean by reflection and needs this constructor.
        public InfoBean() {
        }

        public String getAccount() { return account; }
        public void setAccount(String account) { this.account = account; }
        public double getIncome() { return income; }
        public void setIncome(double income) { this.income = income; }
        public double getExpense() { return expense; }
        public void setExpense(double expense) { this.expense = expense; }
        public double getSurplus() { return surplus; }
        public void setSurplus(double surplus) { this.surplus = surplus; }

        // Fields must be read in exactly the order write() emits them.
        @Override
        public void readFields(DataInput in) throws IOException {
            this.account = in.readUTF();
            this.income = in.readDouble();
            this.expense = in.readDouble();
            this.surplus = in.readDouble();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(account);
            out.writeDouble(income);
            out.writeDouble(expense);
            out.writeDouble(surplus);
        }

        public void set(String account, double income, double expense) {
            this.account = account;
            this.income = income;
            this.expense = expense;
            this.surplus = income - expense;
        }

        @Override
        public String toString() {
            return "InfoBean [income=" + income + ", expense=" + expense
                    + ", surplus=" + surplus + "]";
        }

        @Override
        public int compareTo(InfoBean o) {
            // Income from high to low.
            int byIncome = Double.compare(o.getIncome(), this.income);
            if (byIncome != 0) {
                return byIncome;
            }
            // Equal income: expense from low to high.
            int byExpense = Double.compare(this.expense, o.getExpense());
            if (byExpense != 0) {
                return byExpense;
            }
            // Tie-break on account so that two different accounts never
            // compare as equal (equal keys are grouped together in reduce).
            return this.account.compareTo(o.getAccount());
        }
    }
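The write and readFields pair can be exercised with the JDK alone, because DataOutputStream and DataInputStream implement the same DataOutput and DataInput interfaces. The sketch below is not Hadoop code: `RoundTripDemo` and its methods are hypothetical, and the sample account is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RoundTripDemo {
    /** Serializes the fields in the same order as the bean's write method. */
    public static byte[] write(String account, double income, double expense) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(account);
            out.writeDouble(income);
            out.writeDouble(expense);
            out.writeDouble(income - expense);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in exactly the order they were written. */
    public static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String account = in.readUTF();
            double income = in.readDouble();
            double expense = in.readDouble();
            double surplus = in.readDouble();
            return account + " " + income + " " + expense + " " + surplus;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write("a@example.com", 100.0, 30.0);
        System.out.println(read(data)); // a@example.com 100.0 30.0 70.0
    }
}
```

The stream carries no field names, only values, so reading in a different order than writing silently yields wrong data or an exception.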
IV. First implementation

4.1 Mapper
    package cn.app.hadoop.mr.sort;

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // For text input the first type is generally LongWritable (or Object),
    // and a line of text is Text. The output key, the phone number (here the
    // account), is Text. The output value is the InfoBean, which must
    // implement the Writable interface.
    public class InfoSortMapper extends Mapper<LongWritable, Text, Text, InfoBean> {

        private InfoBean v = new InfoBean();
        private Text k = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] fields = line.split("\t");
            String account = fields[0];
            double in = Double.parseDouble(fields[1]);
            double out = Double.parseDouble(fields[2]);
            // No new object per call: that would repoint the reference every
            // time and waste resources. Reuse k and v through set().
            k.set(account);
            v.set(account, in, out);
            context.write(k, v);
        }
    }
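The parsing step of the mapper can be tested on its own. The sketch below assumes the input layout implied by the mapper code, a tab-separated line of account, income, and expense. `ParseDemo`, its methods, and the sample line are made up for illustration.

```java
public class ParseDemo {
    /** Returns the account, the first tab-separated field of the line. */
    public static String parseAccount(String line) {
        return line.split("\t")[0];
    }

    /** Returns {income, expense, surplus} parsed from the second and third fields. */
    public static double[] parseAmounts(String line) {
        String[] fields = line.split("\t");
        double income = Double.parseDouble(fields[1]);
        double expense = Double.parseDouble(fields[2]);
        return new double[] {income, expense, income - expense};
    }

    public static void main(String[] args) {
        String line = "a@example.com\t100\t30";
        double[] amounts = parseAmounts(line);
        System.out.println(parseAccount(line));   // a@example.com
        System.out.println(amounts[2]);           // 70.0
    }
}
```

A line with fewer than three fields or a non-numeric amount would throw here. A real job would have to decide whether to skip such lines or fail.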
4.2 Reducer
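    package cn.app.hadoop.mr.sort;

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class InfoSortReducer extends Reducer<Text, InfoBean, Text, InfoBean> {

        // The incoming key is written back as it is, so no separate k is needed.
        private InfoBean v = new InfoBean();

        @Override
        public void reduce(Text key, Iterable<InfoBean> value, Context context)
                throws IOException, InterruptedException {
            // Sum income and expense over all records of this account.
            double incomeSum = 0;
            double expenseSum = 0;
            for (InfoBean o : value) {
                incomeSum += o.getIncome();
                expenseSum += o.getExpense();
            }
            v.set(key.toString(), incomeSum, expenseSum);
            // InfoBean's toString is called automatically when v is written.
            context.write(key, v);
        }
    }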
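What the reducer computes can be imitated in plain Java by grouping lines by account and summing per group. This is only a sketch of the idea, not of how Hadoop shuffles data: `ReduceDemo`, `sumByAccount`, and the sample lines are made up, and a TreeMap stands in for the sorted grouping of keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceDemo {
    /** Maps each account to {incomeSum, expenseSum} over all of its lines. */
    public static Map<String, double[]> sumByAccount(List<String> lines) {
        Map<String, double[]> totals = new TreeMap<>();   // keys kept in dictionary order
        for (String line : lines) {
            String[] fields = line.split("\t");
            double[] sums = totals.computeIfAbsent(fields[0], account -> new double[2]);
            sums[0] += Double.parseDouble(fields[1]);
            sums[1] += Double.parseDouble(fields[2]);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "a@example.com\t100\t30",
                "b@example.com\t5\t1",
                "a@example.com\t50\t20");
        sumByAccount(lines).forEach((account, sums) ->
                System.out.println(account + "\t" + sums[0] + "\t" + sums[1]));
    }
}
```

The result is grouped per account and ordered by account name, not by income, which is why a second job with the bean as the key is needed for the final sort.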
V. Second implementation

5.1 Mapper
    package cn.app.hadoop.mr.sort;

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sort by InfoBean: k2 is the InfoBean itself.
    public class SortMapper extends Mapper<LongWritable, Text, InfoBean, NullWritable> {

        private InfoBean k = new InfoBean();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] fields = line.split("\t");
            String account = fields[0];
            double in = Double.parseDouble(fields[1]);
            double out = Double.parseDouble(fields[2]);
            // No new object per call: that would repoint the reference every
            // time and waste resources. Reuse k through set().
            k.set(account, in, out);
            // The value must be NullWritable.get(). Passing NullWritable
            // itself does not compile ("not a variable").
            context.write(k, NullWritable.get());
        }
    }
5.2 Reducer
The reducer for this version receives the InfoBean keys, already sorted, with NullWritable values, and writes Text keys with InfoBean values. These are the output types set in main in the concluding remarks.
VI. Concluding remarks

If <k2, v2> and <k4, v4> differ, that is, if the mapper's output types are not the same as the reducer's output types, the mapper output types must also be set in main. The second implementation is such a case:

    job.setMapOutputKeyClass(InfoBean.class);
    job.setMapOutputValueClass(NullWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(InfoBean.class);

Otherwise Java reports no error. The type mismatch only becomes visible in the logs after log4j is added.
