First, test data: mobile Internet logs
1.1 About this log
Suppose we have a log file whose contents come from a telecommunications operator's mobile Internet logs. The file has been cleaned up so that its format is more structured and easier to study.
The contents of the file look like this (only three lines are excerpted here; in the actual file the fields are separated by tabs):
1363157993044 18211575961 94-71-ac-cd-e6-18:cmcc-easy 120.196.100.99 iface.qiyi.com video site 15 1527 2106
1363157995033 15920133257 5c-0e-8b-c7-ba-20:cmcc 120.197.40.4 sug.so.360.cn Information security 20 203156 2936 200
1363157982040 13502468823 5c-0a-5b-6a-0b-d4:cmcc-easy 120.196.100.99 y0.ifengimg.com Integrated Portal 57 102 7335 110349
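Each field in a line has its own meaning. The fields used in this article are the mobile phone number (the second field, index 1 when counting from 0) and the four traffic fields at indexes 6 to 9: upPackNum, downPackNum, upPayLoad, and downPayLoad.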
1.2 Goals to achieve
With the test data in hand, the question is: how do we use MapReduce to compute Internet traffic statistics for each mobile phone number? As described above, the fields at indexes 6 to 9 (counting from 0) carry the traffic information, so for each user we need to sum the four fields upPackNum, downPackNum, upPayLoad, and downPayLoad and produce results like the following:
13480253104 3 3 180 180
13502468823 57 102 7335 110349
Second, the solution idea: encapsulate mobile phone traffic in a custom type
2.1 The Writable interface
From our earlier study we know that data types used as MapReduce keys and values in Hadoop must implement an interface called Writable. The interface makes objects serializable so that Hadoop can read and write them easily.
public interface Writable {
    /**
     * Serialize the fields of this object to <code>out</code>.
     */
    void write(DataOutput out) throws IOException;

    /**
     * Deserialize the fields of this object from <code>in</code>.
     */
    void readFields(DataInput in) throws IOException;
}
As the code above shows, the Writable interface defines only two methods: write and readFields. The former serializes the object's fields to a DataOutput; the latter deserializes data from a DataInput into the object's fields. (readFields "reads in", write "writes out".)
Java has 8 primitive types: byte, short, int, long, float, double, char, and boolean. All of them except char have a corresponding Writable type (for example, IntWritable and LongWritable). However, none of these built-in types holds the four traffic values we need as a single unit, so we define a custom data type that implements Writable for this experiment.
2.2 Encapsulating the KpiWritable type
We need to sum the four fields upPackNum, downPackNum, upPayLoad, and downPayLoad for each user. All four are of type long, so we can encapsulate them as follows:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

/*
 * Custom data type KpiWritable
 */
public class KpiWritable implements Writable {

    long upPackNum;   // number of uplink packets, unit: count
    long downPackNum; // number of downlink packets, unit: count
    long upPayLoad;   // total uplink traffic, unit: bytes
    long downPayLoad; // total downlink traffic, unit: bytes

    public KpiWritable() {
    }

    public KpiWritable(String upPack, String downPack, String upPay, String downPay) {
        upPackNum = Long.parseLong(upPack);
        downPackNum = Long.parseLong(downPack);
        upPayLoad = Long.parseLong(upPay);
        downPayLoad = Long.parseLong(downPay);
    }

    @Override
    public String toString() {
        String result = upPackNum + "\t" + downPackNum + "\t" + upPayLoad + "\t" + downPayLoad;
        return result;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upPackNum);
        out.writeLong(downPackNum);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        upPackNum = in.readLong();
        downPackNum = in.readLong();
        upPayLoad = in.readLong();
        downPayLoad = in.readLong();
    }
}
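By implementing the two methods of the Writable interface, the KpiWritable type is complete. The no-argument constructor is also needed, because Hadoop creates the object by reflection before calling readFields to fill it in.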
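To see the write/readFields pairing in isolation, here is a minimal sketch that runs without Hadoop. It assumes a hypothetical class TrafficRecord that mirrors the custom type but does not implement Hadoop's Writable; it relies only on java.io's DataOutputStream and DataInputStream, which implement the same DataOutput and DataInput interfaces that appear in the method signatures. The helper methods toBytes and fromBytes are also made up for this illustration, and the field values are taken from the third sample line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the custom type; it needs no Hadoop jars.
class TrafficRecord {
    long upPackNum;
    long downPackNum;
    long upPayLoad;
    long downPayLoad;

    // No-argument constructor: the object is created empty, then filled by readFields.
    TrafficRecord() {
    }

    TrafficRecord(long upPackNum, long downPackNum, long upPayLoad, long downPayLoad) {
        this.upPackNum = upPackNum;
        this.downPackNum = downPackNum;
        this.upPayLoad = upPayLoad;
        this.downPayLoad = downPayLoad;
    }

    // Write the four fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeLong(upPackNum);
        out.writeLong(downPackNum);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
    }

    // Read the four fields back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        upPackNum = in.readLong();
        downPackNum = in.readLong();
        upPayLoad = in.readLong();
        downPayLoad = in.readLong();
    }

    // Serialize a record into an in-memory byte array.
    static byte[] toBytes(TrafficRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild a record from bytes produced by toBytes.
    static TrafficRecord fromBytes(byte[] bytes) {
        TrafficRecord record = new TrafficRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }

    public static void main(String[] args) {
        TrafficRecord original = new TrafficRecord(57L, 102L, 7335L, 110349L);
        byte[] bytes = toBytes(original);
        TrafficRecord copy = fromBytes(bytes);
        System.out.println(bytes.length + " bytes");
        System.out.println(copy.upPackNum + "\t" + copy.downPackNum + "\t"
                + copy.upPayLoad + "\t" + copy.downPayLoad);
    }
}
```

Each writeLong call emits 8 bytes, so the four fields occupy 32 bytes. If readFields read the fields in a different order than write wrote them, the values would silently end up in the wrong fields, which is why the two methods must stay in step.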
Third, the implementation of the program: still MapReduce
3.1 Custom Mapper class
/*
 * Custom Mapper class, overriding the map method
 */
public static class MyMapper extends Mapper<LongWritable, Text, Text, KpiWritable> {
    protected void map(LongWritable k1, Text v1,
            org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, KpiWritable>.Context context)
            throws IOException, InterruptedException {
        String[] spilted = v1.toString().split("\t");
        String msisdn = spilted[1]; // get the phone number
        Text k2 = new Text(msisdn); // convert to a Hadoop data type and use it as k2
        KpiWritable v2 = new KpiWritable(spilted[6], spilted[7], spilted[8], spilted[9]);
        context.write(k2, v2);
    }
}
Here, the values at indexes 6 to 9 are wrapped in a KpiWritable object, and the phone number and the KpiWritable are passed to the next stage as <k2, v2>.
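The field extraction in the map step can be tried without a cluster. The following is a minimal sketch in plain Java; the class LogLineParser and its method names are hypothetical, and the input is the third sample line with its fields assumed to be tab-separated, as the mapper's split("\t") implies.

```java
import java.util.Arrays;

// Hypothetical helper that mimics the field extraction done in the map method.
class LogLineParser {

    // The phone number is the field at index 1.
    static String phoneNumber(String line) {
        return line.split("\t")[1];
    }

    // The four traffic values are the fields at indexes 6 to 9.
    static long[] trafficFields(String line) {
        String[] fields = line.split("\t");
        long[] traffic = new long[4];
        for (int i = 0; i < 4; i++) {
            traffic[i] = Long.parseLong(fields[6 + i]);
        }
        return traffic;
    }

    public static void main(String[] args) {
        // Third sample line, rebuilt with tab separators (an assumption about the raw file).
        String line = String.join("\t",
                "1363157982040", "13502468823", "5c-0a-5b-6a-0b-d4:cmcc-easy",
                "120.196.100.99", "y0.ifengimg.com", "Integrated Portal",
                "57", "102", "7335", "110349");
        System.out.println(phoneNumber(line) + " -> " + Arrays.toString(trafficFields(line)));
    }
}
```

A line with fewer than ten fields would throw an ArrayIndexOutOfBoundsException here, so code that processes real logs may want to check the field count before indexing.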
3.2 Custom Reducer class
/*
 * Custom Reducer class, overriding the reduce method
 */
public static class MyReducer extends Reducer<Text, KpiWritable, Text, KpiWritable> {
    protected void reduce(Text k2, java.lang.Iterable<KpiWritable> v2s,
            org.apache.hadoop.mapreduce.Reducer<Text, KpiWritable, Text, KpiWritable>.Context context)
            throws IOException, InterruptedException {
        long upPackNum = 0L;
        long downPackNum = 0L;
        long upPayLoad = 0L;
        long downPayLoad = 0L;
        for (KpiWritable kpiWritable : v2s) {
            upPackNum += kpiWritable.upPackNum;
            downPackNum += kpiWritable.downPackNum;
            upPayLoad += kpiWritable.upPayLoad;
            downPayLoad += kpiWritable.downPayLoad;
        }
        KpiWritable v3 = new KpiWritable(upPackNum + "", downPackNum + "", upPayLoad + "", downPayLoad + "");
        context.write(k2, v3);
    }
}
Here, the traffic records that the map phase emitted for each phone number are summed, and a new KpiWritable object is written out together with the phone number as <k3, v3>.
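The summation in the reduce step amounts to a group-by-key sum. Below is a minimal sketch in plain Java with no Hadoop dependency. The class TrafficSummer is hypothetical, and the second and third input rows are made-up values, included only to show two records for the same number being merged; the first row comes from the third sample line.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the grouping and summing that MapReduce performs.
class TrafficSummer {

    // Each row holds a phone number followed by the four traffic values as strings.
    static Map<String, long[]> sumByPhone(String[][] rows) {
        Map<String, long[]> totals = new TreeMap<>(); // sorted by phone number
        for (String[] row : rows) {
            long[] sum = totals.computeIfAbsent(row[0], key -> new long[4]);
            for (int i = 0; i < 4; i++) {
                sum[i] += Long.parseLong(row[1 + i]);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"13502468823", "57", "102", "7335", "110349"}, // from the third sample line
            {"13500000000", "1", "2", "30", "40"},          // hypothetical record
            {"13500000000", "3", "4", "50", "60"},          // hypothetical record, same number
        };
        for (Map.Entry<String, long[]> entry : sumByPhone(rows).entrySet()) {
            long[] t = entry.getValue();
            System.out.println(entry.getKey() + "\t" + t[0] + "\t" + t[1] + "\t" + t[2] + "\t" + t[3]);
        }
    }
}
```

In the real job the grouping is done by the framework's shuffle, so the reducer only sees the values for one phone number at a time and never needs a map of its own.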
3.3 Complete code implementation
The complete code is as follows:
3.4 Debugging and run results
Attachment download
(1) The mobile Internet log used in this article (partial version): http://pan.baidu.com/s/1dDzqHWX
Original link: http://edisonchou.cnblogs.com/
Hadoop Learning Notes-5. Custom type handling Mobile Internet logs