Hadoop Learning Notes 5: Processing Mobile Internet Logs with a Custom Type

Source: Internet
Author: User

First, test data: mobile Internet logs

1.1 About this log

Suppose we have a log file whose contents come from a telecommunications operator's mobile Internet logs. The contents have already been cleaned up, so the format is more structured and easier to study.

The contents of the file look like this (only three lines are excerpted here):

1363157993044 18211575961 94-71-ac-cd-e6-18:cmcc-easy 120.196.100.99 iface.qiyi.com video site 15 1527 2106

1363157995033 15920133257  5c-0e-8b-c7-ba-20:cmcc 120.197.40.4 sug.so.360.cn  Information security  20 203156  2936 200

1363157982040 13502468823 5c-0a-5b-6a-0b-d4:cmcc-easy 120.196.100.99 y0.ifengimg.com Integrated Portal 57 102 7335 110349

Each field in a line has its own meaning; the exact meanings are as follows:

1.2 Goals to achieve

With the test data above in hand, the question is: how do we use MapReduce to compute Internet traffic statistics for each mobile phone number? As the field table above shows, fields 6 to 9 (counting from 0, the same indices the mapper code uses) hold the traffic information, so we need to sum the four fields upPackNum, downPackNum, upPayLoad, and downPayLoad for each user and produce results like the following:

13480253104 3 3 180 180

13502468823 57 102 7335 110349

Second, the solution: encapsulating mobile phone traffic

2.1 The Writable interface

From our earlier study, we know that every data type used as a key or value in Hadoop MapReduce must implement an interface called Writable, which makes the type serializable so that Hadoop can read and write it easily.

    public interface Writable {
        /**
         * Serialize the fields of this object to <code>out</code>.
         */
        void write(DataOutput out) throws IOException;

        /**
         * Deserialize the fields of this object from <code>in</code>.
         */
        void readFields(DataInput in) throws IOException;
    }

As the code above shows, the Writable interface defines only two methods: write and readFields. The former serializes the object's fields to a DataOutput; the latter deserializes data from a DataInput into the object's fields ("write out", "read in").
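To make the "write out, read in" contract concrete, here is a minimal sketch in plain Java. So that it runs without Hadoop on the classpath, it assumes a local stand-in interface named `SimpleWritable` (hypothetical, mirroring the two methods of Hadoop's `Writable`); `PairWritable` and `WritableRoundTrip` are likewise illustrative names, not part of Hadoop. It shows that `readFields` has to read the fields in the same order in which `write` wrote them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;

    void readFields(DataInput in) throws IOException;
}

// Illustrative type with two fields of different widths.
class PairWritable implements SimpleWritable {
    int count;
    long bytes;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(count);  // field 1: 4 bytes
        out.writeLong(bytes); // field 2: 8 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        count = in.readInt(); // same order as in write
        bytes = in.readLong();
    }
}

class WritableRoundTrip {
    // Serialize an object into a byte array.
    static byte[] toBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Fill an existing (typically empty) object from serialized bytes.
    static void fromBytes(SimpleWritable w, byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        PairWritable original = new PairWritable();
        original.count = 57;
        original.bytes = 7335L;

        byte[] data = toBytes(original);
        PairWritable copy = new PairWritable();
        fromBytes(copy, data);

        // prints "12 bytes, count=57, bytes=7335"
        System.out.println(data.length + " bytes, count=" + copy.count + ", bytes=" + copy.bytes);
    }
}
```

The copy is created empty and then filled by `readFields`, which is why a type that is deserialized this way needs a no-argument constructor.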

Java has eight primitive types: boolean, byte, short, int, long, float, double, and char. All of them except char have a corresponding Writable type (for example, IntWritable and LongWritable). However, none of these is the type we need here, so for this experiment we encapsulate a custom data type that implements Writable.

2.2 Encapsulating the KpiWritable type

We need to sum the four fields upPackNum, downPackNum, upPayLoad, and downPayLoad for each user, and all four are of type long, so we can encapsulate them with the following code:

    /*
     * Custom data type KpiWritable
     */
    public class KpiWritable implements Writable {

        long upPackNum;   // number of uplink packets, unit: count
        long downPackNum; // number of downlink packets, unit: count
        long upPayLoad;   // total uplink traffic, unit: bytes
        long downPayLoad; // total downlink traffic, unit: bytes

        public KpiWritable() {
        }

        public KpiWritable(String upPack, String downPack, String upPay,
                String downPay) {
            upPackNum = Long.parseLong(upPack);
            downPackNum = Long.parseLong(downPack);
            upPayLoad = Long.parseLong(upPay);
            downPayLoad = Long.parseLong(downPay);
        }

        @Override
        public String toString() {
            String result = upPackNum + "\t" + downPackNum + "\t" + upPayLoad
                    + "\t" + downPayLoad;
            return result;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(upPackNum);
            out.writeLong(downPackNum);
            out.writeLong(upPayLoad);
            out.writeLong(downPayLoad);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            upPackNum = in.readLong();
            downPackNum = in.readLong();
            upPayLoad = in.readLong();
            downPayLoad = in.readLong();
        }
    }

By implementing the two methods of the Writable interface, we have encapsulated the KpiWritable type.

Third, the program implementation: still MapReduce

3.1 Custom Mapper class
    /*
     * Custom Mapper class, overriding the map method
     */
    public static class MyMapper extends
            Mapper<LongWritable, Text, Text, KpiWritable> {
        protected void map(
                LongWritable k1,
                Text v1,
                org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, KpiWritable>.Context context)
                throws IOException, InterruptedException {
            String[] fields = v1.toString().split("\t");
            String msisdn = fields[1];  // get the phone number
            Text k2 = new Text(msisdn); // convert to a Hadoop data type and use it as k2
            KpiWritable v2 = new KpiWritable(fields[6], fields[7],
                    fields[8], fields[9]);
            context.write(k2, v2);
        }
    }

Here, the data in fields 6 to 9 is wrapped in a KpiWritable object, and the phone number and the KpiWritable object are passed to the next stage as <k2, v2>.
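The parsing step inside `map` can be tried out without a cluster. The following sketch is plain Java with no Hadoop dependency; the names `ParsedRecord` and `LogLineParser` are illustrative assumptions, and the sample input is the third log line shown earlier, assumed to be tab-separated. It extracts the phone number at index 1 and the four traffic fields at indices 6 to 9. The field-count check is an addition for the sketch and is not part of the mapper above.

```java
// Illustrative holder for what the mapper emits: a key and four counters.
class ParsedRecord {
    final String msisdn;
    final long[] traffic; // upPackNum, downPackNum, upPayLoad, downPayLoad

    ParsedRecord(String msisdn, long[] traffic) {
        this.msisdn = msisdn;
        this.traffic = traffic;
    }
}

class LogLineParser {
    // Split one tab-separated log line and pick index 1 and indices 6 to 9.
    static ParsedRecord parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 10) {
            throw new IllegalArgumentException(
                    "expected at least 10 tab-separated fields, got " + fields.length);
        }
        long[] traffic = new long[4];
        for (int i = 0; i < 4; i++) {
            traffic[i] = Long.parseLong(fields[6 + i].trim());
        }
        return new ParsedRecord(fields[1], traffic);
    }

    public static void main(String[] args) {
        String line = "1363157982040\t13502468823\t5c-0a-5b-6a-0b-d4:cmcc-easy\t"
                + "120.196.100.99\ty0.ifengimg.com\tIntegrated Portal\t57\t102\t7335\t110349";
        ParsedRecord record = parse(line);
        System.out.println(record.msisdn + "\t" + record.traffic[0] + "\t" + record.traffic[1]
                + "\t" + record.traffic[2] + "\t" + record.traffic[3]);
    }
}
```

Rejecting short lines up front is one possible design choice; a real job might instead skip malformed records and count them.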

3.2 Custom Reducer class
    /*
     * Custom Reducer class, overriding the reduce method
     */
    public static class MyReducer extends
            Reducer<Text, KpiWritable, Text, KpiWritable> {
        protected void reduce(
                Text k2,
                java.lang.Iterable<KpiWritable> v2s,
                org.apache.hadoop.mapreduce.Reducer<Text, KpiWritable, Text, KpiWritable>.Context context)
                throws IOException, InterruptedException {
            long upPackNum = 0L;
            long downPackNum = 0L;
            long upPayLoad = 0L;
            long downPayLoad = 0L;
            for (KpiWritable kpiWritable : v2s) {
                upPackNum += kpiWritable.upPackNum;
                downPackNum += kpiWritable.downPackNum;
                upPayLoad += kpiWritable.upPayLoad;
                downPayLoad += kpiWritable.downPayLoad;
            }

            KpiWritable v3 = new KpiWritable(upPackNum + "", downPackNum + "",
                    upPayLoad + "", downPayLoad + "");
            context.write(k2, v3);
        }
    }

Here, the traffic records emitted by the map phase for each phone number are summed, and a new KpiWritable object is created and returned together with the phone number as the new <k3, v3>.
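What the shuffle and the `reduce` method achieve together can be imitated in memory. The sketch below is plain Java with no Hadoop dependency; `TrafficAggregator` and its `sumByPhone` method are hypothetical names, and the third input record (1, 2, 3, 4) is a made-up value added only so that one phone number has two records to sum. It groups records by phone number and adds up the four counters, as the reducer does for each k2.

```java
import java.util.Map;
import java.util.TreeMap;

class TrafficAggregator {
    // phones[i] is the key of record i and counters[i] holds its four values.
    // Returns each phone number mapped to the element-wise sum of its
    // counters, with keys in sorted order.
    static Map<String, long[]> sumByPhone(String[] phones, long[][] counters) {
        Map<String, long[]> totals = new TreeMap<>();
        for (int i = 0; i < phones.length; i++) {
            long[] sum = totals.computeIfAbsent(phones[i], key -> new long[4]);
            for (int j = 0; j < 4; j++) {
                sum[j] += counters[i][j];
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] phones = {"13502468823", "13480253104", "13502468823"};
        long[][] counters = {
            {57, 102, 7335, 110349},
            {3, 3, 180, 180},
            {1, 2, 3, 4} // made-up record for illustration
        };
        for (Map.Entry<String, long[]> entry : sumByPhone(phones, counters).entrySet()) {
            long[] s = entry.getValue();
            System.out.println(entry.getKey() + "\t" + s[0] + "\t" + s[1] + "\t" + s[2] + "\t" + s[3]);
        }
    }
}
```

A `TreeMap` is used here only to mimic the fact that reducer input arrives grouped and sorted by key; in the real job that grouping is done by the framework, not by user code.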

3.3 Complete Code implementation

The complete code is as follows:

3.4 Debugging and run results

Attachment downloads

(1) The mobile Internet log used in this article (partial excerpt): http://pan.baidu.com/s/1dDzqHWX

Original link: http://edisonchou.cnblogs.com/
