1. Several basic website traffic indicators
PV: Page View, number of page views
The number of times pages are viewed; one PV is recorded each time a user opens a page.
UV: Unique Visitor, number of unique visitors
The number of distinct users who visit the site in a day (identified by cookie). If a user deletes the browser cookie and visits again, the count is affected.
VV: Visit View, number of visits
The number of visits (sessions) made by all visitors during the day; a visit ends when the visitor closes the browser.
IP: number of independent IPs
The number of distinct IP addresses used to access the site within a day.
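To make the PV/UV distinction concrete, here is a minimal plain-Java sketch (not Hadoop code). The log format "guid<TAB>url" is an assumption for illustration; real access logs have more fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: each log record is assumed to be "guid<TAB>url".
public class IndicatorDemo {

    // PV: every record counts as one page view
    public static int pv(List<String> records) {
        return records.size();
    }

    // UV: count distinct guids (cookie-based user IDs)
    public static int uv(List<String> records) {
        Set<String> guids = new HashSet<String>();
        for (String record : records) {
            guids.add(record.split("\t")[0]);
        }
        return guids.size();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "u1\t/index", "u1\t/about", "u2\t/index");
        // three page opens, but only two distinct users
        System.out.println("PV=" + pv(records) + " UV=" + uv(records)); // PV=3 UV=2
    }
}
```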
2. Writing a MapReduce programming template
Driver
package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // create job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // input: reads data from HDFS by default, converting each row to a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // map: the map method is called once per row to split each row of data
        job.setMapperClass(null);
        job.setMapOutputKeyClass(null);
        job.setMapOutputValueClass(null);

        // shuffle
        job.setPartitionerClass(null);          // partition
        job.setGroupingComparatorClass(null);   // group
        job.setSortComparatorClass(null);       // sort

        // reduce: the reduce method is called once per key
        job.setReducerClass(null);
        job.setOutputKeyClass(null);
        job.setOutputValueClass(null);

        // output
        Path outPath = new Path(args[1]);
        // this.getConf() comes from the parent class; if it is empty you can set the configuration yourself
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // if the output directory already exists, delete it (true = recursive, for directories)
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // submit
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Mapper
package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRModelMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        /**
         * Implement your own business logic here
         */
    }
}
Reduce
package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRModelReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        /**
         * Implement according to business requirements
         */
    }
}
3. Counting the UV of each city
Requirement analysis:
UV: Unique Visitor, the number of unique visitors; each user is counted only once
Map
key: CityId (city ID), data type: Text
value: guid (user ID), data type: Text
Shuffle
key: CityId
value: {guid, guid, guid, ...}
Reduce
key: CityId
value: the UV count, i.e. the size of the set of distinct guids in the shuffle output values
Output
key: CityId
value: UV count
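The map → shuffle → reduce flow above can be simulated in plain Java (no Hadoop) to check the logic. The input format "cityId<TAB>guid" and the city/guid values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the UV pipeline; input lines are assumed to be "cityId<TAB>guid".
public class UvPipelineDemo {

    // simulate map + shuffle: group guids by cityId (TreeMap keeps keys sorted, like the shuffle sort)
    public static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (String line : lines) {
            String[] kv = line.split("\t");
            if (!grouped.containsKey(kv[0])) {
                grouped.put(kv[0], new ArrayList<String>());
            }
            grouped.get(kv[0]).add(kv[1]);
        }
        return grouped;
    }

    // simulate reduce: UV per city = size of the distinct-guid set
    public static Map<String, Integer> reduce(Map<String, List<String>> grouped) {
        Map<String, Integer> uv = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            uv.put(e.getKey(), new HashSet<String>(e.getValue()).size());
        }
        return uv;
    }

    public static void main(String[] args) {
        // guid g1 appears twice for city "bj" but is counted once
        List<String> lines = Arrays.asList("bj\tg1", "bj\tg1", "bj\tg2", "sh\tg3");
        System.out.println(reduce(shuffle(lines))); // {bj=2, sh=1}
    }
}
```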
MRDriver.java (the MapReduce execution flow)
package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // create job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // input: reads data from HDFS by default, converting each row to a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // map: the map method is called once per row to split each row of data
        job.setMapperClass(MRMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        /*
        // shuffle
        job.setPartitionerClass(null);          // partition
        job.setGroupingComparatorClass(null);   // group
        job.setSortComparatorClass(null);       // sort
        */

        // reduce
        job.setReducerClass(MRReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // output
        Path outPath = new Path(args[1]);
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // if the output directory already exists, delete it (true = recursive, for directories)
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // submit
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
MRMapper.java
package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text mapOutKey = new Text();
    private Text mapOutValue = new Text();

    // the map method is called once per row to split each row of data
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // get the contents of the row
        String str = value.toString();
        // split on tab to get each field
        String[] items = str.split("\t");
        if (items[24] != null) {
            this.mapOutKey.set(items[24]);      // city ID
            if (items[5] != null) {
                this.mapOutValue.set(items[5]); // guid (user ID)
            }
        }
        context.write(mapOutKey, mapOutValue);
    }
}
MRReducer.java
package mapreduce;

import java.io.IOException;
import java.util.HashSet;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRReducer extends Reducer<Text, Text, Text, IntWritable> {

    // the reduce method is executed once per key
    @Override
    protected void reduce(Text key, Iterable<Text> texts, Context context)
            throws IOException, InterruptedException {
        // collect distinct guids; the set size is the UV count for this city
        HashSet<String> set = new HashSet<String>();
        for (Text text : texts) {
            set.add(text.toString());
        }
        context.write(key, new IntWritable(set.size()));
    }
}
4. Understanding the MapReduce execution flow through WordCount
input: reads data from HDFS by default
Path inPath = new Path(args[0]);
FileInputFormat.setInputPaths(job, inPath);
It converts each row of data into a key-value pair (the split); this is done automatically by the MapReduce framework.
It outputs the byte offset of each line as the key and the line contents as the value.
map: splits the input and emits key-value pairs
Data filtering, data completion, and field formatting happen here.
Input: the output of the input stage.
The split <key,value> pairs are passed to the user-defined map method, which produces new <key,value> pairs.
The map method is called once per line.
Taking the map output of WordCount as an example:
Shuffle: Partitioning, grouping, sorting
Output:
<Bye,1>
<Hello,1>
<World,1,1>
After the map output <key,value> pairs are produced, the framework sorts them by key and groups the values of each key, giving the final mapper-side output.
reduce: the reduce method is called once per key
It sums the list of <value>s belonging to the same key.
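The whole WordCount flow described above (map emits <word,1>, shuffle sorts and groups, reduce sums) can be sketched in plain Java, without Hadoop, as follows. The sample sentence is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the WordCount map/shuffle/reduce steps.
public class WordCountDemo {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // map: emit <word, 1>; shuffle: group by word (TreeMap keeps keys sorted)
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (!grouped.containsKey(word)) {
                    grouped.put(word, new ArrayList<Integer>());
                }
                grouped.get(word).add(1);
            }
        }
        // reduce: sum the list of 1s for each word
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) {
                sum += one;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("Hello World Bye World")));
        // {Bye=1, Hello=1, World=2}
    }
}
```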
output: the reduce output is written to HDFS
Using the MapReduce programming template above, we wrote the program that analyzes the basic website indicator UV.