Using a MapReduce programming template to write a "basic website indicators: UV" analysis program

Source: Internet
Author: User

1. Basic website indicators

PV (Page View): page views

The number of times pages are viewed. One view is recorded each time a user opens a page.

UV (Unique Visitor): number of unique visitors

The number of distinct people who visit a site in a day, identified by cookie. If a user deletes the browser cookie and then visits again, that user is counted as a new visitor, which distorts the count.

VV (Visit View): number of visits

The total number of visits made by all visitors during the day. A visit lasts until the visitor closes the browser.

IP: number of unique IP addresses

The number of distinct IP addresses from which the site is accessed within a day.
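
To make the difference between these indicators concrete, here is a minimal plain-Java sketch (no Hadoop involved) that computes PV, UV, and IP from a few in-memory records. The `SiteMetrics` and `Visit` names, the fields, and the sample data are all hypothetical, and the sketch assumes the cookie ID is what identifies a visitor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SiteMetrics {

    /** Hypothetical page-view record: one instance per page open. */
    static final class Visit {
        final String cookieId; // assumed visitor identity (browser cookie)
        final String ip;       // client IP address

        Visit(String cookieId, String ip) {
            this.cookieId = cookieId;
            this.ip = ip;
        }
    }

    /** PV: every page open counts once. */
    static int pv(List<Visit> visits) {
        return visits.size();
    }

    /** UV: number of distinct cookie IDs. */
    static int uv(List<Visit> visits) {
        Set<String> ids = new HashSet<>();
        for (Visit v : visits) {
            ids.add(v.cookieId);
        }
        return ids.size();
    }

    /** IP: number of distinct IP addresses. */
    static int ip(List<Visit> visits) {
        Set<String> ips = new HashSet<>();
        for (Visit v : visits) {
            ips.add(v.ip);
        }
        return ips.size();
    }

    /** Made-up sample: 4 page opens, 3 cookies, 2 IP addresses. */
    static List<Visit> sample() {
        return Arrays.asList(
                new Visit("c1", "10.0.0.1"),
                new Visit("c1", "10.0.0.1"),
                new Visit("c2", "10.0.0.1"),
                new Visit("c3", "10.0.0.2"));
    }

    public static void main(String[] args) {
        List<Visit> visits = sample();
        // prints: PV=4 UV=3 IP=2
        System.out.println("PV=" + pv(visits) + " UV=" + uv(visits) + " IP=" + ip(visits));
    }
}
```

Two visitors sharing one IP address (for example behind the same router) raise UV but not IP, which is why the two numbers usually differ.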

2. Writing a MapReduce programming template

Driver

package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Create the job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // Input: reads data from HDFS by default and converts each line to a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // Map: the map method is called once per line to split that line.
        // Every null in this template is a placeholder; replace it with your own class before running.
        job.setMapperClass(null);
        job.setMapOutputKeyClass(null);
        job.setMapOutputValueClass(null);

        // Shuffle
        job.setPartitionerClass(null);         // partition
        job.setGroupingComparatorClass(null);  // group
        job.setSortComparatorClass(null);      // sort

        // Reduce: the reduce method is called once per key
        job.setReducerClass(null);
        job.setOutputKeyClass(null);
        job.setOutputValueClass(null);

        // Output
        Path outPath = new Path(args[1]);
        // this.getConf() comes from the parent class; if it is empty, you can set your own configuration
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // Delete the output directory if it already exists
        if (fileSystem.exists(outPath)) {
            // true: delete recursively when the path is a directory
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // Submit the job and wait for it to finish
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

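Mapper

package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRModelMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Implement your own business logic here
    }
}
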
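Reducer

package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRModelReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Implement this according to your business requirements
    }
}
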
3. Counting UV per city

Requirements analysis:

UV (unique visitors): each user is counted only once, no matter how many times that user visits.

Map

Key: cityId (city ID), data type: Text

Value: guid (user ID), data type: Text

Shuffle

Key: cityId

Value: {guid, guid, guid, ...}

Reduce

Key: cityId

Value: the UV count, that is, the number of distinct GUIDs in the value collection produced by the shuffle

Output

Key: cityId

Value: the UV count
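
The plan above can be tried out without a cluster. The following is a minimal plain-Java sketch that imitates the map, shuffle, and reduce steps in memory; it is not Hadoop code, and the class name `CityUvSimulation`, the city IDs, and the GUIDs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CityUvSimulation {

    /** Map + shuffle: take (cityId, guid) pairs and group the GUIDs by city, duplicates included. */
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        // TreeMap keeps the keys sorted, similar to the sort step of the shuffle
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            String cityId = pair[0];
            String guid = pair[1];
            grouped.computeIfAbsent(cityId, k -> new ArrayList<>()).add(guid);
        }
        return grouped;
    }

    /** Reduce: the UV of a city is the number of distinct GUIDs in its group. */
    static Map<String, Integer> reduce(Map<String, List<String>> grouped) {
        Map<String, Integer> uvPerCity = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            uvPerCity.put(e.getKey(), new HashSet<>(e.getValue()).size());
        }
        return uvPerCity;
    }

    /** Made-up map output: user u1 visits city "bj" twice. */
    static List<String[]> sample() {
        return Arrays.asList(
                new String[] {"bj", "u1"},
                new String[] {"bj", "u1"},
                new String[] {"bj", "u2"},
                new String[] {"sh", "u3"});
    }

    public static void main(String[] args) {
        // prints: {bj=2, sh=1}
        System.out.println(reduce(shuffle(sample())));
    }
}
```

The duplicates survive the shuffle and are only removed in the reduce step, which is why the reducer needs a set rather than a simple counter.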

MRDriver.java: the MapReduce execution process

package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Create the job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // Input: reads data from HDFS by default and converts each line to a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // Map: the map method is called once per line to split that line
        job.setMapperClass(MRMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        /*
        // Shuffle: not customized here, so the defaults are used
        job.setPartitionerClass(null);         // partition
        job.setGroupingComparatorClass(null);  // group
        job.setSortComparatorClass(null);      // sort
        */

        // Reduce
        job.setReducerClass(MRReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Output
        Path outPath = new Path(args[1]);
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // Delete the output directory if it already exists
        if (fileSystem.exists(outPath)) {
            // true: delete recursively when the path is a directory
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // Submit the job and wait for it to finish
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

MRMapper.java

package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text mapOutKey = new Text();    // cityId
    private Text mapOutValue = new Text();  // guid

    // The map method is called once per line to split that line
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Get the content of the line
        String str = value.toString();
        // Split on tabs to get the individual fields
        String[] items = str.split("\t");

        // split() never returns null elements, so check the length instead:
        // skip lines that are too short to contain the city ID at index 24
        if (items.length <= 24) {
            return;
        }
        String cityId = items[24];
        String guid = items[5];
        // Skip records with a blank city ID or GUID
        if (cityId.trim().isEmpty() || guid.trim().isEmpty()) {
            return;
        }

        this.mapOutKey.set(cityId);
        this.mapOutValue.set(guid);
        context.write(mapOutKey, mapOutValue);
    }
}
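
The field extraction in the mapper can be checked on its own, without Hadoop. The sketch below is a hypothetical helper (`LogLineParser` is not part of the original program) that assumes the same layout as the mapper: tab-separated fields with the GUID at index 5 and the city ID at index 24. It returns null for lines that are too short or have blank fields, so that a malformed record does not raise an `ArrayIndexOutOfBoundsException`.

```java
import java.util.Arrays;

final class LogLineParser {

    // Field positions taken from the mapper; the rest of the layout is unknown
    static final int GUID_INDEX = 5;
    static final int CITY_ID_INDEX = 24;

    /** Returns {cityId, guid}, or null when the line is too short or a field is blank. */
    static String[] parse(String line) {
        // The limit -1 keeps trailing empty fields instead of dropping them
        String[] items = line.split("\t", -1);
        if (items.length <= CITY_ID_INDEX) {
            return null;
        }
        String cityId = items[CITY_ID_INDEX].trim();
        String guid = items[GUID_INDEX].trim();
        if (cityId.isEmpty() || guid.isEmpty()) {
            return null;
        }
        return new String[] {cityId, guid};
    }

    /** Builds a synthetic 25-column line for demonstration; "x" is filler. */
    static String sampleLine(String guid, String cityId) {
        String[] cols = new String[CITY_ID_INDEX + 1];
        Arrays.fill(cols, "x");
        cols[GUID_INDEX] = guid;
        cols[CITY_ID_INDEX] = cityId;
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(sampleLine("u1", "bj")))); // [bj, u1]
        System.out.println(parse("too\tshort"));                            // null
    }
}
```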

MRReducer.java

package mapreduce;

import java.io.IOException;
import java.util.HashSet;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRReducer extends Reducer<Text, Text, Text, IntWritable> {

    // The reduce method is called once for each key
    @Override
    protected void reduce(Text key, Iterable<Text> texts, Context context)
            throws IOException, InterruptedException {
        // A set keeps each GUID only once
        HashSet<String> set = new HashSet<String>();
        for (Text text : texts) {
            set.add(text.toString());
        }
        // The size of the set is the UV count for this city
        context.write(key, new IntWritable(set.size()));
    }
}

4. Understanding the MapReduce execution process through WordCount

Input: reads data from HDFS by default

Path inPath = new Path(args[0]);
FileInputFormat.setInputPaths(job, inPath);

The MapReduce framework automatically splits the input and converts each line of data to a key-value pair.

Output: the offset of the line (key) and the content of the line (value).

Mapper: tokenizes each line and emits the words

Typical work in this phase: data filtering, data completion, field formatting.

Input: the output of the input phase.

Each <key,value> pair produced by the split is passed to the user-defined map method, which generates new <key,value> pairs.

The map method is called once per line.

For WordCount, the map method emits a <word,1> pair for every word in the line.

Shuffle: partitioning, grouping, sorting

Output:

<Bye,{1}>

<Hello,{1}>

<World,{1,1}>

Once the map output <key,value> pairs are available, they are sorted by key and the values of the same key are grouped together, which gives the final mapper output.

Reduce: the reduce method is called once per key

It sums the list of values that belong to the same key.

Output: writes the reduce output to HDFS
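
The phases described above can be imitated in plain Java to see the intermediate data. This is only an in-memory sketch, not Hadoop code; the class name `WordCountSimulation` and the two input lines are assumptions chosen to reproduce the Bye, Hello, and World keys of the shuffle output listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSimulation {

    /** Map: split one line into words; each word stands for a (word, 1) pair. */
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Shuffle: group the emitted 1s by word; the TreeMap sorts the keys. */
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    /** Reduce: sum the list of values of each key. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello World", "Bye World");
        Map<String, List<Integer>> grouped = shuffle(lines);
        System.out.println(grouped);         // {Bye=[1], Hello=[1], World=[1, 1]}
        System.out.println(reduce(grouped)); // {Bye=1, Hello=1, World=2}
    }
}
```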
