1. Several basic website traffic indicators
PV: Page View, number of page views
The number of times pages are viewed; one PV is recorded each time a user opens a page.
UV: Unique Visitor, number of unique visitors
The number of distinct users who visit the site in a day (identified by cookie). If a user deletes the browser cookie and visits again, the count is affected.
VV: Visit View, number of visits
The number of visits (sessions) made by all visitors during the day; a visit ends when the visitor closes the browser.
IP: number of independent IPs
The number of distinct IP addresses used to access the site within a day.
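To make the PV/UV distinction concrete, here is a minimal plain-Java sketch (not Hadoop code). The log format "guid<TAB>url" is an assumption for illustration; real access logs have more fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: each log record is assumed to be "guid<TAB>url".
public class IndicatorDemo {

    // PV: every record counts as one page view
    public static int pv(List<String> records) {
        return records.size();
    }

    // UV: count distinct guids (cookie-based user IDs)
    public static int uv(List<String> records) {
        Set<String> guids = new HashSet<String>();
        for (String record : records) {
            guids.add(record.split("\t")[0]);
        }
        return guids.size();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "u1\t/index", "u1\t/about", "u2\t/index");
        // three page opens, but only two distinct users
        System.out.println("PV=" + pv(records) + " UV=" + uv(records)); // PV=3 UV=2
    }
}
```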
2. Writing a MapReduce programming template
Driver
package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // create job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // input: reads data from HDFS by default, converting each row to a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // map: the map method is called once per row to split each row of data
        job.setMapperClass(null);
        job.setMapOutputKeyClass(null);
        job.setMapOutputValueClass(null);

        // shuffle
        job.setPartitionerClass(null);          // partition
        job.setGroupingComparatorClass(null);   // group
        job.setSortComparatorClass(null);       // sort

        // reduce: the reduce method is called once per key
        job.setReducerClass(null);
        job.setOutputKeyClass(null);
        job.setOutputValueClass(null);

        // output
        Path outPath = new Path(args[1]);
        // this.getConf() comes from the parent class; if it is empty you can set the configuration yourself
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // if the output directory already exists, delete it (true = recursive, for directories)
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // submit
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Mapper
package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRModelMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        /**
         * Implement your own business logic here
         */
    }
}
Reduce
package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRModelReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        /**
         * Implement according to business requirements
         */
    }
}
3. Counting the UV of each city
Requirement analysis:
UV: Unique Visitor, the number of unique visitors; each user is counted only once
Map
key: CityId (city ID), data type: Text
value: guid (user ID), data type: Text
Shuffle
key: CityId
value: {guid, guid, guid, ...}
Reduce
key: CityId
value: the UV count, i.e. the size of the set of distinct guids in the shuffle output values
Output
key: CityId
value: UV count
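The map → shuffle → reduce flow above can be simulated in plain Java (no Hadoop) to check the logic. The input format "cityId<TAB>guid" and the city/guid values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the UV pipeline; input lines are assumed to be "cityId<TAB>guid".
public class UvPipelineDemo {

    // simulate map + shuffle: group guids by cityId (TreeMap keeps keys sorted, like the shuffle sort)
    public static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (String line : lines) {
            String[] kv = line.split("\t");
            if (!grouped.containsKey(kv[0])) {
                grouped.put(kv[0], new ArrayList<String>());
            }
            grouped.get(kv[0]).add(kv[1]);
        }
        return grouped;
    }

    // simulate reduce: UV per city = size of the distinct-guid set
    public static Map<String, Integer> reduce(Map<String, List<String>> grouped) {
        Map<String, Integer> uv = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            uv.put(e.getKey(), new HashSet<String>(e.getValue()).size());
        }
        return uv;
    }

    public static void main(String[] args) {
        // guid g1 appears twice for city "bj" but is counted once
        List<String> lines = Arrays.asList("bj\tg1", "bj\tg1", "bj\tg2", "sh\tg3");
        System.out.println(reduce(shuffle(lines))); // {bj=2, sh=1}
    }
}
```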
MRDriver.java (the MapReduce execution flow)
package mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MRDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // create job
        Job job = Job.getInstance(this.getConf(), "mr-demo");
        job.setJarByClass(MRDriver.class);

        // input: reads data from HDFS by default, converting each row to a key-value pair
        Path inPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inPath);

        // map: the map method is called once per row to split each row of data
        job.setMapperClass(MRMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        /*
        // shuffle
        job.setPartitionerClass(null);          // partition
        job.setGroupingComparatorClass(null);   // group
        job.setSortComparatorClass(null);       // sort
        */

        // reduce
        job.setReducerClass(MRReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // output
        Path outPath = new Path(args[1]);
        FileSystem fileSystem = FileSystem.get(this.getConf());
        // if the output directory already exists, delete it (true = recursive, for directories)
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        // submit
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        try {
            int status = ToolRunner.run(configuration, new MRDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
MRMapper.java
package mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MRMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text mapOutKey = new Text();
    private Text mapOutValue = new Text();

    // the map method is called once per row to split each row of data
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // get the contents of the row
        String str = value.toString();
        // split on tab to get each field
        String[] items = str.split("\t");
        if (items[24] != null) {
            this.mapOutKey.set(items[24]);      // city ID
            if (items[5] != null) {
                this.mapOutValue.set(items[5]); // guid (user ID)
            }
        }
        context.write(mapOutKey, mapOutValue);
    }
}
MRReducer.java
package mapreduce;

import java.io.IOException;
import java.util.HashSet;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MRReducer extends Reducer<Text, Text, Text, IntWritable> {

    // the reduce method is executed once per key
    @Override
    protected void reduce(Text key, Iterable<Text> texts, Context context)
            throws IOException, InterruptedException {
        // collect distinct guids; the set size is the UV count for this city
        HashSet<String> set = new HashSet<String>();
        for (Text text : texts) {
            set.add(text.toString());
        }
        context.write(key, new IntWritable(set.size()));
    }
}
4. Understanding the MapReduce execution flow through WordCount
input: reads data from HDFS by default
Path inPath = new Path(args[0]);
FileInputFormat.setInputPaths(job, inPath);
It converts each row of data into a key-value pair (the split); this is done automatically by the MapReduce framework.
It outputs the byte offset of each line as the key and the line contents as the value.
map: splits the input and emits key-value pairs
Data filtering, data completion, and field formatting happen here.
Input: the output of the input stage.
The split <key,value> pairs are passed to the user-defined map method, which produces new <key,value> pairs.
The map method is called once per line.
Taking the map output of WordCount as an example:
Shuffle: Partitioning, grouping, sorting
Output:
<Bye,1>
<Hello,1>
<World,1,1>
After the map output <key,value> pairs are produced, the framework sorts them by key and groups the values of each key, giving the final mapper-side output.
reduce: the reduce method is called once per key
It sums the list of <value>s belonging to the same key.
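The whole WordCount flow described above (map emits <word,1>, shuffle sorts and groups, reduce sums) can be sketched in plain Java, without Hadoop, as follows. The sample sentence is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the WordCount map/shuffle/reduce steps.
public class WordCountDemo {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // map: emit <word, 1>; shuffle: group by word (TreeMap keeps keys sorted)
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (!grouped.containsKey(word)) {
                    grouped.put(word, new ArrayList<Integer>());
                }
                grouped.get(word).add(1);
            }
        }
        // reduce: sum the list of 1s for each word
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) {
                sum += one;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("Hello World Bye World")));
        // {Bye=1, Hello=1, World=2}
    }
}
```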
output: the reduce output is written to HDFS
Using the MapReduce programming template above, we wrote the program that analyzes the basic website indicator UV.