006 User Behavior: PV and UV Statistics

Source: Internet
Author: User
First, PV statistics (page views)
(1) Basic concepts
Page views (PV) are usually the main indicator for measuring a news channel, a website, or even a single news item, and they are one of the most commonly used indicators for evaluating website traffic. Monitoring a site's PV trend and analyzing the reasons behind its changes is routine work for many webmasters. The "page" in page views generally means an ordinary HTML page, but it also covers HTML content generated dynamically by PHP, JSP, and similar technologies. Each request for HTML content from a browser counts as one PV, and these requests accumulate into the PV total.

(2) Calculation method
Each time a user accesses a page of the website, one PV is recorded. When a user accesses the same page multiple times, every access is added to the total.
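The counting rule can be illustrated with a minimal Python sketch. It assumes the page requests are already available as (user, url) pairs; the function name count_page_views and the sample data are hypothetical and only serve the illustration.

```python
from collections import Counter

def count_page_views(requests):
    """Return PV per page. Every request counts, including repeats by the same user."""
    return Counter(url for _user, url in requests)

# Hypothetical sample: user u1 opens the same page twice, and both hits count.
sample_requests = [
    ("u1", "/index.html"),
    ("u1", "/index.html"),
    ("u2", "/index.html"),
    ("u2", "/list.jsp"),
]
pv_per_page = count_page_views(sample_requests)
total_pv = sum(pv_per_page.values())
```

Here total_pv is simply the number of requests, because PV does not deduplicate by user or by page.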

(3) Statistical analysis
--1. Create a database
create database jfyun;
use jfyun;

--2. Create the user access record table (create the partitioned table first)
create external table data_collect (
accessdate   string,
accesshour   int,
requestmethod   string,
referurl   string,
requestprotocal   string,
returnstatus   string,
requesturl   string,
referdomain   string,
userorigin   string,
originword   string,
browser   string,
browserversion   string,
operatesystem   string,
requestip   string,
ipnumber   int,
userprovince   string,
screensize   string,
screencolor   string,
pagetitle   string,
sitetype   string,
userflag   string,
visitflag   string,
sflag   string,
timeonpage   int
) partitioned by (access_day string)
row format delimited fields terminated by '\t'
location '/user/hadoop/external/jfpc/output';

--3. Create partitions for the table (a partition is created first, and the data is then loaded into it)
alter table data_collect add partition (access_day='20150705');
alter table data_collect add partition (access_day='20150706');

--4. Run the MapReduce program to store the data, that is, load it into the partitioned table
hadoop jar Jfyun.jar com.yun.job.AccessLogEnhanceImportHDFS external/jfpc/input/20150705/130/clickdata-2015070500.log external/jfpc/output/access_day=20150705
hadoop jar Jfyun.jar com.yun.job.AccessLogEnhanceImportHDFS external/jfpc/input/20150705/131/clickdata-2015070500.log external/jfpc/output/access_day=20150705
hadoop jar Jfyun.jar com.yun.job.AccessLogEnhanceImportHDFS external/jfpc/input/20150705/130/clickdata-2015070501.log external/jfpc/output/access_day=20150705
hadoop jar Jfyun.jar com.yun.job.AccessLogEnhanceImportHDFS external/jfpc/input/20150705/131/clickdata-2015070501.log external/jfpc/output/access_day=20150705

hadoop jar Jfyun.jar com.yun.job.AccessLogEnhanceImportHDFS external/jfpc/input/20150706 external/jfpc/output/access_day=20150706



--5. Show the table partitions
show partitions data_collect;
--6. View partition data by partition condition
select * from data_collect where access_day='20150705';
select * from data_collect where access_day='20150706';

--7. Analyze PV data with Hive
--7.1. PV by day
select substr(accessdate,1,8), count(1) from data_collect where access_day='20150706' group by substr(accessdate,1,8);
--7.2. PV by hour (the result can be inserted into a target table)
select accesshour, count(1) stacount from data_collect where access_day='20150706' group by accesshour;
--7.3. PV per province per day
select substr(accessdate,1,8), userprovince, count(1) from data_collect where access_day='20150706' group by substr(accessdate,1,8), userprovince;
--7.4. PV per province per hour per day
select substr(accessdate,1,8), userprovince, accesshour, count(1) from data_collect where access_day='20150706' group by substr(accessdate,1,8), userprovince, accesshour;

Second, UV statistics (unique visitors)
(1) Basic concepts
Independent IP: refers to independent users, that is, unique visitors (UV). It is the number of people, counted by distinct IP address, who visit a site or click on a news item.

(2) Calculation method
Within one day (00:00-24:00), each distinct IP is counted only once, on its first visit to the site. Alternatively, a cookie can be set: the first visit, on which the cookie is created, is recorded as a new user, and later visits carrying that cookie are treated as an old (returning) user.
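A minimal Python sketch of this rule, assuming each visit has already been reduced to a (day, visitor id) pair, where the id is either the IP address or the cookie value. The function name count_daily_uv and the sample pairs are hypothetical.

```python
def count_daily_uv(visits):
    """Return {day: UV}. A visitor id is counted at most once per day."""
    seen_per_day = {}
    for day, visitor_id in visits:
        # A set keeps only the first occurrence of each visitor within a day.
        seen_per_day.setdefault(day, set()).add(visitor_id)
    return {day: len(ids) for day, ids in seen_per_day.items()}

# Hypothetical sample: the same IP appears twice on 20150706 and counts once.
sample_visits = [
    ("20150706", "10.139.198.176"),
    ("20150706", "10.139.198.176"),
    ("20150706", "223.73.104.224"),
    ("20150705", "10.139.198.176"),
]
daily_uv = count_daily_uv(sample_visits)
```

The same visitor is counted again on a different day, because the deduplication window is a single day.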

(3) Statistical analysis
Project requirements:
(1) Users visit an e-commerce website. User behavior logs are collected by JavaScript code embedded in the pages, then a MapReduce program loads the user logs into HBase, and UV is calculated from the resulting table.
(2) The stored data is then analyzed statistically.
(3) User log format (simulated data):
"06/jul/2015:00:01:04 +0800" "GET" "http%3a//jf.10086.cn/m/" "http/1.1" "$" "http://jf.10086.cn/m/subject/100000000000009_0.html" "mozilla/5.0 (Linux; U Android 4.4.2; ZH-CN; Lenovo a3800-d build/lenovoa3800-d) applewebkit/533.1 (khtml, like Gecko) version/4.0 mqqbrowser/5.4 tbs/025438 Mobile safari/533.1 micromessenger/6.2.0.70_r1180778.561 nettype/cmnet language/zh_cn" "10.139.198.176" "480x854" "24" "%u5927%u7c7b%u5217%u8868%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3037487029517069460000" "3037487029517069460000" "" "1"
"06/jul/2015:01:01:04 +0800" "GET" "http%3a//jf.10086.cn/portal/ware/web/searchwareaction%3faction%3dsearchwareinfo%26pager.offset%3d144" "http/1.1" "" "http://jf.10086.cn/portal/ware/web/searchwareaction?action=searchwareinfo&pager.offset=156" "mozilla/5.0 (Linux; U Android 4.4.2; ZH-CN; HUAWEI mt2-l01 build/huaweimt2-l01) applewebkit/534.30 (khtml, like Gecko) version/4.0 ucbrowser/10.5.2.598 U3/0.8.0 Mobile safari/534.30" "223.73.104.224" "720x1208" "+" "%u641c%u7d22_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3046252153674140570000" "3046252153674140570000" "1" "2699"
"06/jul/2015:02:01:04 +0800" "GET" "" "http/1.1" "" "http://jf.10086.cn/" "mozilla/5.0 (Linux; Android 4.4.4; Vivo y13l build/ktu84p) applewebkit/537.36 (khtml, like Gecko) version/4.0 chrome/33.0.0.0 Mobile safari/537.36 baiduboxapp/5.1 (Baidu; P1 4.4.4)" "10.154.210.240" "480x855" "+" "%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3098781670304015290000" "3098781670304015290000" "0" "831"
"06/jul/2015:03:01:07 +0800" "GET" "http%3a//wx.10086.cn/wechat-website/wechatwebsite/accumulatepoints" "http/1.1" "" "http://jf.10086.cn/m/" "mozilla/5.0 (Linux; U Android 4.4.2; ZH-CN; Lenovo a3800-d build/lenovoa3800-d) applewebkit/533.1 (khtml, like Gecko) version/4.0 mqqbrowser/5.4 tbs/025438 Mobile safari/533.1 micromessenger/6.2.0.70_r1180778.561 nettype/cmnet language/zh_cn" "10.139.198.176" "480x854" "24" "%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3037487029517069460000" "3037487029517069460000" "1" "135"
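A rough Python sketch of how one such log record could be split into fields. It assumes every field is wrapped in double quotes and that a record carries 16 fields. The field names are borrowed from the data_collect columns, and their order is an inference from the sample data rather than something the source states. parse_log_line is a hypothetical helper, not part of the original MapReduce job.

```python
import re
from datetime import datetime

# Assumed field order, inferred from the sample records (not confirmed by the source).
FIELD_NAMES = [
    "accesstime", "requestmethod", "referurl", "requestprotocal",
    "returnstatus", "requesturl", "useragent", "requestip",
    "screensize", "screencolor", "pagetitle", "sitetype",
    "userflag", "visitflag", "sflag", "timeonpage",
]

def parse_log_line(line):
    """Split a quoted log record into a dict and derive accessdate/accesshour."""
    values = re.findall(r'"([^"]*)"', line)  # empty fields ("") are kept as ''
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    record = dict(zip(FIELD_NAMES, values))
    # Timestamp such as 06/Jul/2015:03:01:07 +0800
    ts = datetime.strptime(record["accesstime"], "%d/%b/%Y:%H:%M:%S %z")
    record["accessdate"] = ts.strftime("%Y%m%d%H%M%S")
    record["accesshour"] = ts.hour
    return record
```

The derived accessdate uses the same 14-digit form (for example 20150706030107) that the later queries cut with substr(accessdate,1,8) to obtain the day.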

(4) Data source: refer to the following request URL
http://jf.10086.cn/analyzeVesopera.gif?screenSize=1366x768&screenColor=24&pageTitle=%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce&referrerpage=&sitetype=0&uid=20523849176242946000&sid=56080848979763680000&sflag=1&countlog=1443006061700&onloadtotaltime=135
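The parameters carried by such a request can be read with a short Python sketch. It assumes the page title is encoded in the %uXXXX style produced by JavaScript's escape(), which the standard URL decoder leaves untouched, so it is decoded separately. parse_beacon and decode_js_escape are hypothetical helper names.

```python
import re
from urllib.parse import urlsplit, parse_qsl

def decode_js_escape(text):
    """Turn %uXXXX sequences (JavaScript escape() style) into characters."""
    return re.sub(r"%u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

def parse_beacon(url):
    """Return the query parameters of a tracking request as a dict."""
    query = urlsplit(url).query
    # keep_blank_values keeps empty parameters such as referrerpage=
    pairs = parse_qsl(query, keep_blank_values=True)
    return {name: decode_js_escape(value) for name, value in pairs}
```

Parameters such as screenSize, screenColor, pageTitle, sitetype, uid, sid, and sflag come back as plain strings; any conversion to numbers is left to the caller.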
Technical solution:
(1) Write a MapReduce job that reads each row of data and saves it to HBase.
(2) Let Hive operate on the HBase table data.
(3) Use Hive to run statistics on the HBase table data and analyze visitor behavior.
1. Create a table
create 'uservisitinfo', {NAME => 'info'}

2. Import into HBase
hadoop jar Jfyun.jar com.yun.job.AccessLogImportHBase external/jfpc/input/20150705/130/clickdata-2015070500.log
hadoop jar Jfyun.jar com.yun.job.AccessLogImportHBase external/jfpc/input/20150705/131/clickdata-2015070500.log

hadoop jar Jfyun.jar com.yun.job.AccessLogImportHBase external/jfpc/input/20150705/130/clickdata-2015070501.log
hadoop jar Jfyun.jar com.yun.job.AccessLogImportHBase external/jfpc/input/20150705/131/clickdata-2015070501.log

hadoop jar Jfyun.jar com.yun.job.AccessLogImportHBase external/jfpc/input/20150706

3. View data in HBase
3.1 Full table scan
scan 'uservisitinfo'

3.2 View by rowkey
hbase(main):012:0> get 'uservisitinfo', '20150706_3037487029517069460000'
COLUMN                   CELL
 info:firstaccessurl     timestamp=1443000064923, value=/m/subject/100000000000009_0.html
 info:browser            timestamp=1443000064923, value=safari
 info:browserversion     timestamp=1443000064923, value=533.1
 info:firstaccesstime    timestamp=1443000064923, value=20150706000104
 info:operatesystem      timestamp=1443000064923, value=linux
 info:recentaccesstime   timestamp=1443000065001, value=20150706030107
 info:recentaccessurl    timestamp=14430000650, value=/m/
 info:screencolor        timestamp=1443000064923, value=24
 info:screensize         timestamp=1443000064923, value=480x854
 info:sitetype           timestamp=1443000064923, value=0
 info:userflag           timestamp=1443000064923, value=3037487029517069460000
 info:userprovince       timestamp=1443000064923, value=999
 info:uservisitid        timestamp=1443000064923, value=20150706_3037487029517069460000
 info:visitcount         timestamp=1443000065001, value=2
 info:visitday           timestamp=1443000064923, value=20150706
 info:visitflag          timestamp=1443000064923, value=3037487029517069460000
 info:visithour          timestamp=1443000064923, value=0
 info:visitip            timestamp=1443000064923, value=10.139.198.176
 info:visitkeeptime      timestamp=1443000065001, value=10803
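The row above suggests how the per-visitor record is put together: the rowkey looks like the visit day joined to the user flag with an underscore, and visitkeeptime (10803) equals the number of seconds between firstaccesstime and recentaccesstime. The Python sketch below reproduces that behavior under those assumptions. It is inferred from the sample row only and is not the original AccessLogImportHBase job; summarize_visits is a hypothetical name.

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%d%H%M%S"

def summarize_visits(events):
    """events: iterable of (userflag, accesstime 'YYYYMMDDHHMMSS', url).

    Returns {rowkey: summary}, one summary per visitor per day.
    """
    rows = {}
    # Fixed-width timestamps sort correctly as plain strings.
    for userflag, accesstime, url in sorted(events, key=lambda e: e[1]):
        rowkey = f"{accesstime[:8]}_{userflag}"  # assumed layout: visitday_userflag
        row = rows.get(rowkey)
        if row is None:
            rows[rowkey] = {
                "firstaccesstime": accesstime, "firstaccessurl": url,
                "recentaccesstime": accesstime, "recentaccessurl": url,
                "visitcount": 1, "visitkeeptime": 0,
            }
            continue
        row["recentaccesstime"] = accesstime
        row["recentaccessurl"] = url
        row["visitcount"] += 1
        elapsed = (datetime.strptime(accesstime, TIME_FORMAT)
                   - datetime.strptime(row["firstaccesstime"], TIME_FORMAT))
        row["visitkeeptime"] = int(elapsed.total_seconds())
    return rows
```

Fed with the two accesses of visitor 3037487029517069460000 on 20150706 (00:01:04 and 03:01:07), the sketch yields visitcount 2 and visitkeeptime 10803, matching the values in the HBase row.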

4. Use Hive to analyze the HBase table data
4.1 Create an HBase table and add data to it (the uservisitinfo table above)
4.2 Create a Hive table mapped to the HBase table
(1) Create a table
create external table user_visit_info (
uservisitid string, firstaccessurl string, browserversion string, firstaccesstime string,
operatesystem string, recentaccesstime string, recentaccessurl string, screencolor string,
screensize string, sitetype string, userflag string, userprovince string, visitcount string,
visitday string, visitflag string, visithour string, visitip string, visitkeeptime string
) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,info:firstaccessurl,info:browserversion,info:firstaccesstime,info:operatesystem,info:recentaccesstime,info:recentaccessurl,info:screencolor,info:screensize,info:sitetype,info:userflag,info:userprovince,info:visitcount,info:visitday,info:visitflag,info:visithour,info:visitip,info:visitkeeptime")
tblproperties ("hbase.table.name" = "uservisitinfo");
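The hbase.columns.mapping value has to list exactly one entry per Hive column, in the same order, with :key standing for the rowkey and no whitespace between entries. Since that long string is easy to get wrong by hand, it could be generated instead. The Python sketch below assumes the first Hive column maps to the rowkey and all others live in one column family; build_columns_mapping is a hypothetical helper.

```python
def build_columns_mapping(hive_columns, family="info"):
    """Build the hbase.columns.mapping string for a list of Hive column names.

    The first column is mapped to the HBase rowkey (:key); every other column
    is mapped to <family>:<column>. Entries are joined without spaces.
    """
    if not hive_columns:
        raise ValueError("at least the rowkey column is required")
    _key_column, *other_columns = hive_columns
    entries = [":key"] + [f"{family}:{name}" for name in other_columns]
    return ",".join(entries)

mapping = build_columns_mapping(["uservisitid", "firstaccessurl", "browserversion"])
```

For the three columns in the usage line, the result is ":key,info:firstaccessurl,info:browserversion".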

4.3 Statistical analysis with Hive

