Background: the data is user behavior data from an e-commerce website, and the task is to process and analyze the portal's access logs.
Technical solution: use Hadoop + Hive to process the logs offline and compute PV (page views) and UV (unique visitors) for statistical analysis of user behavior. The user behavior log format is shown by the following sample records:
"06/jul/2015:00:01:04 +0800" "GET" "http%3a//jf.10086.cn/m/" "http/1.1" "$" "http://jf.10086.cn/m/subject/100000000000009_0.html" "mozilla/5.0 (Linux; U Android 4.4.2; ZH-CN; Lenovo a3800-d build/lenovoa3800-d) applewebkit/533.1 (khtml, like Gecko) version/4.0 mqqbrowser/5.4 tbs/025438 Mobile safari/533.1 micromessenger/6.2.0.70_r1180778.561 nettype/cmnet language/zh_cn" "10.139.198.176" "480x854" "24" "%u5927%u7c7b%u5217%u8868%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3037487029517069460000" "3037487029517069460000" "1"
"06/jul/2015:01:01:04 +0800" "GET" "http%3a//jf.10086.cn/portal/ware/web/searchwareaction%3faction%3dsearchwareinfo%26pager.offset%3d144" "http/1.1" "" "http://jf.10086.cn/portal/ware/web/searchwareaction?action=searchwareinfo&pager.offset=156" "mozilla/5.0 (Linux; U Android 4.4.2; ZH-CN; HUAWEI mt2-l01 build/huaweimt2-l01) applewebkit/534.30 (khtml, like Gecko) version/4.0 ucbrowser/10.5.2.598 U3/0.8.0 Mobile safari/534.30" "223.73.104.224" "720x1208" "+" "%u641c%u7d22_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3046252153674140570000" "3046252153674140570000" "1" "2699"
"06/jul/2015:02:01:04 +0800" "GET" "" "http/1.1" "" "http://jf.10086.cn/" "mozilla/5.0 (Linux; Android 4.4.4; Vivo y13l build/ktu84p) applewebkit/537.36 (khtml, like Gecko) version/4.0 chrome/33.0.0.0 Mobile safari/537.36 baiduboxapp/5.1 (Baidu; P1 4.4.4)" "10.154.210.240" "480x855" "+" "%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3098781670304015290000" "3098781670304015290000" "0" "831"
"06/jul/2015:03:01:07 +0800" "GET" "http%3a//wx.10086.cn/Wechat-website/wechatwebsite/accumulatepoints" "http/1.1" "" "http://jf.10086.cn/m/" "mozilla/5.0 (Linux; U Android 4.4.2; ZH-CN; Lenovo a3800-d build/lenovoa3800-d) applewebkit/533.1 (khtml, like Gecko) version/4.0 mqqbrowser/5.4 tbs/025438 Mobile safari/533.1 micromessenger/6.2.0.70_r1180778.561 nettype/cmnet language/zh_cn" "10.139.198.176" "480x854" "24" "%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" "0" "3037487029517069460000" "3037487029517069460000" "1" "135"
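The sample records appear to follow a simple layout. Each field is wrapped in double quotes, and fields are separated by spaces. Below is a minimal Python sketch of a parser for one record. It assumes 16 fields in the order used by the later sample records. The field names are hypothetical; they are inferred from the sample values and from the parameter names of the tracking request (uid, sid, sflag, onloadtotaltime). This is a sketch, not the schema used by the original system.

```python
import re
from typing import Dict, List

# Hypothetical field names, inferred from the sample records and from the
# parameters of the tracking request; the real schema may differ.
FIELDS: List[str] = [
    "time", "method", "url", "protocol", "status", "referrer",
    "user_agent", "ip", "screen_size", "screen_color", "page_title",
    "site_type", "uid", "sid", "sflag", "onload_total_time",
]

# Each field is wrapped in double quotes; fields are separated by spaces.
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def parse_record(line: str) -> Dict[str, str]:
    """Split one raw log line into a {field_name: value} dict.

    Raises ValueError when the number of quoted fields does not match FIELDS,
    so the caller can count or skip malformed lines.
    """
    values = QUOTED_FIELD.findall(line)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


if __name__ == "__main__":
    # The last sample record, split across string literals for readability.
    line = (
        '"06/jul/2015:03:01:07 +0800" "GET" '
        '"http%3a//wx.10086.cn/Wechat-website/wechatwebsite/accumulatepoints" '
        '"http/1.1" "" "http://jf.10086.cn/m/" '
        '"mozilla/5.0 (Linux; U Android 4.4.2; ZH-CN; Lenovo a3800-d '
        'build/lenovoa3800-d) applewebkit/533.1 (khtml, like Gecko) version/4.0 '
        'mqqbrowser/5.4 tbs/025438 Mobile safari/533.1 '
        'micromessenger/6.2.0.70_r1180778.561 nettype/cmnet language/zh_cn" '
        '"10.139.198.176" "480x854" "24" '
        '"%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce" '
        '"0" "3037487029517069460000" "3037487029517069460000" "1" "135"'
    )
    record = parse_record(line)
    print(record["uid"], record["screen_size"], record["onload_total_time"])
    # prints: 3037487029517069460000 480x854 135
```

Records without exactly 16 quoted fields raise ValueError. The first sample, for example, appears to lack the trailing load-time value. A batch job would normally count or skip such lines rather than stop.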
For the data source, refer to the following example request:
http://jf.10086.cn/analyzeVesopera.gif?screenSize=1366x768&screenColor=24&pageTitle=%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce&referrerpage=&sitetype=0&uid=20523849176242946000&sid=56080848979763680000&sflag=1&countlog=1443006061700&onloadtotaltime=135
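The tracking request carries its data as URL query parameters. The page title uses the %uXXXX format produced by JavaScript's escape(), and Python's urllib.parse.unquote leaves that format untouched. The minimal sketch below therefore decodes %uXXXX with a regular expression first, then applies ordinary percent-decoding. The function names js_unescape and parse_beacon are hypothetical. Decoded, the pageTitle above reads 首页_中国移动积分商城 ("Home page_China Mobile Points Mall").

```python
import re
from typing import Dict
from urllib.parse import unquote, urlsplit

# JavaScript's escape() encodes non-ASCII characters as %uXXXX, which the
# standard URL decoders leave untouched, so they are handled separately.
JS_UNICODE_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def js_unescape(text: str) -> str:
    """Decode %uXXXX sequences, then ordinary %XX percent-escapes."""
    text = JS_UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    return unquote(text)


def parse_beacon(url: str) -> Dict[str, str]:
    """Return the tracking request's query parameters, decoded, as a dict.

    The raw query is split by hand instead of with parse_qs, so each value
    is percent-decoded exactly once (by js_unescape).
    """
    params: Dict[str, str] = {}
    for pair in urlsplit(url).query.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        params[name] = js_unescape(value)
    return params


if __name__ == "__main__":
    beacon = (
        "http://jf.10086.cn/analyzeVesopera.gif?screenSize=1366x768&screenColor=24"
        "&pageTitle=%u9996%u9875_%u4e2d%u56fd%u79fb%u52a8%u79ef%u5206%u5546%u57ce"
        "&referrerpage=&sitetype=0&uid=20523849176242946000"
        "&sid=56080848979763680000&sflag=1&countlog=1443006061700"
        "&onloadtotaltime=135"
    )
    info = parse_beacon(beacon)
    print(info["pageTitle"])        # 首页_中国移动积分商城
    print(info["onloadtotaltime"])  # 135
```

The same js_unescape helper also decodes the url and page title fields of the log records, for example "http%3a//jf.10086.cn/m/" becomes "http://jf.10086.cn/m/".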
Technical steps:
1. Build a Hadoop cluster and batch-process the log files offline.
For installing the Hadoop cluster, please refer to: http://blog.csdn.net/shenfuli/article/category/2803453
For installing Hive, please refer to: http://blog.csdn.net/shenfuli/article/category/5017631
For more information about HBase, please refer to: http://blog.csdn.net/shenfuli/article/category/5570409
2. Clean and enrich the logs with a MapReduce program.
3. Generate the business data (such as PV and UV) with Hive scripts.
4. Present the data in a web application.
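Steps 2 and 3 can be sketched in Python. This sketch swaps in Hadoop Streaming for the MapReduce program; the original does not say how its program is written. It also assumes PV and UV are grouped by day and hour. The field layout, the output columns, and the function names enrich and pv_uv_by_hour are all hypothetical. In the real pipeline the aggregation would be a Hive query over the enriched table, something like SELECT day, hour, COUNT(*) AS pv, COUNT(DISTINCT uid) AS uv FROM a hypothetical enriched_log table GROUP BY day, hour. pv_uv_by_hour only emulates that result in memory.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple
from urllib.parse import unquote

# Hypothetical 16-field layout inferred from the sample records (an assumption).
FIELDS: List[str] = [
    "time", "method", "url", "protocol", "status", "referrer",
    "user_agent", "ip", "screen_size", "screen_color", "page_title",
    "site_type", "uid", "sid", "sflag", "onload_total_time",
]
QUOTED_FIELD = re.compile(r'"([^"]*)"')
JS_UNICODE_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def js_unescape(text: str) -> str:
    """Decode %uXXXX sequences, then ordinary %XX percent-escapes."""
    return unquote(JS_UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text))


def enrich(lines: Iterable[str]) -> Iterator[str]:
    """Map step: turn raw log lines into clean tab-separated rows.

    Output columns (hypothetical): day, hour, uid, decoded url, decoded title.
    Lines with the wrong field count or an unparsable time are skipped.
    """
    for line in lines:
        values = QUOTED_FIELD.findall(line)
        if len(values) != len(FIELDS):
            continue
        rec = dict(zip(FIELDS, values))
        try:
            # %b matching is case-insensitive, so "jul" parses as July.
            ts = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue
        columns = [
            ts.strftime("%Y-%m-%d"),
            ts.strftime("%H"),
            rec["uid"],
            js_unescape(rec["url"]),
            js_unescape(rec["page_title"]),
        ]
        # Tabs are the column separator, so replace any in the decoded values.
        yield "\t".join(c.replace("\t", " ") for c in columns)


def pv_uv_by_hour(rows: Iterable[str]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Aggregate enriched rows into {(day, hour): (pv, uv)}.

    PV counts every row; UV counts distinct uids within the same day and hour.
    """
    pv: Dict[Tuple[str, str], int] = defaultdict(int)
    visitors: Dict[Tuple[str, str], set] = defaultdict(set)
    for row in rows:
        day, hour, uid = row.split("\t")[:3]
        pv[(day, hour)] += 1
        visitors[(day, hour)].add(uid)
    return {key: (pv[key], len(visitors[key])) for key in pv}


def main(argv: List[str]) -> None:
    """Act as a streaming mapper when run as `python <script> map` (stdin -> stdout)."""
    if argv[1:] == ["map"]:
        for row in enrich(sys.stdin):
            sys.stdout.write(row + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

Run as a map-only job, each output line would become one row of a tab-delimited Hive table. Grouping by day and hour is only one choice; the same rows support daily totals or per-page counts.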