Small Web site log analysis using the Hadoop platform

Source: Internet
Author: User
Tags sqoop


0. Upload the log files to Linux and collect the files into HDFs via Flume.
Execute command/home/cloud/flume/bin/flume-ng agent-n a4-c conf-f/home/cloud/flume/conf/a4.conf-dflume.root.logger=debug, Console


1. Creating Hive Tables
Create external Table Bbslog (IP string,logtime string,url string) partitioned by (Logdate string) row format delimited fi ELDs terminated by ' \ t ' location '/cleaned ';


2. Create a shell script
Touch daily.sh
Add Execute Permissions
chmod +x daily.sh
Daily.sh:
current= ' Date +%y%m%d '
#对数据进行清理, save to cleaned folder, save as current date
/home/cloud/hadoop/bin/hadoop jar/home/cloud/cleaner.jar/flume/$CURRENT/cleaned/$CURRENT
#修改hive表, add a partition for the current date
/HOME/CLOUD/HIVE/BIN/HIVE-E "ALTER TABLE Bbslog Add partition (logdate= $CURRENT) location ' cleaned/$CURRENT '"
#使用hive进行分析, depending on business needs
#统计pv并计入每日的pv表
/home/cloud/hive/bin/hive-e "CREATE table pv_$current row format delimited fields terminated by ' \ t ' as SELECT COUNT (*) F Rom bbslog where logdate= $CURRENT; "
#统计点击次数过20的潜在用户
/HOME/CLOUD/HIVE/BIN/HIVE-E "CREATE TABLE VIP _$current row format delimited fields terminated by ' \ t ' as select $CURRENT , Ip,count (*) as hits from Bbslog where logdate= $CURRENT GROUP by IP have hits > order by hits DESC "
#查询uv
/home/cloud/hive/bin/hive-e "CREATE table uv_$current row format delimited fields terminated by ' \ t ' as SELECT COUNT (Dist Inct IP) from Bbslog where logdate= $CURRENT "
#查询每天的注册人数
/home/cloud/hive/bin/hive-e "CREATE table reg_$current row format delimited fields terminated by ' \ t ' as SELECT COUNT (*) From Bbslog where logdate= $CURRENT and InStr (URL, ' member.php?mod=register ') >0 "
#将hive表中的数据导入mysql
/home/cloud/sqoop/bin/sqoop export--connect jdbc:mysql://cloud3:3306/jchubby--username root--password JChubby123-- Export-dir "/user/hive/warehouse/vip_$current"--table vip--fields-terminated-by ' \ t '

Small Web site log analysis using the Hadoop platform

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.