Small website log analysis using the Hadoop platform

0. Upload the log files to Linux and collect them into HDFS via Flume.
Execute the command:
/home/cloud/flume/bin/flume-ng agent -n a4 -c conf -f /home/cloud/flume/conf/a4.conf -Dflume.root.logger=DEBUG,console
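The article does not list a4.conf itself. A minimal sketch of what the agent definition could look like, assuming the uploaded logs land in a local spool directory (the source type, spoolDir path, and channel sizing below are all assumptions):

a4.sources = r1
a4.channels = c1
a4.sinks = k1
# Watch a local directory for newly uploaded log files (path assumed)
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /home/cloud/logs
a4.sources.r1.channels = c1
# Buffer events in memory
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100
# Write raw files into HDFS under /flume/<yymmdd>, matching the cleaner's input path
a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = /flume/%y%m%d
a4.sinks.k1.hdfs.fileType = DataStream
a4.sinks.k1.hdfs.useLocalTimeStamp = true
a4.sinks.k1.channel = c1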
1. Create the Hive table
create external table bbslog (ip string, logtime string, url string) partitioned by (logdate string) row format delimited fields terminated by '\t' location '/cleaned';
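Each cleaned record must therefore be a single line with three tab-separated fields in the declared column order (ip, logtime, url); the partition column logdate comes from the partition directory, not from the file itself. The example line below is made up purely for illustration (fields separated by tabs):

60.10.5.65	20140521013000	/forum.php?mod=viewthread&tid=42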
2. Create a shell script
touch daily.sh
Add execute permission:
chmod +x daily.sh
daily.sh:
CURRENT=`date +%y%m%d`
# Clean the data and save it to the /cleaned folder, under the current date
/home/cloud/hadoop/bin/hadoop jar /home/cloud/cleaner.jar /flume/$CURRENT /cleaned/$CURRENT
# Alter the Hive table: add a partition for the current date
/home/cloud/hive/bin/hive -e "alter table bbslog add partition (logdate='$CURRENT') location '/cleaned/$CURRENT'"
# Analyze the data with Hive, according to business needs
# Count PV (page views) and record it in a daily PV table
/home/cloud/hive/bin/hive -e "create table pv_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate='$CURRENT'"
# Find potential VIP users with more than 20 hits
/home/cloud/hive/bin/hive -e "create table vip_$CURRENT row format delimited fields terminated by '\t' as select '$CURRENT' as logdate, ip, count(*) as hits from bbslog where logdate='$CURRENT' group by ip having hits > 20 order by hits desc"
# Count UV (unique visitors)
/home/cloud/hive/bin/hive -e "create table uv_$CURRENT row format delimited fields terminated by '\t' as select count(distinct ip) from bbslog where logdate='$CURRENT'"
# Count the number of registrations per day
/home/cloud/hive/bin/hive -e "create table reg_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate='$CURRENT' and instr(url, 'member.php?mod=register') > 0"
# Export the data from the Hive table into MySQL
/home/cloud/sqoop/bin/sqoop export --connect jdbc:mysql://cloud3:3306/jchubby --username root --password JChubby123 --export-dir "/user/hive/warehouse/vip_$CURRENT" --table vip --fields-terminated-by '\t'
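Note that sqoop export loads into an existing MySQL table, so the vip table must be created in the jchubby database before the first run. A possible schema matching the three exported columns (the column names and types here are assumptions):

mysql -h cloud3 -u root -p jchubby -e "create table if not exists vip (logdate varchar(8), ip varchar(15), hits int)"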
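Finally, daily.sh is meant to run once per day, which can be automated with cron. A sample crontab entry (the 01:00 run time and the log path are assumptions):

0 1 * * * /home/cloud/daily.sh >> /home/cloud/daily.log 2>&1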