User Behavior Analysis Service: System Log Processing Scheme

1. Target table structures for log loading
1.1 Target table structure in the Hive database
CREATE TABLE `yemao_log` (
  `id` int,
  `time` int,
  `url_from` string,
  `url_current` string,
  `url_to` string,
  `options` string,
  `uid` int,
  `new_visitor` string,
  `province` string,
  `city` string,
  `site` string,
  `device` string,
  `phone` string,
  `token` string,
  `dorm` string,
  `order_phone` string,
  `order_dormitory` string,
  `order_amount` string,
  `order_id` int,
  `uname` string,
  `site_id` int,
  `address` string,
  `dorm_id` int,
  `dormentry_id` int,
  `rid` int,
  `cart_quantity` string)
PARTITIONED BY (
  `log_date` int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://master:9000/user/hive/warehouse/yemao_log'
TBLPROPERTIES (
  'transient_lastDdlTime'='1447308813');

1.2 The current-data table in the MySQL database (actually a temporary staging table)
CREATE TABLE `yemao_log` (
  `id` varchar(8000) DEFAULT NULL,
  `time` varchar(8000) DEFAULT NULL,
  `url_from` text,
  `url_current` text,
  `url_to` text,
  `options` text,
  `uid` text,
  `new_visitor` text,
  `province` text,
  `city` text,
  `site` text,
  `device` text,
  `phone` text,
  `token` text,
  `dorm` text,
  `order_phone` text,
  `order_dormitory` text,
  `order_amount` text,
  `order_id` text,
  `uname` text,
  `site_id` text,
  `address` text,
  `dorm_id` text,
  `dormentry_id` text,
  `rid` text,
  `cart_quantity` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

1.3 The history table in the MySQL database (the table actually in use)
CREATE TABLE `yemao_loghis` (
  `id` varchar(8000) DEFAULT NULL,
  `time` varchar(8000) DEFAULT NULL,
  `url_from` text,
  `url_current` text,
  `url_to` text,
  `options` text,
  `uid` text,
  `new_visitor` text,
  `province` text,
  `city` text,
  `site` text,
  `device` text,
  `phone` text,
  `token` text,
  `dorm` text,
  `order_phone` text,
  `order_dormitory` text,
  `order_amount` text,
  `order_id` text,
  `uname` text,
  `site_id` text,
  `address` text,
  `dorm_id` text,
  `dormentry_id` text,
  `rid` text,
  `cart_quantity` text,
  `log_date` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (log_date)
(PARTITION p0 VALUES IN (0) ENGINE = InnoDB,
 PARTITION p20151109 VALUES IN (20151109) ENGINE = InnoDB,
 PARTITION p20151110 VALUES IN (20151110) ENGINE = InnoDB,
 PARTITION p20151111 VALUES IN (20151111) ENGINE = InnoDB,
 PARTITION p20151112 VALUES IN (20151112) ENGINE = InnoDB,
 PARTITION p20151113 VALUES IN (20151113) ENGINE = InnoDB,
 PARTITION p20151114 VALUES IN (20151114) ENGINE = InnoDB,
 PARTITION p20151115 VALUES IN (20151115) ENGINE = InnoDB,
 PARTITION p20151116 VALUES IN (20151116) ENGINE = InnoDB,
 PARTITION p20151117 VALUES IN (20151117) ENGINE = InnoDB,
 PARTITION p20151118 VALUES IN (20151118) ENGINE = InnoDB,
 PARTITION p20151119 VALUES IN (20151119) ENGINE = InnoDB,
 PARTITION p20151120 VALUES IN (20151120) ENGINE = InnoDB,
 PARTITION p20151121 VALUES IN (20151121) ENGINE = InnoDB,
 PARTITION p20151122 VALUES IN (20151122) ENGINE = InnoDB,
 PARTITION p20151123 VALUES IN (20151123) ENGINE = InnoDB,
 PARTITION p20151124 VALUES IN (20151124) ENGINE = InnoDB,
 PARTITION p20151125 VALUES IN (20151125) ENGINE = InnoDB,
 PARTITION p20151126 VALUES IN (20151126) ENGINE = InnoDB,
 PARTITION p20151127 VALUES IN (20151127) ENGINE = InnoDB,
 PARTITION p20151128 VALUES IN (20151128) ENGINE = InnoDB,
 PARTITION p20151129 VALUES IN (20151129) ENGINE = InnoDB,
 PARTITION p20151130 VALUES IN (20151130) ENGINE = InnoDB,
 PARTITION p20151201 VALUES IN (20151201) ENGINE = InnoDB,
 PARTITION p20151202 VALUES IN (20151202) ENGINE = InnoDB,
 PARTITION p20151203 VALUES IN (20151203) ENGINE = InnoDB) */;
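Because `yemao_loghis` is LIST-partitioned on `log_date`, a partition for a given day must exist before that day's rows can be inserted; the maintenance procedure in section 2 builds the required ALTER statement dynamically. As a minimal sketch, this is the kind of string its concat() calls assemble (the date value here is illustrative, not from the source):

```shell
# Build the dynamic ALTER statement for one day, mirroring the concat()
# calls in p_ymlog_maintain. The date below is an example value.
v_log_date=20151204
add_sql="ALTER TABLE logdata.yemao_loghis ADD PARTITION (PARTITION p${v_log_date} VALUES IN (${v_log_date}));"
echo "$add_sql"
# → ALTER TABLE logdata.yemao_loghis ADD PARTITION (PARTITION p20151204 VALUES IN (20151204));
```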

2. Stored procedure for data processing
CREATE DEFINER=`datahs`@`%` PROCEDURE `p_ymlog_maintain`(IN `v_log_date` int)
BEGIN
  DECLARE v_partition_exists int;

  -- Check whether the partition for this day already exists in the history table.
  SELECT count(1) INTO v_partition_exists
    FROM INFORMATION_SCHEMA.`PARTITIONS`
   WHERE table_schema = 'logdata'
     AND table_name = 'yemao_loghis'
     AND partition_name = concat('p', v_log_date);

  -- If it exists, drop it first so that re-running the load is idempotent.
  IF v_partition_exists = 1 THEN
    SET @exec_sql = concat("ALTER TABLE logdata.yemao_loghis DROP PARTITION p", v_log_date);
    PREPARE stmt FROM @exec_sql;
    EXECUTE stmt;
  END IF;

  -- Add a fresh partition for the day being loaded.
  SET @exec_sql = concat("ALTER TABLE logdata.yemao_loghis ADD PARTITION (PARTITION p", v_log_date, " VALUES IN (", v_log_date, "));");
  PREPARE stmt FROM @exec_sql;
  EXECUTE stmt;

  -- Copy the day's rows from the staging table into the history table,
  -- skipping the exported CSV header row (id <> 'id').
  SET @exec_sql = concat("INSERT INTO logdata.yemao_loghis (id, time, url_from, url_current, url_to, options, uid, new_visitor, province, city, site, device, phone, token, dorm, order_phone, order_dormitory, order_amount, order_id, uname, site_id, address, dorm_id, dormentry_id, rid, cart_quantity, log_date) SELECT a.id, a.time, a.url_from, a.url_current, a.url_to, a.options, a.uid, a.new_visitor, a.province, a.city, a.site, a.device, a.phone, a.token, a.dorm, a.order_phone, a.order_dormitory, a.order_amount, a.order_id, a.uname, a.site_id, a.address, a.dorm_id, a.dormentry_id, a.rid, a.cart_quantity, ", v_log_date, " log_date FROM logdata.yemao_log a WHERE id <> 'id';");
  PREPARE stmt FROM @exec_sql;
  EXECUTE stmt;

  -- Empty the staging table for the next day's load.
  TRUNCATE TABLE logdata.yemao_log;
END

3. Shell scripts for data processing and loading
Scheduled automatic dispatch script: ymlog_proc.sh
#!/bin/bash
# Collect yesterday's log archives, normalize them to JSON, and load the
# result into MongoDB, Hive, and MySQL.
export yesterday=`date -d last-day +%Y%m%d`
cd /home/spark/opt/log_data/yemao
for tar in yemao*$yesterday.tar.gz;
do
  tar xvf $tar;
  # Keep only lines that contain a JSON array of records.
  grep -h "\[{.*}\]" *.log >> yemaolog;
  rm -rf /home/spark/opt/log_data/yemao/*.log
done
# Strip the surrounding [ and ] so each line is one JSON document.
sed -i 's/^.//' yemaolog
sed -i 's/.$//' yemaolog
# Load the raw JSON into MongoDB, replacing any earlier collection for the day.
/home/spark/opt/mongodb-2.7.0/bin/mongoimport -d yemao -c yemao_log_$yesterday --drop /home/spark/opt/log_data/yemao/yemaolog
# Export the required fields as CSV for the downstream loads.
/home/spark/opt/mongodb-2.7.0/bin/mongoexport -d yemao -c yemao_log_$yesterday --csv -f id,time,url_from,url_current,url_to,options,uid,new_visitor,province,city,site,device,phone,token,dorm,order_phone,order_dormitory,order_amount,order_id,uname,site_id,address,dorm_id,dormentry_id,rid,cart_quantity -o /home/spark/opt/log_data/yemao/yemao.dat
# Recreate the day's Hive partition and load the CSV into it.
/home/spark/opt/hive-1.2.1/bin/hive -e "ALTER TABLE yemao_log DROP IF EXISTS PARTITION (log_date=$yesterday); ALTER TABLE yemao_log ADD IF NOT EXISTS PARTITION (log_date=$yesterday); LOAD DATA LOCAL INPATH '/home/spark/opt/log_data/yemao/yemao.dat' INTO TABLE yemao_log PARTITION (log_date=$yesterday);"
#/usr/local/mysql/bin/mysqlimport -h120.55.189.188 -udatawarehouse -pdatawarehouse2015 --fields-terminated-by=, --lines-terminated-by=\n logdata /home/spark/opt/log_data/yemao/yemao.dat --local
# Export the day's Hive partition into the MySQL staging table via Sqoop.
/home/spark/opt/sqoop-1.4.6/bin/sqoop export --connect jdbc:mysql://120.55.189.188:3306/logdata --username datawarehouse --password datawarehouse2015 --table yemao_log --export-dir /user/hive/warehouse/yemao_log/log_date=$yesterday --input-fields-terminated-by ','
# Move the staged rows into the partitioned history table.
/usr/local/mysql/bin/mysql -h120.55.189.188 -udatawarehouse -pdatawarehouse2015 -e "call logdata.p_ymlog_maintain($yesterday);"
rm -rf /home/spark/opt/log_data/yemao/yemao.dat
rm -rf /home/spark/opt/log_data/yemao/yemaolog
rm -rf /home/spark/opt/log_data/yemao/yemao_log.java

Manually scheduled processing script: ymlog_proc_manual.sh
#!/bin/bash
# Same pipeline as ymlog_proc.sh, but the day to process is entered manually.
#export yesterday=`date -d last-day +%Y%m%d`
echo -n "Please enter a day for running: "
read yesterday
cd /home/spark/opt/log_data/yemao
for tar in yemao*$yesterday.tar.gz;
do
  tar xvf $tar;
  grep -h "\[{.*}\]" *.log >> yemaolog;
  rm -rf /home/spark/opt/log_data/yemao/*.log
done
sed -i 's/^.//' yemaolog
sed -i 's/.$//' yemaolog
/home/spark/opt/mongodb-2.7.0/bin/mongoimport -d yemao -c yemao_log_$yesterday --drop /home/spark/opt/log_data/yemao/yemaolog
/home/spark/opt/mongodb-2.7.0/bin/mongoexport -d yemao -c yemao_log_$yesterday --csv -f id,time,url_from,url_current,url_to,options,uid,new_visitor,province,city,site,device,phone,token,dorm,order_phone,order_dormitory,order_amount,order_id,uname,site_id,address,dorm_id,dormentry_id,rid,cart_quantity -o /home/spark/opt/log_data/yemao/yemao.dat
/home/spark/opt/hive-1.2.1/bin/hive -e "ALTER TABLE yemao_log DROP IF EXISTS PARTITION (log_date=$yesterday); ALTER TABLE yemao_log ADD IF NOT EXISTS PARTITION (log_date=$yesterday); LOAD DATA LOCAL INPATH '/home/spark/opt/log_data/yemao/yemao.dat' INTO TABLE yemao_log PARTITION (log_date=$yesterday);"
#/usr/local/mysql/bin/mysqlimport -h120.55.189.188 -udatawarehouse -pdatawarehouse2015 --fields-terminated-by=, --lines-terminated-by=\n logdata /home/spark/opt/log_data/yemao/yemao.dat --local
/home/spark/opt/sqoop-1.4.6/bin/sqoop export --connect jdbc:mysql://120.55.189.188:3306/logdata --username datawarehouse --password datawarehouse2015 --table yemao_log --export-dir /user/hive/warehouse/yemao_log/log_date=$yesterday --input-fields-terminated-by ','
/usr/local/mysql/bin/mysql -h120.55.189.188 -udatawarehouse -pdatawarehouse2015 -e "call logdata.p_ymlog_maintain($yesterday);"
rm -rf /home/spark/opt/log_data/yemao/yemao.dat
rm -rf /home/spark/opt/log_data/yemao/yemaolog
rm -rf /home/spark/opt/log_data/yemao/yemao_log.java

4. Set up crontab timed scheduling
[Spark@master ~]$ Crontab-l
0 6 * * * sh/home/spark/opt/log_data/ymlog_proc.sh
5. Description of the processing flow
The user behavior data produced by the tracking points embedded in the business system is stored and transmitted as JSON. The source log data is first normalized to standard JSON format, then the files are loaded into the MongoDB database, and finally the required fields are loaded into the Hive and MySQL databases as needed.
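The normalization step described above (the grep filter plus the two sed edits in the scripts) strips the enclosing [ and ] from each single-line JSON array so that mongoimport sees one document per line. A minimal sketch on a made-up sample line (the fields shown are illustrative, not the full schema):

```shell
# A raw line as the tracking point writes it: a JSON array on one line.
line='[{"id":1,"uid":42,"url_current":"/cart"}]'

# Keep only array-shaped lines, then strip the first and last character
# (the surrounding brackets), mirroring the pipeline's grep and sed calls.
echo "$line" | grep '\[{.*}\]' | sed 's/^.//' | sed 's/.$//'
# → {"id":1,"uid":42,"url_current":"/cart"}
```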
