[Author]: Kwu
Automated scripts to import hive Data Warehouse on a daily schedule
Create shell scripts, create temporary tables, load data, and convert to a formal partition table:
#!/bin/sh# upload logs to hdfsyesterday= ' date--date= ' 1 days ago ' +%y%m%d ' hive-e ' use stage;create table tracklog_tmp (DA Teday string,datetime string,ip string, Cookieid string,userid string,logserverip string,referer string,requesturl String,remark1 string,remark2 string,alexaflag string,ua string,wirelessflag string) ROW FORMAT delimited fields TERMINATED by ";" Hive-e "Use Stage;set hive.enforce.bucketing=true;set hive.exec.compress.output=true;set mapred.output.compress= True;set Mapred.output.compression.codec=org.apache.hadoop.io.compress.gzipcodec;set io.compression.codecs= Org.apache.hadoop.io.compress.gzipcodec;load data local Inpath '/diskg/logs/tracklog_192.168.1.1/${yesterday}/${ Yesterday}????. DAT ' overwrite into table Tracklog_tmp;insert to table Tracklog PARTITION (day= ' ${yesterday} ') select * from Tracklog _tmp;load data local inpath '/diskg/logs/tracklog_192.168.1.2/${yesterday}/${yesterday}????. DAT ' overwrite into table tracklog_tmp;insert into table tracklog partition (day= ' ${yesterday} ') select * from tracklog_tmp;load data local inpath '/diskg/logs/tracklog_192.168.1.3/${yesterd Ay}/${yesterday}????. DAT ' overwrite into table Tracklog_tmp;insert to table Tracklog PARTITION (day= ' ${yesterday} ') select * from Tracklog _tmp;load data local inpath '/diskg/logs/trackloguc_192.168.1.1/${yesterday}/${yesterday}????. DAT ' overwrite into table Tracklog_tmp;insert to table Tracklog PARTITION (day= ' ${yesterday} ') select * from Tracklog _tmp;load data local inpath '/diskg/logs/trackloguc_192.168.1.2/${yesterday}/${yesterday}????. DAT ' overwrite into table Tracklog_tmp;insert to table Tracklog PARTITION (day= ' ${yesterday} ') select * from Tracklog _tmp;load data local inpath '/diskg/logs/trackloguc_192.168.1.3/${yesterday}/${yesterday}????. DAT ' overwrite into table Tracklog_tmp;insert to table Tracklog PARTITION (day= ' ${yesterday} ') select * from Tracklog _tmp; " HIVE-E "Use Stage;drop table tracklog_tmp;"
Join a timed task in crontab
Crontab-e
Add the following code
#import Tracklog
* * * * */opt/bin/hive_opt/import_tracklog.sh
Automated scripts to import hive Data Warehouse on a daily schedule