The code address is as follows:
http://www.demodashi.com/demo/12673.html
I. Description of requirements
Every Monday, the customer needs to receive specific activity data, generated as Excel or CSV files and sent by email to designated recipients. A preliminary analysis of the requirement concludes:
- 1. The data the customer needs is not complex and requires no special processing; it can be treated simply as the output of SQL queries.
- 2. Writing query results to CSV files and sending them by email are mature, widely applicable techniques.
- 3. The Linux crond service supports scheduled tasks.
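The weekly schedule in point 3 comes down to a single crontab entry; the paths below are illustrative placeholders, not taken from the project:

```shell
# crontab -e (as the job user)
# Run every Monday at 08:00; adjust paths to your deployment.
0 8 * * 1 /usr/bin/python -u /opt/hiveemailjob/bin/hive-emailjob.py /etc/pythoncfg/emailjob.ini >> /var/log/hive-emailjob.log 2>&1
```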
II. System environment requirements
- Linux centos6.x
- hadoop2.x
- Python2.7
III. Python dependency libraries
- PyHive
- ppytools2
- thrift
- thrift-sasl
- sasl
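These can presumably be pulled in with pip (package names on PyPI are assumed to match the list above; `sasl` needs the Cyrus SASL C headers on CentOS):

```shell
yum install -y cyrus-sasl-devel gcc python-devel
pip install pyhive ppytools2 thrift thrift-sasl sasl
```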
IV. Workflow
V. Code implementation
The project structure diagram is as follows:
The key implementation steps are as follows:
1. Global configuration
settings.py
```python
# -*- coding: utf-8 -*-
# __author__ = '[email protected]'

from ppytools.cfgreader import ConfReader

import logging

'''Log output format'''
logging.basicConfig(level=logging.INFO, encoding='UTF-8',
                    format='%(asctime)s [%(levelname)s] {%(name)-10s} - %(message)s')


class ProjectConfig(object):
    """ProjectConfig"""

    def __init__(self, *conf_paths):
        self.cr = ConfReader(*conf_paths)

    def getHiveConf(self):
        return self.cr.getValues('HiveServer')

    def getEmailServer(self):
        return self.cr.getValues('EmailServer')

    def getJobInfo(self):
        return self.cr.getValues('JobInfo')

    def getCsvFolder(self):
        return self.cr.getValues('CSVFolder')['folder']

    def getCsvFile(self):
        return self.cr.getValues('CSVFile')

    def getCsvHead(self):
        return self.cr.getValues('CSVHead')

    def getHqlScript(self):
        return self.cr.getValues('HQLScript')

    def getEmailInfo(self):
        return self.cr.getValues('EmailInfo')
```
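`ConfReader` ships with ppytools2, so its internals aren't shown here. As a rough sketch of the behaviour `ProjectConfig` relies on (reading several INI files, later ones overriding earlier ones, and exposing any section as a plain dict), the stdlib `ConfigParser` is enough. `SimpleConfReader` below is a hypothetical stand-in, not the real class:

```python
# Hypothetical stand-in for ppytools' ConfReader: merge several INI files
# and expose any section as a plain dict of key -> string value.
import os
import tempfile

try:
    from configparser import ConfigParser   # Python 3
except ImportError:
    from ConfigParser import ConfigParser   # Python 2

class SimpleConfReader(object):
    def __init__(self, *conf_paths):
        self.parser = ConfigParser()
        # read() accepts a list of paths; later files override earlier ones.
        self.parser.read(conf_paths)

    def getValues(self, section):
        return dict(self.parser.items(section))

# Quick demo against a throwaway INI file.
fd, path = tempfile.mkstemp(suffix='.ini')
with os.fdopen(fd, 'w') as f:
    f.write('[HiveServer]\nhost=127.0.0.1\nport=10000\n')
reader = SimpleConfReader(path)
print(reader.getValues('HiveServer'))  # {'host': '127.0.0.1', 'port': '10000'}
os.remove(path)
```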
2. Core code
main.py
```python
# -*- coding: utf-8 -*-
# __author__ = '[email protected]'

from hiveemailjob.settings import ProjectConfig
from ppytools.csvhelper import write
from ppytools.emailclient import EmailClient
from ppytools.hiveclient import HiveClient
from ppytools.lang.timerhelper import timeMeter

import datetime
import logging
import sys
import time

logger = logging.getLogger(__name__)


def build_rk(ts):
    """Build HBase row key value
    :param ts: date time
    :return: row key
    """
    return hex(int(time.mktime(ts.timetuple()) * 1000))[2:]


def email_att(folder, name):
    return '{}/{}_{}.csv'.format(folder, name, datetime.datetime.now().strftime('%Y%m%d%H%M%S'))


@timeMeter()
def run(args):
    """Email job program execute entrance
    :param args: 1. job file path
                 2. start time, format: 2018-01-30 17:09:38 (not required)
                 3. stop time (not required)
    :return: empty
    """

    '''Read system args start'''
    args_len = len(args)
    if args_len != 2 and args_len != 4:
        logger.error('Enter args is error. Please check!!!')
        logger.error('1: job file path.')
        logger.error('2: start time, format: 2018-01-30 17:09:38 (option)')
        logger.error('3: stop time (option)')
        sys.exit(1)
    elif args_len == 4:
        try:
            start_time = datetime.datetime.strptime(args[2], '%Y-%m-%d %H:%M:%S')
            stop_time = datetime.datetime.strptime(args[3], '%Y-%m-%d %H:%M:%S')
        except Exception, e:
            raise RuntimeError('Parse start or stop time failed!!!\n', e)
    else:
        stop_time = datetime.date.today()
        start_time = stop_time - datetime.timedelta(days=1)

    job_file = args[1]
    start_rk = build_rk(start_time)
    stop_rk = build_rk(stop_time)

    '''System settings files (hard code)'''
    hive_conf = '/etc/pythoncfg/hive.ini'
    email_conf = '/etc/pythoncfg/email.ini'

    sets = ProjectConfig(hive_conf, email_conf, job_file)
    job_info = sets.getJobInfo()
    csv_folder = sets.getCsvFolder()

    logger.info('Now running %s Email Job ...', job_info['title'])
    logger.info('Start Time: %s', start_time)
    logger.info('Stop Time: %s', stop_time)

    hc = HiveClient(**sets.getHiveConf())

    csv_file = sets.getCsvFile().items()
    csv_file.sort()
    file_list = []
    logger.info('File name list:')
    for (k, v) in csv_file:
        logging.info('%s: %s', k, v)
        file_list.append(v)

    csv_head = sets.getCsvHead().items()
    csv_head.sort()
    head_list = []
    logger.info('CSV file head list:')
    for (k, v) in csv_head:
        logging.info('%s: %s', k, v)
        head_list.append(v)

    hql_scripts = sets.getHqlScript().items()
    hql_scripts.sort()
    email_atts = []
    index = 0
    for (k, hql) in hql_scripts:
        logging.info('%s: %s', k, hql)
        '''instance of your logic in here.'''
        result, size = hc.execQuery(hql.format(start_rk, stop_rk))
        if size == 0:
            logging.info('The above HQL script not found any data!!!')
        else:
            csv_file = email_att(csv_folder, file_list[index])
            email_atts.append(csv_file)
            write(csv_file, head_list[index].split(','), result)
        index += 1

    '''Flush Hive server connected.'''
    hc.closeConn()

    email_sub = sets.getEmailInfo()['subject'] % start_time
    email_body = sets.getEmailInfo()['body']
    email_to = sets.getEmailInfo()['to'].split(';')
    email_cc = sets.getEmailInfo()['cc'].split(';')

    if len(email_atts) == 0:
        email_body = 'Sorry, no data is currently found.\n' + email_body

    ec = EmailClient(**sets.getEmailServer())
    ec.send(email_to, email_cc, email_sub, email_body, email_atts, False)
    ec.quit()

    logger.info('Finished %s Email Job.', job_info['title'])
```
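The row-key helper deserves a note: `build_rk` turns a timestamp into epoch milliseconds rendered as a hex string, which is how the underlying HBase table is evidently keyed, so the `{}` placeholders in each HQL script receive a scan range. A standalone worked example:

```python
# Standalone version of build_rk: local-time epoch milliseconds, hex-encoded
# with the leading '0x' stripped. Later timestamps decode to larger integers,
# so (start_rk, stop_rk) bounds a time-range scan.
import datetime
import time

def build_rk(ts):
    return hex(int(time.mktime(ts.timetuple()) * 1000))[2:]

start = datetime.datetime(2018, 1, 29)
stop = datetime.datetime(2018, 1, 30)
start_rk, stop_rk = build_rk(start), build_rk(stop)
print(start_rk, stop_rk)
# The two keys are exactly one day apart when decoded:
print(int(stop_rk, 16) - int(start_rk, 16))  # 86400000
```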
3. System Configuration File
hive.ini
And
email.ini
```ini
# /etc/pythoncfg/hive.ini
[HiveServer]
host=127.0.0.1
port=10000
user=hive
db=default
```

```ini
# /etc/pythoncfg/email.ini
[EmailServer]
server=mail.163.com
port=25
[email protected]
passwd=xxxxxx
mode=TSL
```
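email.ini only carries the SMTP endpoint; what `EmailClient` presumably assembles from it is an ordinary multipart message. A stdlib-only sketch of that assembly step (`build_message` and its parameters are illustrative, not the real ppytools API; the actual send via `smtplib` is left out):

```python
# Illustrative sketch: build a multipart mail with CSV attachments using
# only the stdlib email package.
import os
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, to_list, cc_list, subject, body, att_paths):
    msg = MIMEMultipart()
    msg['From'] = sender
    msg['To'] = ';'.join(to_list)
    msg['Cc'] = ';'.join(cc_list)
    msg['Subject'] = subject
    msg.attach(MIMEText(body, 'plain', 'utf-8'))
    for path in att_paths:
        # Attach each CSV file as base64-encoded binary content.
        part = MIMEBase('application', 'octet-stream')
        with open(path, 'rb') as f:
            part.set_payload(f.read())
        encoders.encode_base64(part)
        part.add_header('Content-Disposition', 'attachment',
                        filename=os.path.basename(path))
        msg.attach(part)
    return msg
```

An `smtplib.SMTP(server, port)` connection would then deliver `msg.as_string()` through the server configured in email.ini.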
Note: the two files above must be placed in the specified directory /etc/pythoncfg/.
4. Mail Job Configuration Reference
emailjob.ini
```ini
[JobInfo]
title=邮件报表任务测试

[CSVFolder]
folder=/opt/csv_files/

# Please notice that CSVFile, CSVHead, HQLScript must be same length.
# And suggest that use prefix+number to flag and write.
[CSVFile]
file1=省份分组统计
file2=城市分组统计

[CSVHead]
head1=省份,累计
head2=省份,城市,累计

[HQLScript]
script1=select cn_state,count(1) m from ext_act_ja1
script2=select cn_state,cn_city,count(1) m from ext_act_ja2

[EmailInfo]
[email protected];
[email protected];
# %s it will replace as the start date.
subject=%s区域抽奖统计[测试]
body=此邮件由系统自动发送，请勿回复，谢谢！
```
Note: CSVFile, CSVHead, and HQLScript must contain the same number of entries, in matching order; naming them with a prefix plus a number is recommended.
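The pairing matters because main.py sorts each section's items and walks them by index: head1's comma-separated string becomes the header row of file1, filled with script1's result rows. A sketch of that CSV step with the stdlib csv module (`write_csv` is a guess at what `ppytools.csvhelper.write` does, with English sample data):

```python
# Guessed equivalent of ppytools.csvhelper.write: header row from the
# configured head string (split on ','), then one row per Hive result tuple.
import csv
import io


def write_csv(fileobj, head_list, rows):
    writer = csv.writer(fileobj)
    writer.writerow(head_list)
    writer.writerows(rows)


buf = io.StringIO()
write_csv(buf, 'province,total'.split(','),
          [('Guangdong', 31), ('Zhejiang', 12)])
print(buf.getvalue())
```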
5.Bin file
hive-emailjob.py
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# __author__ = '[email protected]'

from hiveemailjob import main

import sys

if __name__ == '__main__':
    main.run(sys.argv)
```
6. Execution results
In a system terminal, run `python -u bin/hive-emailjob.py emailjob.ini`; the output is as follows:
```
2018-02-20 16:28:21,561 [INFO] {__main__} - Now running Mail Report Task Test Email Job ...
2018-02-20 16:28:21,561 [INFO] {__main__} - Start Time: 2018-02-22
2018-02-20 16:28:21,562 [INFO] {__main__} - Stop Time: 2018-02-20
2018-02-20 16:28:21,691 [INFO] {pyhive.hive} - USE `default`
2018-02-20 16:28:21,731 [INFO] {ppytools.hive_client} - Hive server connect is ready. Transport open: True
2018-02-20 16:28:31,957 [INFO] {ppytools.email_client} - Email SMTP server connect ready.
2018-02-20 16:28:31,957 [INFO] {root} - File name list:
2018-02-20 16:28:31,957 [INFO] {root} - file1: Province Group Statistics
2018-02-20 16:28:31,957 [INFO] {root} - file2: City Group Statistics
2018-02-20 16:28:31,957 [INFO] {root} - CSV file head list:
2018-02-20 16:28:31,957 [INFO] {root} - head1: Province, Cumulative
2018-02-20 16:28:31,957 [INFO] {root} - head2: Province, City, Cumulative
2018-02-20 16:28:31,957 [INFO] {root} - script1: select cn_state,count(1) m from ext_act_ja2
2018-02-20 16:28:31,958 [INFO] {pyhive.hive} - select cn_state,count(1) m from ext_act_ja2
2018-02-20 16:29:04,258 [INFO] {ppytools.hive_client} - Hive client query completed. Records found: 31
2018-02-20 16:29:04,259 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.hive_client.execQuery> method cost 32.3012499809 seconds.
2018-02-20 16:29:04,261 [INFO] {ppytools.csv_helper} - Write a CSV file successful. --> /opt/csv_files/Province Group Statistics_20180223162904.csv
2018-02-20 16:29:04,262 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.csv_helper.write> method cost 0.00222992897034 seconds.
2018-02-20 16:29:04,262 [INFO] {root} - script2: select cn_state,cn_city,count(1) m from ext_act_ja2
2018-02-20 16:29:04,262 [INFO] {pyhive.hive} - select cn_state,cn_city,count(1) m from ext_act_ja2
2018-02-20 16:29:23,462 [INFO] {ppytools.hive_client} - Hive client query completed. Records found: 367
2018-02-20 16:29:23,463 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.hive_client.execQuery> method cost 19.2005498409 seconds.
2018-02-20 16:29:23,465 [INFO] {ppytools.csv_helper} - Write a CSV file successful. --> /opt/csv_files/City Group Statistics_20180223162923.csv
2018-02-20 16:29:23,465 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.csv_helper.write> method cost 0.00227284431458 seconds.
2018-02-20 16:29:23,669 [INFO] {ppytools.email_client} - Send email [2018-02-22 Regional Lottery Statistics [Test]] success. To users: [email protected]
2018-02-20 16:29:23,669 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.email_client.send> method cost 0.204078912735 seconds.
2018-02-20 16:29:23,714 [INFO] {__main__} - Finished Mail Report Task Test Email Job.
2018-02-20 16:29:23,715 [INFO] {ppytools.lang.timer_helper} - Execute <emailjob.main.run> method cost 62.1566159725 seconds.
```
OK, a general-purpose data export and email delivery program is now complete.
Send hive detail data with Python for mail
Note: Copyright belongs to the author and is hosted by Demo Master; reproduction without the author's authorization is prohibited.