Sending Hive detail data by email with Python

Source: Internet
Author: User

The code is available at:
http://www.demodashi.com/demo/12673.html

I. Requirements

The customer needs to receive specific activity data every Monday: the data is exported to Excel or CSV files and emailed to the designated recipients. A preliminary analysis of the requirement concludes that:

    • 1. The data the customer needs is not complex and requires no special processing; it can be treated simply as the output of SQL queries.
    • 2. Writing query results to CSV files and sending email are both mature, widely applicable techniques.
    • 3. The Linux crond service supports scheduled tasks.
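
As a rough illustration of the second point, query results can be written to a CSV file with the Python standard library alone. This is a minimal sketch written for Python 3, not the ppytools `write` helper the project actually uses; the function name `rows_to_csv` and the sample column names are assumptions made for the example.

```python
import csv


def rows_to_csv(path, header, rows):
    """Write a header row followed by the result rows to a CSV file."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

Calling `rows_to_csv("stats.csv", ["province", "total"], rows)` with a list of tuples produces one line per tuple under the header line.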

II. System environment requirements
    • Linux CentOS 6.x
    • Hadoop 2.x
    • Python 2.7

III. Python library dependencies
    • PyHive
    • ppytools2
    • thrift
    • thrift-sasl
    • sasl
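
To show what these libraries are for, here is a minimal sketch of querying HiveServer2 through PyHive's DB-API interface. It is not the ppytools `HiveClient` wrapper used later in the article; the helper names `open_hive` and `fetch_all` are hypothetical, and the default host, port, user and database are simply the values from the sample hive.ini.

```python
def fetch_all(connection, hql):
    """Run one query on a DB-API connection and return (rows, row_count)."""
    cursor = connection.cursor()
    try:
        cursor.execute(hql)
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return rows, len(rows)


def open_hive(host="127.0.0.1", port=10000, user="hive", db="default"):
    """Open a HiveServer2 connection (hypothetical helper)."""
    # Imported lazily: PyHive needs thrift, thrift-sasl and sasl installed.
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=user, database=db)
```

Because `fetch_all` only relies on the DB-API cursor methods, it can be tried against any DB-API connection before pointing it at a real Hive server.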

IV. Workflow

V. Code implementation

The project structure diagram is as follows:

The key implementation steps are as follows:

1. Global configuration: settings.py

    # -*- coding: utf-8 -*-
    # __author__ = '[email protected]'

    from ppytools.cfgreader import ConfReader

    import logging

    # Log output format
    logging.basicConfig(level=logging.INFO,
                        encoding='UTF-8',
                        format='%(asctime)s [%(levelname)s] {%(name)-10s} - %(message)s')


    class ProjectConfig(object):
        """ProjectConfig: reads every section the job needs from the given .ini files."""

        def __init__(self, *conf_paths):
            self.cr = ConfReader(*conf_paths)

        def getHiveConf(self):
            return self.cr.getValues('HiveServer')

        def getEmailServer(self):
            return self.cr.getValues('EmailServer')

        def getJobInfo(self):
            return self.cr.getValues('JobInfo')

        def getCSVFolder(self):
            return self.cr.getValues('CSVFolder')['folder']

        def getCSVFile(self):
            return self.cr.getValues('CSVFile')

        def getCSVHead(self):
            return self.cr.getValues('CSVHead')

        def getHQLScript(self):
            return self.cr.getValues('HQLScript')

        def getEmailInfo(self):
            return self.cr.getValues('EmailInfo')
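
For readers without ppytools at hand, the idea behind the configuration reader can be approximated with the standard library. The sketch below is an assumption about the behaviour (merge several .ini files, return each section as a dictionary), not the actual `ConfReader` implementation; `read_sections` is a hypothetical name, and the code targets Python 3.

```python
import configparser


def read_sections(*paths):
    """Merge several .ini files and return {section: {key: value}}."""
    # interpolation=None keeps literal '%s' placeholders (such as the one
    # in the email subject) from being treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read(paths, encoding="utf-8")
    return {name: dict(parser.items(name)) for name in parser.sections()}
```

With this stand-in, `read_sections(hive_conf, email_conf, job_file)["HiveServer"]` would return the same kind of dictionary that `getHiveConf()` is expected to return.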

2. Core code: main.py

    # -*- coding: utf-8 -*-
    # __author__ = '[email protected]'

    from hiveemailjob.settings import ProjectConfig
    from ppytools.csvhelper import write
    from ppytools.emailclient import EmailClient
    from ppytools.hiveclient import HiveClient
    from ppytools.lang.timerhelper import timeMeter

    import datetime
    import logging
    import sys
    import time

    logger = logging.getLogger(__name__)


    def build_rk(ts):
        """
        Build HBase row key value
        :param ts: date time
        :return: row key
        """
        return hex(int(time.mktime(ts.timetuple()) * 1000))[2:]


    def email_att(folder, name):
        # Attachment path: <folder>/<name>_<timestamp>.csv
        return '{}/{}_{}.csv'.format(folder, name, datetime.datetime.now().strftime('%Y%m%d%H%M%S'))


    @timeMeter()
    def run(args):
        """
        Email job program entry point
        :param args:
        1. job file path
        2. start time, format: 2018-01-30 17:09:38 (optional)
        3. stop time (optional)
        :return: Empty
        """
        # Read the system arguments.
        args_len = len(args)
        if args_len != 2 and args_len != 4:
            logger.error('Enter args is error. Please check!!!')
            logger.error('1: job file path.')
            logger.error('2: start time, format: 2018-01-30 17:09:38(option)')
            logger.error('3: stop time(option)')
            sys.exit(1)
        elif args_len == 4:
            try:
                start_time = datetime.datetime.strptime(args[2], '%Y-%m-%d %H:%M:%S')
                stop_time = datetime.datetime.strptime(args[3], '%Y-%m-%d %H:%M:%S')
            except Exception as e:
                raise RuntimeError('Parse start or stop time failed!!!\n', e)
        else:
            # Default window: from yesterday to today.
            stop_time = datetime.date.today()
            start_time = stop_time - datetime.timedelta(days=1)

        job_file = args[1]
        start_rk = build_rk(start_time)
        stop_rk = build_rk(stop_time)

        # System settings files (hard-coded).
        hive_conf = '/etc/pythoncfg/hive.ini'
        email_conf = '/etc/pythoncfg/email.ini'
        sets = ProjectConfig(hive_conf, email_conf, job_file)
        job_info = sets.getJobInfo()
        csv_folder = sets.getCSVFolder()

        logger.info('Now running %s Email Job...', job_info['title'])
        logger.info('Start time: %s', start_time)
        logger.info('Stop time: %s', stop_time)

        hc = HiveClient(**sets.getHiveConf())

        # Sort each section by key so that file names, headers and
        # scripts line up by position.
        csv_file = sorted(sets.getCSVFile().items())
        file_list = []
        logger.info('File name list:')
        for (k, v) in csv_file:
            logging.info('%s: %s', k, v)
            file_list.append(v)

        csv_head = sorted(sets.getCSVHead().items())
        head_list = []
        logger.info('CSV file head list:')
        for (k, v) in csv_head:
            logging.info('%s: %s', k, v)
            head_list.append(v)

        hql_scripts = sorted(sets.getHQLScript().items())
        email_atts = []
        index = 0
        for (k, hql) in hql_scripts:
            logging.info('%s: %s', k, hql)
            # Put your own logic here.
            result, size = hc.execQuery(hql.format(start_rk, stop_rk))
            if size == 0:
                logging.info('The above HQL script not found any data!!!')
            else:
                csv_file = email_att(csv_folder, file_list[index])
                email_atts.append(csv_file)
                write(csv_file, head_list[index].split(','), result)
            index += 1

        # Close the Hive server connection.
        hc.closeConn()

        email_sub = sets.getEmailInfo()['subject'] % start_time
        email_body = sets.getEmailInfo()['body']
        email_to = sets.getEmailInfo()['to'].split(';')
        email_cc = sets.getEmailInfo()['cc'].split(';')

        if len(email_atts) == 0:
            email_body = 'Sorry, no data is currently found.\n\n' + email_body

        ec = EmailClient(**sets.getEmailServer())
        ec.send(email_to, email_cc, email_sub, email_body, email_atts, False)
        ec.quit()

        logger.info('Finished %s Email Job.', job_info['title'])
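
The argument handling and row-key logic in `run` can be isolated into two small functions, which makes the default time window easier to see: with only a job file, the window runs from yesterday to today. This is a sketch for Python 3 under that reading of the code; `resolve_window` and `row_key` are hypothetical names, and the `today` parameter is added only so the default window can be exercised with a fixed date.

```python
import datetime
import time

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def resolve_window(argv, today=None):
    """Return (job_file, start, stop) from a sys.argv-style list."""
    if len(argv) == 4:
        start = datetime.datetime.strptime(argv[2], TIME_FORMAT)
        stop = datetime.datetime.strptime(argv[3], TIME_FORMAT)
    elif len(argv) == 2:
        stop = today or datetime.date.today()
        start = stop - datetime.timedelta(days=1)
    else:
        raise ValueError("expected: <job file> [<start time> <stop time>]")
    return argv[1], start, stop


def row_key(ts):
    """Hex string of ts as local-time epoch milliseconds."""
    millis = int(time.mktime(ts.timetuple()) * 1000)
    return format(millis, "x")
```

Using `format(millis, "x")` instead of `hex(...)[2:]` avoids depending on the prefix (and, on Python 2, a possible `L` suffix) that `hex` adds.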

3. System configuration files: hive.ini and email.ini

    # /etc/pythoncfg/hive.ini
    [HiveServer]
    host=127.0.0.1
    port=10000
    user=hive
    db=default

    # /etc/pythoncfg/email.ini
    [EmailServer]
    server=mail.163.com
    port=25
    [email protected]
    passwd=xxxxxx
    mode=TSL

Note: both files must be placed in the directory /etc/pythoncfg/.
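
To make the role of the email settings concrete, the sketch below builds a message with CSV attachments and sends it using only the Python 3 standard library. It is not the ppytools `EmailClient`; `build_message`, `send_message` and the `use_tls` flag are hypothetical, and treating the `mode` setting as "upgrade the connection with STARTTLS" is an assumption.

```python
import os
import smtplib
from email.message import EmailMessage


def build_message(sender, to, cc, subject, body, attachments=()):
    """Assemble a plain-text email with CSV files attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    msg["Subject"] = subject
    msg.set_content(body)
    for path in attachments:
        # Attach each file as text/csv under its base name.
        with open(path, encoding="utf-8") as handle:
            msg.add_attachment(handle.read(), subtype="csv",
                               filename=os.path.basename(path))
    return msg


def send_message(msg, server, port, user, passwd, use_tls=True):
    """Send msg through an SMTP server, upgrading to TLS when requested."""
    with smtplib.SMTP(server, port) as smtp:
        if use_tls:
            smtp.starttls()
        smtp.login(user, passwd)
        smtp.send_message(msg)
```

Keeping message construction separate from sending means the message can be inspected or saved without contacting a mail server.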

4. Mail job configuration reference: emailjob.ini

    [JobInfo]
    title=Mail report task test

    [CSVFolder]
    folder=/opt/csv_files/

    # Please note that CSVFile, CSVHead and HQLScript must have the same length.
    # It is suggested to name entries with a prefix plus a number.
    [CSVFile]
    file1=Province Group Statistics
    file2=City Group Statistics

    [CSVHead]
    head1=Province,Cumulative
    head2=Province,City,Cumulative

    [HQLScript]
    script1=select cn_state,count(1) m from ext_act_ja1
    script2=select cn_state,cn_city,count(1) m from ext_act_ja2

    [EmailInfo]
    [email protected];
    [email protected];
    # %s will be replaced with the start date.
    subject=%s regional lottery statistics [test]
    body=This email was sent automatically by the system. Please do not reply. Thank you!

Note: the CSVFile, CSVHead and HQLScript sections must contain the same number of entries, in the same order. Naming the entries with a prefix plus a number is recommended.
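
The reason for this rule is that main.py sorts each section by key and then matches entries by position. A small check along the following lines could catch a mismatched job file before any query runs; it is a sketch, not part of the original project, and `pair_job_entries` is a hypothetical name.

```python
def pair_job_entries(csv_file, csv_head, hql_script):
    """Pair file names, headers and queries by the sorted order of their keys."""
    if not (len(csv_file) == len(csv_head) == len(hql_script)):
        raise ValueError(
            "CSVFile, CSVHead and HQLScript must have the same number of entries")
    names = [value for _, value in sorted(csv_file.items())]
    heads = [value.split(",") for _, value in sorted(csv_head.items())]
    hqls = [value for _, value in sorted(hql_script.items())]
    return list(zip(names, heads, hqls))
```

Keys are compared as strings, so `file10` sorts before `file2`; with more than nine entries, zero-padded numbers such as `file01` keep the order predictable.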

5. Executable script: hive-emailjob.py

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # __author__ = '[email protected]'

    from hiveemailjob import main

    import sys

    if __name__ == '__main__':
        main.run(sys.argv)

6. Execution result

In a terminal, run python -u bin/hive_email_job.py followed by the path to the job file (and, optionally, a start and stop time). The output is as follows:

    2018-02-20 16:28:21,561 [INFO] {__main__} - Now running Mail report task test Email Job...
    2018-02-20 16:28:21,561 [INFO] {__main__} - Start time: 2018-02-22
    2018-02-20 16:28:21,562 [INFO] {__main__} - Stop time: 2018-02-20
    2018-02-20 16:28:21,691 [INFO] {pyhive.hive} - USE `default`
    2018-02-20 16:28:21,731 [INFO] {ppytools.hive_client} - Hive server connect is ready. Transport open: True
    2018-02-20 16:28:31,957 [INFO] {ppytools.email_client} - Email SMTP server connect ready.
    2018-02-20 16:28:31,957 [INFO] {root} - File name list:
    2018-02-20 16:28:31,957 [INFO] {root} - file1: Province Group Statistics
    2018-02-20 16:28:31,957 [INFO] {root} - file2: City Group Statistics
    2018-02-20 16:28:31,957 [INFO] {root} - CSV file head list:
    2018-02-20 16:28:31,957 [INFO] {root} - head1: Province,Cumulative
    2018-02-20 16:28:31,957 [INFO] {root} - head2: Province,City,Cumulative
    2018-02-20 16:28:31,957 [INFO] {root} - script1: select cn_state,count(1) m from ext_act_ja2
    2018-02-20 16:28:31,958 [INFO] {pyhive.hive} - select cn_state,count(1) m from ext_act_ja2
    2018-02-20 16:29:04,258 [INFO] {ppytools.hive_client} - Hive client query completed. Records found: 31
    2018-02-20 16:29:04,259 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.hive_client.execQuery> method cost 32.3012499809 seconds.
    2018-02-20 16:29:04,261 [INFO] {ppytools.csv_helper} - Write a CSV file successful. -- /opt/csv_files/Province Group Statistics_20180223162904.csv
    2018-02-20 16:29:04,262 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.csv_helper.write> method cost 0.00222992897034 seconds.
    2018-02-20 16:29:04,262 [INFO] {root} - script2: select cn_state,cn_city,count(1) m from ext_act_ja2
    2018-02-20 16:29:04,262 [INFO] {pyhive.hive} - select cn_state,cn_city,count(1) m from ext_act_ja2
    2018-02-20 16:29:23,462 [INFO] {ppytools.hive_client} - Hive client query completed. Records found: 367
    2018-02-20 16:29:23,463 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.hive_client.execQuery> method cost 19.2005498409 seconds.
    2018-02-20 16:29:23,465 [INFO] {ppytools.csv_helper} - Write a CSV file successful. -- /opt/csv_files/City Group Statistics_20180223162923.csv
    2018-02-20 16:29:23,465 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.csv_helper.write> method cost 0.00227284431458 seconds.
    2018-02-20 16:29:23,669 [INFO] {ppytools.email_client} - Send email[2018-02-22 regional lottery statistics [test]] success. To users: [email protected]
    2018-02-20 16:29:23,669 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.email_client.send> method cost 0.204078912735 seconds.
    2018-02-20 16:29:23,714 [INFO] {__main__} - Finished Mail report task test Email Job.
    2018-02-20 16:29:23,715 [INFO] {ppytools.lang.timer_helper} - Execute <emailjob.main.run> method cost 62.1566159725 seconds.

OK, a general-purpose data file synchronization program is now complete.

Note: Copyright belongs to the author and the article is published by demodashi (Demo Master). Reprinting is not permitted without the author's authorization.
