Download and merge SAE log files with a Python script

Source: Internet
Author: User
Tags: md5, python, script

For various reasons I needed the log files of a site hosted on SAE. They can only be downloaded from SAE one by one, and processing the downloads by hand is painful, especially when there are many of them. Fortunately, SAE provides an API that returns the download addresses of log files in bulk, so a Python script can download and merge the files automatically.

Call the API to get the download addresses

The API documentation is at http://sae.sina.com.cn/?m=devcenter&catId=281

Set your application and download parameters

The variables to be set in the request are as follows

api_url = 'http://dloadcenter.sae.sina.com.cn/interapi.php?'
appname = 'xxxxx'
from_date = '20140101'
to_date = '20140116'
url_type = 'http'    # http|taskqueue|cron|mail|rdc
url_type2 = 'access' # when type=http: access|debug|error|warning|notice|resources
secret_key = 'xxxxx'

Generate the request address

The request address is generated according to the official requirements:

1. Sort the parameters by name
2. Concatenate them into a request string, then remove the '&' characters
3. Append the secret key
4. MD5 the resulting string to form the sign
5. Append the sign to the request string

The actual implementation code is as follows

params = dict()
params['act'] = 'log'
params['appname'] = appname
params['from'] = from_date
params['to'] = to_date
params['type'] = url_type

if url_type == 'http':
    params['type2'] = url_type2

params = collections.OrderedDict(sorted(params.items()))

request = ''
for k, v in params.iteritems():
    request += k + '=' + v + '&'

sign = request.replace('&', '')
sign += secret_key

md5 = hashlib.md5()
md5.update(sign)
sign = md5.hexdigest()

request = api_url + request + 'sign=' + sign

response = urllib2.urlopen(request).read()
response = json.loads(response)

if response['errno'] != 0:
    print '[!] ' + response['errmsg']
    exit()

print '[#] Request Success'
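The script above targets Python 2. If you are on Python 3, the same signing scheme can be sketched as a standalone function; note that hashlib.md5 then requires bytes. The app name and secret below are placeholders, not real credentials:

```python
import hashlib

def build_request(api_url, params, secret_key):
    """Build the signed request URL: sort the params, join them as
    k=v&..., then MD5 the concatenation (with '&' removed) plus the
    secret key to form the sign."""
    query = ''
    for k in sorted(params):
        query += k + '=' + params[k] + '&'
    raw = query.replace('&', '') + secret_key
    sign = hashlib.md5(raw.encode('utf-8')).hexdigest()
    return api_url + query + 'sign=' + sign

url = build_request(
    'http://dloadcenter.sae.sina.com.cn/interapi.php?',
    {'act': 'log', 'appname': 'xxxxx', 'from': '20140101',
     'to': '20140116', 'type': 'http', 'type2': 'access'},
    'dummy-secret')
```

Because the parameters are sorted by key, the query part is fully deterministic, which is what makes the server-side signature check possible.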

Download log files

SAE packs each day's log into a tar.gz archive. The script saves each download to disk, naming the file after its date, e.g. 2014-01-01.tar.gz.

The download code is as follows:

log_files = list()

for down_url in response['data']:
    file_name = re.compile(r'\d{4}-\d{2}-\d{2}').findall(down_url)[0] + '.tar.gz'
    log_files.append(file_name)
    data = urllib2.urlopen(down_url).read()
    with open(file_name, 'wb') as f:
        f.write(data)

print '[#] You got %d log files' % len(log_files)
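The regular expression that derives the local filename can be checked in isolation. A minimal Python 3 sketch, with a made-up download URL (the real ones come from the API response):

```python
import re

# hypothetical URL in the shape the API returns
down_url = 'http://example.com/logs/2014-01-05.tar.gz?token=abc'

# grab the first YYYY-MM-DD substring and use it as the filename
file_name = re.compile(r'\d{4}-\d{2}-\d{2}').findall(down_url)[0] + '.tar.gz'
print(file_name)  # 2014-01-05.tar.gz
```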

Merging files

To merge the files, use the tarfile library to extract each archive, then append the extracted file's contents to access_log.

The merge code is as follows:

# merge these files into access_log
access_log = open('access_log', 'w')

for log_file in log_files:
    tar = tarfile.open(log_file)
    log_name = tar.getnames()[0]
    tar.extract(log_name)
    # append the extracted file to access_log
    data = open(log_name).read()
    access_log.write(data)
    os.remove(log_name)

print '[#] All files have been written to access_log'
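In Python 3 the merge step can be sketched end to end with temporary fixtures, so it can be tested without real SAE archives. The archive names and log contents below are made up for the demonstration:

```python
import os
import tarfile
import tempfile

def merge_logs(archives, out_path):
    """Extract the first member of each tar.gz and append it to out_path."""
    workdir = os.path.dirname(out_path) or '.'
    with open(out_path, 'w') as access_log:
        for archive in archives:
            with tarfile.open(archive) as tar:
                log_name = tar.getnames()[0]
                tar.extract(log_name, path=workdir)
                member_path = os.path.join(workdir, log_name)
                with open(member_path) as f:
                    access_log.write(f.read())
                os.remove(member_path)  # clean up, as in the script above

# build two tiny archives to stand in for the downloaded logs
workdir = tempfile.mkdtemp()
archives = []
for day, text in [('2014-01-01', 'line-a\n'), ('2014-01-02', 'line-b\n')]:
    log_path = os.path.join(workdir, day + '.log')
    with open(log_path, 'w') as f:
        f.write(text)
    tar_path = os.path.join(workdir, day + '.tar.gz')
    with tarfile.open(tar_path, 'w:gz') as tar:
        tar.add(log_path, arcname=day + '.log')
    os.remove(log_path)
    archives.append(tar_path)

out = os.path.join(workdir, 'access_log')
merge_logs(archives, out)
merged = open(out).read()
```

Because each daily archive holds a single log file, taking `getnames()[0]` is enough; archives with multiple members would need a loop over all names.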

Complete code

The complete code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author: Su Yan
# @Date: 2014-01-17 12:05:19
# @Last Modified by: Su Yan
# @Last Modified time: 2014-01-17 14:15:41

import os
import collections
import hashlib
import urllib2
import json
import re
import tarfile

# settings
# documentation: http://sae.sina.com.cn/?m=devcenter&catId=281
api_url = 'http://dloadcenter.sae.sina.com.cn/interapi.php?'
appname = 'yansublog'
from_date = '20140101'
to_date = '20140116'
url_type = 'http'    # http|taskqueue|cron|mail|rdc
url_type2 = 'access' # when type=http: access|debug|error|warning|notice|resources
secret_key = 'zwzim4zhk35i50003kz2lh3hyilz01m03515j0i5'

# encode the request
params = dict()
params['act'] = 'log'
params['appname'] = appname
params['from'] = from_date
params['to'] = to_date
params['type'] = url_type

if url_type == 'http':
    params['type2'] = url_type2

params = collections.OrderedDict(sorted(params.items()))

request = ''
for k, v in params.iteritems():
    request += k + '=' + v + '&'

sign = request.replace('&', '')
sign += secret_key

md5 = hashlib.md5()
md5.update(sign)
sign = md5.hexdigest()

request = api_url + request + 'sign=' + sign

# request the API
response = urllib2.urlopen(request).read()
response = json.loads(response)

if response['errno'] != 0:
    print '[!] ' + response['errmsg']
    exit()

print '[#] Request Success'

# download and save the files
log_files = list()

for down_url in response['data']:
    file_name = re.compile(r'\d{4}-\d{2}-\d{2}').findall(down_url)[0] + '.tar.gz'
    log_files.append(file_name)
    data = urllib2.urlopen(down_url).read()
    with open(file_name, 'wb') as f:
        f.write(data)

print '[#] You got %d log files' % len(log_files)

# merge these files into access_log
access_log = open('access_log', 'w')

for log_file in log_files:
    tar = tarfile.open(log_file)
    log_name = tar.getnames()[0]
    tar.extract(log_name)
    # append the extracted file to access_log
    data = open(log_name).read()
    access_log.write(data)
    os.remove(log_name)

print '[#] All files have been written to access_log'
