For various reasons I needed to analyze the log files of a site hosted on SAE. SAE only lets you download the logs through its web console, and processing the downloads by hand is painful, especially when there are a lot of files. Fortunately, SAE provides an API for batch-fetching log file download addresses, so I wrote a Python script to download and merge these files automatically.
Call the API to get the download addresses
The API documentation is at http://sae.sina.com.cn/?m=devcenter&catId=281
Set your own app and download parameters
The variables to be set in the request are as follows
api_url = 'http://dloadcenter.sae.sina.com.cn/interapi.php?'
appname = 'xxxxx'
from_date = '20140101'
to_date = '20140116'
url_type = 'http'     # http|taskqueue|cron|mail|rdc
url_type2 = 'access'  # only when type=http: access|debug|error|warning|notice|resources
secret_key = 'xxxxx'
Generate Request Address
The request address is generated according to the official requirements:
1. Sort the parameters by name
2. Concatenate them into a request string joined by &
3. Strip the & characters and append the secret_key
4. MD5 the resulting string to form the sign
5. Append the sign to the request string
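The five steps above can be sketched as a standalone helper (Python 3 syntax; `api_url`, `params`, and `secret_key` stand for whatever you configured above):

```python
import collections
import hashlib

def build_signed_url(api_url, params, secret_key):
    # 1. sort the parameters by name
    ordered = collections.OrderedDict(sorted(params.items()))
    # 2. concatenate them into a query string joined by &
    query = ''.join('%s=%s&' % (k, v) for k, v in ordered.items())
    # 3. strip the & characters and append the secret_key
    sign_src = query.replace('&', '') + secret_key
    # 4. MD5 the result to form the sign
    sign = hashlib.md5(sign_src.encode('utf-8')).hexdigest()
    # 5. append the sign to the request string
    return api_url + query + 'sign=' + sign
```

This is only a sketch of the signing scheme as described in this article; check the official documentation for the authoritative rules.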
The specific implementation code is as follows
params = dict()
params['act'] = 'log'
params['appname'] = appname
params['from'] = from_date
params['to'] = to_date
params['type'] = url_type
if url_type == 'http':
    params['type2'] = url_type2
params = collections.OrderedDict(sorted(params.items()))

request = ''
for k, v in params.iteritems():
    request += k + '=' + v + '&'

sign = request.replace('&', '')
sign += secret_key
md5 = hashlib.md5()
md5.update(sign)
sign = md5.hexdigest()

request = api_url + request + 'sign=' + sign

response = urllib2.urlopen(request).read()
response = json.loads(response)
if response['errno'] != 0:
    print '[!] ' + response['errmsg']
    exit()
print '[#] request success'
Download log files
SAE packages each day's log as a tar.gz archive, so each download is saved locally under a file name of the form date.tar.gz.
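The date-to-filename mapping can be isolated into a small helper (a sketch; the assumption that the download URL embeds a YYYY-MM-DD date comes from the regex used in the code below):

```python
import re

def log_filename(down_url):
    # pull the YYYY-MM-DD date out of the download URL and use it
    # as the name of the local archive
    date = re.search(r'\d{4}-\d{2}-\d{2}', down_url).group(0)
    return date + '.tar.gz'
```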
The code is as follows:
log_files = list()
for down_url in response['data']:
    file_name = re.compile(r'\d{4}-\d{2}-\d{2}').findall(down_url)[0] + '.tar.gz'
    log_files.append(file_name)
    data = urllib2.urlopen(down_url).read()
    with open(file_name, 'wb') as f:
        f.write(data)

print '[#] you got %d log files' % len(log_files)
Merging files
Use the tarfile library to extract each archive, then append its contents to access_log.
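On Python 3 the same merge can be done without writing the extracted file to disk, by reading each member directly out of the archive (a sketch; it assumes each daily archive contains a single log file, as the SAE archives here do):

```python
import tarfile

def merge_archives(archives, out_path):
    # append the first member of each tar.gz archive to out_path
    with open(out_path, 'wb') as out:
        for archive in archives:
            with tarfile.open(archive) as tar:
                member = tar.getnames()[0]
                out.write(tar.extractfile(member).read())
```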
The code is as follows:
# merge these files into access_log
access_log = open('access_log', 'w')
for log_file in log_files:
    tar = tarfile.open(log_file)
    log_name = tar.getnames()[0]
    tar.extract(log_name)
    # append the extracted file to access_log
    data = open(log_name).read()
    access_log.write(data)
    os.remove(log_name)
access_log.close()

print '[#] all files have been written to access_log'
Full code
The complete script is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author: Su Yan
# @Date:   2014-01-17 12:05:19
# @Last Modified by:   Su Yan
# @Last Modified time: 2014-01-17 14:15:41

import os
import collections
import hashlib
import urllib2
import json
import re
import tarfile

# settings
# documents http://sae.sina.com.cn/?m=devcenter&catId=281
api_url = 'http://dloadcenter.sae.sina.com.cn/interapi.php?'
appname = 'yansublog'
from_date = '20140101'
to_date = '20140116'
url_type = 'http'     # http|taskqueue|cron|mail|rdc
url_type2 = 'access'  # only when type=http: access|debug|error|warning|notice|resources
secret_key = 'zwzim4zhk35i50003kz2lh3hyilz01m03515j0i5'

# encode request
params = dict()
params['act'] = 'log'
params['appname'] = appname
params['from'] = from_date
params['to'] = to_date
params['type'] = url_type
if url_type == 'http':
    params['type2'] = url_type2
params = collections.OrderedDict(sorted(params.items()))

request = ''
for k, v in params.iteritems():
    request += k + '=' + v + '&'

sign = request.replace('&', '')
sign += secret_key
md5 = hashlib.md5()
md5.update(sign)
sign = md5.hexdigest()

request = api_url + request + 'sign=' + sign

# request api
response = urllib2.urlopen(request).read()
response = json.loads(response)
if response['errno'] != 0:
    print '[!] ' + response['errmsg']
    exit()
print '[#] request success'

# download and save files
log_files = list()
for down_url in response['data']:
    file_name = re.compile(r'\d{4}-\d{2}-\d{2}').findall(down_url)[0] + '.tar.gz'
    log_files.append(file_name)
    data = urllib2.urlopen(down_url).read()
    with open(file_name, 'wb') as f:
        f.write(data)

print '[#] you got %d log files' % len(log_files)

# merge these files into access_log
access_log = open('access_log', 'w')
for log_file in log_files:
    tar = tarfile.open(log_file)
    log_name = tar.getnames()[0]
    tar.extract(log_name)
    # append the extracted file to access_log
    data = open(log_name).read()
    access_log.write(data)
    os.remove(log_name)
access_log.close()

print '[#] all files have been written to access_log'