Download and merge SAE logs using a Python script
For some reason, log files on SAE can only be downloaded from the site one day at a time, which makes the process quite painful when many days are involved. Fortunately, SAE provides an API for fetching log files in batches, so a Python script can download and merge them automatically.
Obtain the logs through the API
The API documentation is at http://sae.sina.com.cn/?m=devcenter&catId=281.
Set your own app and download parameters
The variables to be set in the request are as follows:
The code is as follows:

api_url = 'http://dloadcenter.sae.sina.com.cn/interapi.php?'
appname = 'xxxxx'
from_date = '20140901'
to_date = '201312'
url_type = 'http'      # http | taskqueue | cron | mail | rdc
url_type2 = 'access'   # only when type = http: access | debug | error | warning | notice | resources
secret_key = 'xxxxx'
Generate the request address
The request address is generated according to the requirements on the official site:
1. Sort the parameters
2. Concatenate them into a request string, then remove the '&' separators
3. Append the access_key (secret key)
4. Take the MD5 of that string to form the sign
5. Append the sign to the request string
The specific implementation code is as follows:
The code is as follows:

params = dict()
params['ac'] = 'log'
params['appname'] = appname
params['from'] = from_date
params['to'] = to_date
params['type'] = url_type
if url_type == 'http':
    params['type2'] = url_type2

params = collections.OrderedDict(sorted(params.items()))

request = ''
for k, v in params.iteritems():
    request += k + '=' + v + '&'

sign = request.replace('&', '')
sign += secret_key
md5 = hashlib.md5()
md5.update(sign)
sign = md5.hexdigest()

request = api_url + request + 'sign=' + sign

response = urllib2.urlopen(request).read()
response = json.loads(response)
if response['errno'] != 0:
    print '[!] ' + response['errmsg']
    exit()
print '[#] request success'
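The script above targets Python 2 (urllib2, iteritems, print statements). As a minimal Python 3 sketch of the same five signing steps, with placeholder values instead of real credentials:

```python
import collections
import hashlib

# Placeholder values for illustration only -- not real credentials
params = {'ac': 'log', 'appname': 'demo', 'from': '20140901',
          'to': '20140902', 'type': 'http', 'type2': 'access'}
secret_key = 'demo_secret'

# Steps 1-2: sort the parameters and concatenate them into a request string
ordered = collections.OrderedDict(sorted(params.items()))
request = ''.join('%s=%s&' % (k, v) for k, v in ordered.items())

# Steps 3-4: drop the '&' separators, append the key, then md5 to form the sign
sign = hashlib.md5((request.replace('&', '') + secret_key).encode('utf-8')).hexdigest()

# Step 5: append the sign to the request string
signed_request = request + 'sign=' + sign
print(signed_request)
```

Note that in Python 3 the string must be encoded to bytes before hashing, which the Python 2 version does implicitly.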
Download log files
SAE packs each day's logs into a tar.gz archive. The script saves each archive as soon as it is downloaded, naming the file "date.tar.gz".
The code is as follows:

log_files = list()
for down_url in response['data']:
    file_name = re.compile(r'\d{4}-\d{2}-\d{2}').findall(down_url)[0] + '.tar.gz'
    log_files.append(file_name)
    data = urllib2.urlopen(down_url).read()
    with open(file_name, "wb") as file:
        file.write(data)
print '[#] you got %d log files' % len(log_files)
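The file naming relies on each download URL containing a YYYY-MM-DD segment. Isolated as a standalone helper (the example URL below is made up for illustration):

```python
import re

def archive_name(down_url):
    # Pull the first YYYY-MM-DD date out of the download URL
    date = re.search(r'\d{4}-\d{2}-\d{2}', down_url).group(0)
    return date + '.tar.gz'

# Hypothetical URL shape -- the real SAE URLs just need to contain the date
print(archive_name('http://dload.example.com/app/2014-09-01/log.tar.gz'))
```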
Merge files
The tarfile library is used to decompress each archive, and the extracted content is appended to access_log.
The code is as follows:

# merge these files into access_log
access_log = open('access_log', 'w')
for log_file in log_files:
    tar = tarfile.open(log_file)
    log_name = tar.getnames()[0]
    tar.extract(log_name)
    # append the extracted content to access_log
    data = open(log_name).read()
    access_log.write(data)
    os.remove(log_name)
access_log.close()
print '[#] all files have been written to access_log'
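The same merge can be written in Python 3 with context managers, reading each archive's first member directly via extractfile so no temporary file has to be removed afterwards. This is a sketch; like the snippet above, it assumes each archive holds a single log file:

```python
import tarfile

def merge_logs(log_files, out_path='access_log'):
    # Append the first member of every tar.gz archive to one combined file
    with open(out_path, 'wb') as access_log:
        for log_file in log_files:
            with tarfile.open(log_file) as tar:
                member = tar.getnames()[0]
                # extractfile reads the member without unpacking it to disk
                access_log.write(tar.extractfile(member).read())
```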
Complete code
The code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author: Su Yan
# @Date: 12:05:19
# @Last Modified by: Su Yan
# @Last Modified time: 14:15:41

import os
import collections
import hashlib
import urllib2
import json
import re
import tarfile

# settings
# documentation: http://sae.sina.com.cn/?m=devcenter&catId=281
api_url = 'http://dloadcenter.sae.sina.com.cn/interapi.php?'
appname = 'yansublog'
from_date = '20140901'
to_date = '201312'
url_type = 'http'      # http | taskqueue | cron | mail | rdc
url_type2 = 'access'   # only when type = http: access | debug | error | warning | notice | resources
secret_key = 'zwzim4zhk35i50003kz2lh3hyilz01m03515j0i5'

# encode request
params = dict()
params['ac'] = 'log'
params['appname'] = appname
params['from'] = from_date
params['to'] = to_date
params['type'] = url_type
if url_type == 'http':
    params['type2'] = url_type2

params = collections.OrderedDict(sorted(params.items()))

request = ''
for k, v in params.iteritems():
    request += k + '=' + v + '&'

sign = request.replace('&', '')
sign += secret_key
md5 = hashlib.md5()
md5.update(sign)
sign = md5.hexdigest()

request = api_url + request + 'sign=' + sign

# request api
response = urllib2.urlopen(request).read()
response = json.loads(response)
if response['errno'] != 0:
    print '[!] ' + response['errmsg']
    exit()
print '[#] request success'

# download and save files
log_files = list()
for down_url in response['data']:
    file_name = re.compile(r'\d{4}-\d{2}-\d{2}').findall(down_url)[0] + '.tar.gz'
    log_files.append(file_name)
    data = urllib2.urlopen(down_url).read()
    with open(file_name, "wb") as file:
        file.write(data)
print '[#] you got %d log files' % len(log_files)

# merge these files into access_log
access_log = open('access_log', 'w')
for log_file in log_files:
    tar = tarfile.open(log_file)
    log_name = tar.getnames()[0]
    tar.extract(log_name)
    # append the extracted content to access_log
    data = open(log_name).read()
    access_log.write(data)
    os.remove(log_name)
access_log.close()
print '[#] all files have been written to access_log'