This article uses Python to parse Nginx access logs: each line is split according to the Nginx log format, and the resulting fields are stored in a MySQL database.
I. Nginx access log format:
The code is as follows:
$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"  # the default nginx log format
II. Nginx access log content:
The code is as follows:
182.19.31.129 - - [2013-08-13T00:00:01-07:00] "GET /css/anniversary.css HTTP/1.1" 304 0 "http://www.chlinux.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"
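To make the mapping from log format to fields concrete, here is a minimal sketch that parses the sample line above with a single regular expression using named groups. The names `LOG_PATTERN` and `parse_line` are hypothetical, and the timestamp handling assumes the ISO 8601 style shown in the sample line rather than the `$time_local` style.

```python
import re

# Sample line from the access log above.
LINE = ('182.19.31.129 - - [2013-08-13T00:00:01-07:00] '
        '"GET /css/anniversary.css HTTP/1.1" 304 0 '
        '"http://www.chlinux.net/" '
        '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"')

# One named group per log field (LOG_PATTERN is a hypothetical name).
LOG_PATTERN = re.compile(
    r'(?P<ip>[\d.]+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d+) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # "2013-08-13T00:00:01-07:00" -> "2013-08-13 00:00:01" (drops the UTC offset)
    fields['time'] = fields['time'].replace('T', ' ')[:19]
    return fields


fields = parse_line(LINE)
print(fields['ip'], fields['time'], fields['status'])
```

Unlike the full script in this article, this sketch leaves the surrounding quotes and brackets outside the groups, so the captured values come out already trimmed.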
III. The Python code for analyzing Nginx logs is as follows:
The code is as follows:
#!/usr/bin/env python
# coding: utf8
import os
import fileinput
import re
import sys
import MySQLdb
# Location of the log file
logfile = open("access_20130812.log")
# The default nginx log format: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"
# Regular expressions for log analysis
# 203.208.60.230
ipP = r'?P<ip>[\d.]*'
# Starts with [, then any characters other than [ or ] (so the match cannot run into the next bracketed field; a non-greedy *? would also work), and ends with ]. Outside a character class, . matches any character except a newline, and * is greedy: the engine repeats as many times as it can.
# [21/Jan/2011:15:04:41 +0800]
timeP = r'?P<time>\[[^\[\]]*\]'
# Starts with ", then any characters other than a double quote (so the match cannot run into the next quoted field; a non-greedy *? would also work), and ends with "
# "GET /EntpShop.do?method=view&shop_id=391796 HTTP/1.1"
requestP = r'?P<request>"[^"]*"'
statusP = r'?P<status>\d+'
bodyBytesSentP = r'?P<bodyByteSent>\d+'
# Starts with ", then any characters other than a double quote, and ends with "
# "http://test.myweb.com/myAction.do?method=view&mod_id=&id=1346"
referP = r'?P<refer>"[^"]*"'
# Starts with ", then any characters other than a double quote, and ends with "
# "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
userAgentP = r'?P<userAgent>"[^"]*"'
# Starts with (, then any characters other than parentheses (so the match cannot run into the next parenthesized item; a non-greedy *? would also work), and ends with )
# (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
userSystems = re.compile(r'\([^\(\)]*\)')
# Any run of characters other than ) that ends with "; used below to pick up the text between the last ) and the closing quote
userlius = re.compile(r'[^\)]*\"')
# Principle: fields are separated by spaces and hyphens, and each field has its own sub-pattern.
# re.VERBOSE ignores unescaped whitespace in the pattern, so literal spaces are written as "\ ".
nginxLogPattern = re.compile(r"(%s)\ -\ -\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)" % (ipP, timeP, requestP, statusP, bodyBytesSentP, referP, userAgentP), re.VERBOSE)
# Database connection information (3306 is the MySQL default port)
conn = MySQLdb.connect(host='192.168.1.22', user='test', passwd='pass', port=3306, db='python')
cur = conn.cursor()
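# One %s placeholder per value; each row passed to execute() has nine values
sql = "INSERT INTO python.test VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"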
# Read the log line by line, parse each line, and insert it into MySQL
for line in logfile:
    matchs = nginxLogPattern.match(line)
    if matchs is not None:
        allGroup = matchs.groups()
        ip = allGroup[0]
        time = allGroup[1]
        request = allGroup[2]
        status = allGroup[3]
        bodyBytesSent = allGroup[4]
        refer = allGroup[5]
        userAgent = allGroup[6]
        # "[2013-08-13T00:00:01-07:00]" -> "2013-08-13 00:00:01"
        Time = time.replace('T', ' ')[1:-7]
        if len(userAgent) > 20:
            userinfo = userAgent.split(' ')
            userkel = userinfo[0]  # first token of the user agent
            try:
                usersystem = userSystems.findall(userAgent)
                usersystem = usersystem[0]
                print(usersystem)
                userliu = userlius.findall(userAgent)
                value = [ip, Time, request, status, bodyBytesSent, refer, userkel, usersystem, userliu[1]]
                conn.commit()
                print(value)
            except IndexError:
                # The user agent could not be split: store it whole
                userinfo = userAgent
                value = [ip, Time, request, status, bodyBytesSent, refer, userinfo, "", ""]
        else:
            useraa = userAgent
            value = [ip, Time, request, status, bodyBytesSent, refer, useraa, "", ""]
        try:
            result = cur.execute(sql, value)
            # conn.commit()
            print(result)
        except MySQLdb.Error as e:
            print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
conn.commit()
conn.close()
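The user-agent handling in the script is the least obvious part, so the following sketch isolates it and shows the nine values that end up in one row. It is an illustration under assumptions, not the article's setup: the standard-library `sqlite3` module stands in for MySQL so the example runs without a server, the function name `split_user_agent` is hypothetical, and so are the table's column names, since the article does not give the definition of `python.test`.

```python
import re
import sqlite3


def split_user_agent(user_agent):
    """Return (product, system, browser), using the same two regexes as the script."""
    product = user_agent.split(' ')[0]
    systems = re.findall(r'\([^()]*\)', user_agent)  # parenthesized groups
    tails = re.findall(r'[^)]*"', user_agent)        # runs that end in a quote
    if not systems or len(tails) < 2:
        return user_agent, '', ''                    # same fallback as the script
    return product, systems[0], tails[1]


# Hypothetical nine-column table; the real column names are not given in the article.
conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE test (
    ip TEXT, log_time TEXT, request TEXT, status TEXT, bytes_sent TEXT,
    referer TEXT, product TEXT, os_info TEXT, browser TEXT)""")

# The quoted user agent exactly as the script's userAgent group captures it.
ua = ('"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36"')
row = ['182.19.31.129', '2013-08-13 00:00:01',
       '"GET /css/anniversary.css HTTP/1.1"', '304', '0',
       '"http://www.chlinux.net/"'] + list(split_user_agent(ua))

# sqlite3 uses ? placeholders; MySQLdb uses %s for the same purpose.
conn.execute('INSERT INTO test VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)', row)
conn.commit()
stored = conn.execute('SELECT product, os_info, browser FROM test').fetchone()
print(stored)
```

Because the script's groups capture the surrounding double quotes, the stored product and browser values keep a leading or trailing quote. Stripping them with `str.strip('"')` before inserting would be a reasonable refinement.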
IV. The data stored in the database is as follows: