This article uses Python to parse Nginx access logs: each line is split into fields according to the Nginx log format, and the fields are stored in a MySQL database.
First, the Nginx access log format is as follows:
$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"   # the default Nginx log format is used
Second, a sample line from the Nginx access log looks like this:
182.19.31.129 - - [2013-08-13T00:00:01-07:00] "GET /css/anniversary.css HTTP/1.1" 304 0 "http://www.chlinux.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"
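To make the splitting concrete, here is a minimal sketch that breaks one line of this format into named fields with a single regular expression. The pattern, the group names, and the `parse_line` helper are illustrative assumptions, not the script used later in this article.

```python
import re

# Hypothetical pattern for the log format shown above: one named group per field.
LOG_PATTERN = re.compile(
    r'(?P<ip>[\d.]+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<forwarded>[^"]*)"'
)

# A line in the format shown above.
SAMPLE = (
    '182.19.31.129 - - [2013-08-13T00:00:01-07:00] '
    '"GET /css/anniversary.css HTTP/1.1" 304 0 "http://www.chlinux.net/" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the format."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_line(SAMPLE)
print(fields["ip"], fields["status"], fields["request"])
```

Each quoted field is matched with `[^"]*`, so it stops at its own closing quote and cannot swallow the next field.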
The Python code for analyzing the Nginx log is as follows:
#!/usr/bin/env python
# coding: utf8
import os
import fileinput
import re
import sys
import MySQLdb

# Location of the log file
logfile = open("access_20130812.log")

# Default Nginx log format in use:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"

# Regular expressions for the individual fields
# 203.208.60.230
ipP = r"?P<ip>[\d.]*"
# Starts with [, then any characters other than [ and ] so the match cannot
# run into the next [] item (a non-greedy *? would also work), and ends with ].
# Outside a character class, . matches any character except a newline, and *
# is greedy: the engine repeats it as many times as possible.
# [21/Jan/2011:15:04:41 +0800]
timeP = r"?P<time>\[[^\[\]]*\]"
# Starts with ", then any characters other than a double quote so the match
# cannot run into the next "" item, and ends with ".
# "GET /entpshop.do?method=view&shop_id=391796 HTTP/1.1"
requestP = r'?P<request>"[^"]*"'
statusP = r"?P<status>\d+"
bodyBytesSentP = r"?P<bodyByteSent>\d+"
# Same quoted-field rule as the request.
# "http://test.myweb.com/myAction.do?method=view&mod_id=&id=1346"
referP = r'?P<refer>"[^"]*"'
# Same quoted-field rule as the request.
# "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
userAgentP = r'?P<userAgent>"[^"]*"'
# Starts with (, then any characters other than parentheses, and ends with ).
# (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
userSystems = re.compile(r'\([^\(\)]*\)')
# Any run of characters without ) that ends with a double quote.
userlius = re.compile(r'[^\)]*\"')
# Approach: the fields are separated by spaces and "-"; each field is matched
# by its own sub-pattern. In VERBOSE mode a literal space must be written "\ ".
nginxLogPattern = re.compile(
    r"(%s)\ -\ -\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)"
    % (ipP, timeP, requestP, statusP, bodyBytesSentP, referP, userAgentP),
    re.VERBOSE)

# Database connection settings
conn = MySQLdb.connect(host='192.168.1.22', user='test', passwd='pass', port=3306, db='python')
cur = conn.cursor()
sql = "INSERT INTO python.test VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)"

for line in logfile:
    matchs = nginxLogPattern.match(line)
    if matchs is None:
        # Skip lines that do not fit the log format
        continue
    allGroup = matchs.groups()
    ip = allGroup[0]
    time = allGroup[1]
    request = allGroup[2]
    status = allGroup[3]
    bodyBytesSent = allGroup[4]
    refer = allGroup[5]
    userAgent = allGroup[6]
    # '[2013-08-13T00:00:01-07:00]' -> '2013-08-13 00:00:01'
    time = time.replace('T', ' ')[1:-7]
    if len(userAgent) > 20:
        # Split the user agent into engine, system and browser parts
        userinfo = userAgent.split(' ')
        userkel = userinfo[0]
        try:
            usersystem = userSystems.findall(userAgent)[0]
            print(usersystem)
            userliu = userlius.findall(userAgent)
            value = [ip, time, request, status, bodyBytesSent, refer, userkel, usersystem, userliu[1]]
            print(value)
        except IndexError:
            # The user agent could not be split: store it whole
            value = [ip, time, request, status, bodyBytesSent, refer, userAgent, "", ""]
    else:
        value = [ip, time, request, status, bodyBytesSent, refer, userAgent, "", ""]
    try:
        result = cur.execute(sql, value)
        print(result)
    except MySQLdb.Error as e:
        print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
conn.commit()
conn.close()
logfile.close()
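The two post-processing steps in the script, trimming the timestamp and splitting the user agent into three parts, can be tried on their own. The sketch below reuses the script's two user-agent patterns; the helper names `normalize_time` and `split_user_agent` are hypothetical and only wrap the same logic for illustration.

```python
import re

# The same two patterns the script applies to the quoted user-agent field.
PAREN_GROUP = re.compile(r'\([^()]*\)')  # e.g. "(Windows NT 6.1; WOW64)"
TAIL = re.compile(r'[^)]*"')             # a run without ")" that ends at a double quote


def normalize_time(raw):
    """Drop the leading '[' and the trailing UTC offset plus ']'."""
    return raw.replace('T', ' ')[1:-7]


def split_user_agent(agent):
    """Split a quoted user-agent string into (engine, system, browser).

    Falls back to (agent, "", "") when the string is short or has no
    parenthesised part, as the script does.
    """
    if len(agent) <= 20:
        return agent, "", ""
    engine = agent.split(' ')[0]
    systems = PAREN_GROUP.findall(agent)
    tails = TAIL.findall(agent)
    if not systems or len(tails) < 2:
        return agent, "", ""
    # tails[0] is the opening quote alone; tails[1] is the text after the last ")".
    return engine, systems[0], tails[1]


agent = ('"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36"')
print(normalize_time('[2013-08-13T00:00:01-07:00]'))
print(split_user_agent(agent))
```

Note that the surrounding double quotes are part of the matched field, so the engine part keeps its leading quote and the browser part keeps its trailing one.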
After the data has been stored in the database, the table looks as shown in the following figure:
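As a rough illustration of the stored rows, the sketch below inserts one parsed record into a nine-column table and reads it back. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the column names are hypothetical, since the schema of `python.test` is not given in this article. Note that `sqlite3` uses `?` placeholders where MySQLdb uses `%s`.

```python
import sqlite3

# Hypothetical column names: the schema of python.test is not shown in the article.
COLUMNS = ("ip", "log_time", "request", "status", "body_bytes_sent",
           "referer", "ua_engine", "ua_system", "ua_browser")

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL server
conn.execute("CREATE TABLE test (%s)" % ", ".join("%s TEXT" % c for c in COLUMNS))

# One record, with the nine values in the order the script builds them.
row = (
    "182.19.31.129",
    "2013-08-13 00:00:01",
    '"GET /css/anniversary.css HTTP/1.1"',
    "304",
    "0",
    '"http://www.chlinux.net/"',
    '"Mozilla/5.0',
    "(Windows NT 6.1; WOW64)",
    ' Chrome/28.0.1500.95 Safari/537.36"',
)
conn.execute("INSERT INTO test VALUES (?,?,?,?,?,?,?,?,?)", row)
conn.commit()

stored = conn.execute(
    "SELECT ip, log_time, status, ua_system FROM test").fetchall()
print(stored)
conn.close()
```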