Parsing Nginx access logs with Python and saving them to a MySQL database: an example

Source: Internet
Author: User

This example uses Python to parse Nginx access logs, split each line into fields according to the Nginx log format, and store the fields in a MySQL database.
First, the Nginx access log format is as follows:


$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"  # the default Nginx log format is used

Second, a sample entry from the Nginx access log looks like this:

182.19.31.129 - - [2013-08-13T00:00:01-07:00] "GET /css/anniversary.css HTTP/1.1" 304 0 "http://www.chlinux.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"
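
To make the mapping between the log format and a real entry concrete, here is a minimal sketch that parses the sample entry above with one regular expression built from named groups. The pattern and the names `LINE_RE`, `sample`, and `fields` are illustrative assumptions and are not taken from the article's script. Note that the sample timestamp is in ISO 8601 form, so the time group simply accepts anything between the square brackets.

```python
import re

# Hypothetical pattern for the log format shown earlier: one named group per field.
LINE_RE = re.compile(
    r'(?P<ip>[\d.]+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

sample = (
    '182.19.31.129 - - [2013-08-13T00:00:01-07:00] '
    '"GET /css/anniversary.css HTTP/1.1" 304 0 "http://www.chlinux.net/" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"'
)

# match() anchors at the start of the line; the trailing "-" field is ignored.
match = LINE_RE.match(sample)
fields = match.groupdict() if match else {}
for name in ("ip", "time", "request", "status", "bytes", "referer"):
    print("%-8s %s" % (name, fields.get(name)))
```

Unlike the patterns in the script, these groups sit inside the double quotes, so the captured request, referer, and user agent do not carry the surrounding quote characters.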

The Python code that parses the Nginx log is as follows:
#!/usr/bin/env python
# coding: utf8
import re
import MySQLdb

# Location of the log file
logfile = open("access_20130812.log")

# Log format in use (the Nginx default):
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"

# Regular expressions for the individual fields
# e.g. 203.208.60.230
ipP = r"?P<ip>[\d.]*"
# Starts with [, followed by any characters other than [ and ], and ends with ].
# Excluding the brackets keeps the match from running into the next [] item
# (a non-greedy *? would also work); a plain .* is greedy and repeats as many
# times as possible.
# e.g. [21/Jan/2011:15:04:41 +0800]
timeP = r"?P<time>\[[^\[\]]*\]"
# Starts with ", followed by any characters other than ", and ends with "
# (a non-greedy *? would also work).
# e.g. "GET /entpshop.do?method=view&shop_id=391796 HTTP/1.1"
requestP = r'?P<request>"[^"]*"'
statusP = r"?P<status>\d+"
bodyBytesSentP = r"?P<bodybytesent>\d+"
# Same quoting rule as the request.
# e.g. "http://test.myweb.com/myAction.do?method=view&mod_id=&id=1346"
referP = r'?P<refer>"[^"]*"'
# Same quoting rule as the request.
# e.g. "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
userAgentP = r'?P<useragent>"[^"]*"'
# Starts with (, followed by any characters other than parentheses, ends with ).
# e.g. (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
userSystems = re.compile(r'\([^\(\)]*\)')
# Any characters other than ), ending with "
userlius = re.compile(r'[^\)]*\"')
# Principle: the fields are separated by spaces and "-", and each field gets
# its own matching expression. Under re.VERBOSE a literal space is written "\ ".
nginxLogPattern = re.compile(
    r"(%s)\ -\ -\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)"
    % (ipP, timeP, requestP, statusP, bodyBytesSentP, referP, userAgentP),
    re.VERBOSE)

# Database connection settings
conn = MySQLdb.connect(host='192.168.1.22', user='test', passwd='pass', port=3306, db='python')
cur = conn.cursor()
sql = "INSERT INTO python.test VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)"

while True:
    line = logfile.readline()
    if not line:
        break
    matchs = nginxLogPattern.match(line)
    if matchs is not None:
        allGroup = matchs.groups()
        ip = allGroup[0]
        time = allGroup[1]
        request = allGroup[2]
        status = allGroup[3]
        bodyBytesSent = allGroup[4]
        refer = allGroup[5]
        userAgent = allGroup[6]
        # "[2013-08-13T00:00:01-07:00]" becomes "2013-08-13 00:00:01"
        time = time.replace('T', ' ')[1:-7]
        if len(userAgent) > 20:
            userinfo = userAgent.split(' ')
            userkel = userinfo[0]
            try:
                usersystem = userSystems.findall(userAgent)
                usersystem = usersystem[0]
                print(usersystem)
                userliu = userlius.findall(userAgent)
                value = [ip, time, request, status, bodyBytesSent, refer, userkel, usersystem, userliu[1]]
                conn.commit()  # commits the rows inserted so far
                print(value)
            except IndexError:
                # The user agent has no parenthesized part: store it whole
                userinfo = userAgent
                value = [ip, time, request, status, bodyBytesSent, refer, userinfo, "", ""]
        else:
            useraa = userAgent
            value = [ip, time, request, status, bodyBytesSent, refer, useraa, "", ""]
        try:
            result = cur.execute(sql, value)
            # conn.commit()
            print(result)
        except MySQLdb.Error as e:
            print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
conn.commit()
conn.close()
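
The two string manipulations inside the loop are easy to misread, so the sketch below isolates them. The function names `normalize_time` and `split_user_agent` are made up for illustration. The slicing assumes a bracketed ISO 8601 timestamp with a six-character UTC offset, as in the sample entry, and the user agent is assumed to still carry its surrounding double quotes, as it does when captured by the script's pattern.

```python
import re

PAREN_RE = re.compile(r'\([^()]*\)')  # a "(...)" item, e.g. platform details
TAIL_RE = re.compile(r'[^)]*"')       # text without ")" that ends at a quote


def normalize_time(raw):
    """Turn '[2013-08-13T00:00:01-07:00]' into '2013-08-13 00:00:01'."""
    # [1:-7] drops the leading "[" and the trailing "-07:00]".
    return raw.replace('T', ' ')[1:-7]


def split_user_agent(agent):
    """Return (product, platform, browser), or (agent, '', '') as a fallback."""
    if len(agent) <= 20:
        return agent, "", ""
    try:
        product = agent.split(' ')[0]
        platform = PAREN_RE.findall(agent)[0]
        # findall()[0] is the lone opening quote; [1] is the text after the
        # last ")" up to the closing quote.
        browser = TAIL_RE.findall(agent)[1]
    except IndexError:
        return agent, "", ""
    return product, platform, browser


agent = ('"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36"')
print(normalize_time('[2013-08-13T00:00:01-07:00]'))
print(split_user_agent(agent))
```

Because the quotes are part of the captured text, the first and last parts keep one quote character each. Stripping them with `agent.strip('"')` before splitting would be a reasonable refinement.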

After the data is stored in the database, it looks as shown in the following figure:
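
The layout of `python.test` is not given in the source, so the following is only a sketch of what a nine-column table and the parameterized insert could look like. It uses the standard library's `sqlite3` module with an in-memory database as a stand-in, so it runs without a MySQL server. The column names are hypothetical, the row holds the values the script would derive from the sample entry, and `sqlite3` uses `?` placeholders where MySQLdb uses `%s`.

```python
import sqlite3

# In-memory stand-in for the MySQL table; every column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (ip TEXT, log_time TEXT, request TEXT, status TEXT, "
    "body_bytes_sent TEXT, refer TEXT, product TEXT, platform TEXT, browser TEXT)"
)

# Values derived from the sample log entry; the quotes are kept because the
# script's patterns capture them.
row = [
    "182.19.31.129", "2013-08-13 00:00:01",
    '"GET /css/anniversary.css HTTP/1.1"', "304", "0",
    '"http://www.chlinux.net/"', '"Mozilla/5.0',
    "(Windows NT 6.1; WOW64)", ' Chrome/28.0.1500.95 Safari/537.36"',
]

# Placeholders let the driver quote the values instead of string formatting.
conn.execute("INSERT INTO test VALUES (?,?,?,?,?,?,?,?,?)", row)
conn.commit()

rows = conn.execute("SELECT ip, log_time, status, platform FROM test").fetchall()
for stored in rows:
    print(stored)
conn.close()
```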
