Analyzing Nginx access logs with Python and saving them to a MySQL database

Source: Internet
Author: User
Tags: expression engine

Use Python to analyze Nginx access logs: split each log line according to the Nginx log format and store the resulting fields in a MySQL database.
I. Nginx access log format:
$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"  # Uses the nginx default log format
II. Nginx access log content:
182.19.31.129 - - [2013-08-13T00:00:01-07:00] "GET /css/anniversary.css HTTP/1.1" 304 0 "http://www.chlinux.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"
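To see how the format in section I maps onto the sample line in section II, the rough sketch below (Python 3) builds a regular expression directly from the format string and applies it to that line. This is an alternative illustration, not the approach used by the script in the next section; the helper name format_to_regex is hypothetical, and the non-greedy .*? groups assume that the literal separators (spaces, quotes, brackets) do not appear unescaped inside the fields.

```python
import re

# Log format from section I (nginx variables are prefixed with $)
LOG_FORMAT = ('$remote_addr - $remote_user [$time_local] "$request" '
              '$status $body_bytes_sent "$http_referer" '
              '"$http_user_agent" "$http_x_forwarded_for"')

# Sample line from section II
SAMPLE = ('182.19.31.129 - - [2013-08-13T00:00:01-07:00] '
          '"GET /css/anniversary.css HTTP/1.1" 304 0 '
          '"http://www.chlinux.net/" '
          '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36" "-"')


def format_to_regex(log_format):
    """Hypothetical helper: turn each $variable into a named group
    and escape all literal text between the variables."""
    parts = re.split(r'\$(\w+)', log_format)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2:
            # Odd positions hold the captured variable names
            pieces.append('(?P<%s>.*?)' % part)
        else:
            # Even positions hold literal text (spaces, quotes, brackets)
            pieces.append(re.escape(part))
    return re.compile(''.join(pieces) + '$')


fields = format_to_regex(LOG_FORMAT).match(SAMPLE).groupdict()
print(fields['remote_addr'], fields['status'], fields['request'])
```

Each nginx variable name becomes a key in fields, so the result can be inspected by name rather than by position.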
III. The Python code for analyzing Nginx logs is as follows:
#!/usr/bin/env python
# coding: utf8
import re
import MySQLdb
# Log file location
logfile = open("access_20130812.log")
# Default nginx log format used here:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"
# Regular expressions for log analysis
# IP address, e.g. 203.208.60.230
ipP = r"?P<ip>[\d.]*"
# Timestamp: starts with [, then any characters other than [ and ] so the match
# cannot run into the next bracketed item (a non-greedy *? would also work),
# and ends with ]. Outside a character class, . matches any character except a
# newline, and * is greedy: the engine repeats it as many times as possible.
# e.g. [21/Jan/2011:15:04:41 +0800]
timeP = r'?P<time>\[[^\[\]]*\]'
# Request: starts with ", then any characters other than " so the match cannot
# run into the next quoted item (a non-greedy *? would also work), ends with ".
# e.g. "GET /EntpShop.do?method=view&shop_id=391796 HTTP/1.1"
requestP = r'?P<request>"[^"]*"'
statusP = r"?P<status>\d+"
bodyBytesSentP = r"?P<bodyByteSent>\d+"
# Referer: same quoted-string rule as the request.
# e.g. "http://test.myweb.com/myAction.do?method=view&mod_id=&id=1346"
referP = r'?P<refer>"[^"]*"'
# User agent: same quoted-string rule as the request.
# e.g. "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
userAgentP = r'?P<userAgent>"[^"]*"'
# System part of the user agent: starts with (, then any characters other than
# ( and ), and ends with ).
# e.g. (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
userSystems = re.compile(r'\([^\(\)]*\)')
# Browser part of the user agent: any characters other than ), up to a closing ".
userlius = re.compile(r'[^\)]*\"')
# Principle: the items are separated by spaces and hyphens, and each item has
# its own matching expression. With re.VERBOSE, literal spaces must be escaped.
nginxLogPattern = re.compile(r"(%s)\ -\ -\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)\ (%s)" % (ipP, timeP, requestP, statusP, bodyBytesSentP, referP, userAgentP), re.VERBOSE)
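# Database connection information
# port=3306 is an assumption (the MySQL default); adjust it to your server
conn = MySQLdb.connect(host='192.168.1.22', user='test', passwd='pass', port=3306, db='python')
cur = conn.cursor()
# One placeholder per stored column: nine values are inserted per log line
SQL = "INSERT INTO python.test VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"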
for line in logfile:
    matchs = nginxLogPattern.match(line)
    if matchs is None:
        # Skip lines that do not follow the expected format
        continue
    ip, timestamp, request, status, bodyBytesSent, refer, userAgent = matchs.groups()
    # "[2013-08-13T00:00:01-07:00]" -> "2013-08-13 00:00:01"
    timestamp = timestamp.replace('T', ' ')[1:-7]
    if len(userAgent) > 20:
        # First space-separated token, e.g. "Mozilla/5.0
        userkel = userAgent.split(' ')[0]
        try:
            # First parenthesized group, e.g. (Windows NT 6.1; WOW64)
            usersystem = userSystems.findall(userAgent)[0]
            print(usersystem)
            # Second match is the text after the last ), i.e. the browser part
            userliu = userlius.findall(userAgent)
            value = [ip, timestamp, request, status, bodyBytesSent, refer, userkel, usersystem, userliu[1]]
            print(value)
        except IndexError:
            # The user agent has no parenthesized or trailing part: store it whole
            value = [ip, timestamp, request, status, bodyBytesSent, refer, userAgent, "", ""]
    else:
        # Short user agents (such as "-") are stored as they are
        value = [ip, timestamp, request, status, bodyBytesSent, refer, userAgent, "", ""]
    try:
        result = cur.execute(SQL, value)
        print(result)
    except MySQLdb.Error as e:
        print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
# Commit once after all lines have been processed
conn.commit()
conn.close()
logfile.close()
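
The loop above derives two things from each match: a MySQL-friendly timestamp, and the kernel, system, and browser parts of the user agent. A minimal sketch of those two steps as standalone helpers follows. The function names normalize_time and split_user_agent are hypothetical, and unlike the script the sketch strips the surrounding quotes and does not apply the 20-character length check. It also assumes the ISO 8601 timestamp shown in section II.

```python
import re

# First parenthesized group in a user agent, e.g. (Windows NT 6.1; WOW64)
PAREN_RE = re.compile(r'\([^()]*\)')


def normalize_time(raw):
    """Drop the leading '[' and the trailing UTC offset plus ']' (7 characters),
    and replace the ISO 8601 'T' separator with a space."""
    return raw.replace('T', ' ')[1:-7]


def split_user_agent(agent):
    """Return (kernel, system, browser); missing parts become empty strings."""
    agent = agent.strip('"')
    kernel = agent.split(' ')[0]
    groups = PAREN_RE.findall(agent)
    system = groups[0] if groups else ''
    # Everything after the last ')' is treated as the browser part
    browser = agent.rsplit(')', 1)[1].strip() if ')' in agent else ''
    return kernel, system, browser


agent = ('"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36"')
print(normalize_time('[2013-08-13T00:00:01-07:00]'))
print(split_user_agent(agent))
```

Keeping these steps in small functions makes them easy to check against a single log line before running the full import.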

IV. The data stored in the database is as follows:
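
The script assumes that a table python.test with nine columns already exists; the source does not show its definition. As a rough, self-contained stand-in, the sketch below uses Python's built-in sqlite3 module (not MySQL) with hypothetical column names to show how one row, derived from the sample line in section II with quotes stripped for readability, could be stored and read back. Note that sqlite3 uses ? placeholders where MySQLdb uses %s.

```python
import sqlite3

# Hypothetical column names: the source does not show the table definition
COLUMNS = ("ip", "time", "request", "status", "body_bytes_sent",
           "refer", "kernel", "system", "browser")


def store_rows(conn, rows):
    """Create the stand-in table if needed and insert all rows in one batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS test (%s)"
                 % ", ".join("%s TEXT" % c for c in COLUMNS))
    placeholders = ", ".join("?" * len(COLUMNS))
    conn.executemany("INSERT INTO test VALUES (%s)" % placeholders, rows)
    conn.commit()


# One row derived from the sample log line in section II
row = ("182.19.31.129", "2013-08-13 00:00:01",
       "GET /css/anniversary.css HTTP/1.1", "304", "0",
       "http://www.chlinux.net/", "Mozilla/5.0",
       "(Windows NT 6.1; WOW64)", "Chrome/28.0.1500.95 Safari/537.36")

conn = sqlite3.connect(":memory:")
store_rows(conn, [row])
stored = conn.execute("SELECT ip, time, status FROM test").fetchall()
print(stored)
```

Passing the values as parameters, as the script does with cur.execute(SQL, value), lets the database driver handle quoting instead of building the SQL string by hand.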
