Research on Nginx Log Statistics under Linux

Source: Internet
Author: User
Tags: json, mongodb, python, script

After seeing the browser statistics that @Shenzhou, a guru in our group, put together, I wanted to practice the same thing myself, so I spent the whole day on it (with breaks to eat, of course) ~

Approach

First, set up a scheduled task that runs a script at 23:55 every night; the script rotates the log and converts it into the required JSON data
Provide an access interface that returns the parsed JSON data, such as browser model and system model
Draw pie charts from the interface data

Scheduled task: parse the log

Write a shell script that executes at 23:55 every day, roughly as follows:

#!/bin/bash

# copy the current log to last.log
cp -f /home/access.log /home/last.log

# rotate into a log named after today's date
mv /home/access.log /home/access/$(date +%Y%m%d).log

# trigger the Node script that analyzes last.log
node /home/parsejson.js

# make Nginx reopen and write a new log file
kill -USR1 $(cat /var/run/nginx.pid)

The content of parsejson.js essentially uses an nginxparser package to parse each log line, then saves the aggregated result as a Y-m-d.json file.
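
The Node code itself is not shown here; purely for illustration, below is a rough Python sketch of the same idea. The combined log layout, the crude user-agent rules, and the output path are my own assumptions, not the actual parsejson.js:

#!/usr/bin/python
# Hypothetical sketch of what parsejson.js does, written in Python.
# Assumes the standard combined log format; detection rules are illustrative.
import json
import time

def parse_log(path):
    counts = {}
    for line in open(path):
        parts = line.split('"')
        if len(parts) < 6:
            continue
        ua = parts[5]  # the user agent is the third quoted field
        # crude browser detection, just to show the aggregation step
        if 'Chrome' in ua:
            browser = 'Chrome'
        elif 'Firefox' in ua:
            browser = 'Firefox'
        elif 'MSIE' in ua or 'Trident' in ua:
            browser = 'IE'
        else:
            browser = 'Other'
        counts[browser] = counts.get(browser, 0) + 1
    return counts

stats = parse_log('/home/last.log')
out = open('/home/' + time.strftime('%Y-%m-%d') + '.json', 'w')
json.dump({'browser': stats}, out)
out.close()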

Interface for querying the parsed logs

Set up an HTTP server that returns the JSON data generated above on each visit. You can aggregate it according to your own requirements, such as browser version, browser model, system version, and so on; of course you can also add caching ~
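
The server code is not given in the original (it is presumably Node as well); here is a minimal sketch of the idea in Python, where the port and the JSON file location are assumptions rather than details from the source:

#!/usr/bin/python
# Minimal sketch of the query interface: serve the day's parsed JSON.
# The port, URL handling, and file path are illustrative assumptions.
import time
import BaseHTTPServer

class StatsHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        path = '/home/' + time.strftime('%Y-%m-%d') + '.json'
        try:
            body = open(path).read()
        except IOError:
            self.send_error(404, 'no statistics for today yet')
            return
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(body)

BaseHTTPServer.HTTPServer(('', 8080), StatsHandler).serve_forever()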


{"Errcode": 0, "browser": {"Chrome": {"Count": 5047, "version": {"5": 2, "10": 2, "11": 22, "12": 14, "16": 2, "20": 1, "21": 366, "24": 2, "28": 3, "29": 21, "30": 3, "31": 198, "32": 2, "33": 12, "34": 40, "35": 74, "36": 49, "37": 18, "38": 101, "39": 89, " ": 139," "The": 151, "the": 113, "the": 181, "the": 2741, "the": 665, "5": "," ":" "," "": {"Count": 6717, "version": {"3": "6" ": 3574," 7 ":" 8 ": 717," 9 ": 2169," ten ":" One ":" The "," "or": {}}, "Baidu": {"Count": 1372, "version": {"Spider": 1372}}, " Firefox ": {" Count ": 1368," version ": {" 0 ": 4," 1 ": 2," 2 ": 3," 3 ": 60," 4 ": 2," 6 ": 839," 7 ": 22," 10 ": 1," 13 ": 4," 14 ": 14," 21 ": 6," 22 ": 4," 24 ": 3," 26 ": 190," 28 ": 9," 29 ": 12," 30 ": 6," 31 ": 16," 34 ": 12," 36 ": 8," 37 ": 18," 38 ": 7," 39 ": 1," 40 " :}}, "Android Browser": {"Count": 203, "version": {"3": "4": 190}}, "Opera": {"Count": Panax, "version": {"12": 22, "28": 10 , "": 5}}, "Mobile Safari": {"Count": 669, "version": {"4": 2, "5": $, "6": $, "7": 321, "8": 158, "9": "," "Baidu": {"Count ":" Version ": {" 5 ": 2," 6 ": 9," ":" Boxapp ": 3}}," WebKit ": {" Count ": +," version ": {" 533 ": 2," 534 ": 30," 601 ": 4}}," Ucbrowser ": {" CoUnt ":", "version": {"9": Panax Notoginseng, "Ten": "I}", "Safari": {"Count": "Version": {"5": "7": 2, "8": 4, "9": "{"}, "Qqbrowser": {" Count ":" Version ": {" 5 ":}}," MIUI Browser ": {" Count ": 4," version ": {" 2 ": 4}}," Mozilla ": {" Count ": 2," version ": {" 5 " : 2}}, "Iemobile": {"Count": 5, "version": {"9": 4, "one": 1}}, "Edge": {"Count": 7, "version": {"": 7}}, "Chromium": {"Count" : 9, "version": {"": 6, "": 3}}, "Maxthon": {"Count": "Version": {"4":}}, "Silk": {"Count": 3, "version": {"1": 3}}, " Iceweasel ': {' count ': 6, "version": {"": 6}}, "Fennec": {"Count": 2, "version": {"9": 2}}, "Os": {"Mac os": {"Count": 2753, "Version": {"7": 2753}}, "Windows": {"Count": 10642, "version": "{": 4718, "8": 380, "ten": 184, "": 2, "Vista": 514, "XP" : 4814, "NT": "[S":}}, "Arch": {"Count": 2871, "version": {"slurp": 1489, "spider": 1340, "": "Bot":}, "Android": { "Count": 364, "version": {"2": 8, "3": "4": 321, "5": 3, "6": 5}}, "Linux": {"Count": 143, "version": {"x86_64": "i686" :}, "IOS": {"Count": 691, "version": {"4": One, "5": 6 ":", "7": 321, "8": 158, "9":}, "Gentoo": {"Count": 10, " Version ": {" FirEfox ":" BlackBerry ": {" Count ": 3," version ": {" 4 ": 3}}," Windows Phone ": {" Count ": 1," version ": {" 8 ": 1}}," Ubuntu ": {" Count ": 9," version ": {": 9}} "," Windows Phone OS ": {" Count ": 4," version ": {" 7 ": 4}}}," Http_status ": {" 200 ": 6030," 301 " : 7385, "302": 137, "304": 1451, "403": 814, "404": 1672, "405": 2}, "robot": {"Googlebot": 2553, "Yahoo": 1489, "Bingbot": 5702 , "Baiduspider": 1372, "blogtrottr": 631, "feedly": 222, "Haosouspider": 833, "Mj12bot": 2261, "Adsbot-google-mobile": 44, "Adsbot-google": 1446017644780, "Sogou": 3250, "Yisouspider": "A", "Http_bot": {"bot": 18609, "All": 38010}



The complete Nginx log statistics scheme

This article mainly presents an Nginx log statistics workflow built with Python. Recently our system has been hammered with crazy requests from unknown programs; although we have an anti-crawling mechanism, we still need to find out which IPs access us most often. The idea is to analyze the Nginx log and rank these IPs. The concrete steps are:

Nginx daily log rotation;
Set the Nginx log format;
Write Python code that counts the number of accesses per IP in access.log before the daily rotation and writes the statistics into MongoDB;
Write a web page that queries MongoDB for the statistics.

Each step is described in detail below.

First, Nginx daily log rotation

This feature is implemented by writing a shell script and then scheduling it to run periodically with crontab.

The shell script is as follows:


#!/bin/bash
## executed at 0:00 every day
## directory where the Nginx log files live
logs_path=/usr/local/nginx/logs
## get yesterday's date as YYYY-MM-DD
yesterday=$(date -d "yesterday" +%Y-%m-%d)
## move the file
mv ${logs_path}/access.log ${logs_path}/access_${yesterday}.log
## send a USR1 signal to the Nginx master process; USR1 makes it reopen the log file
kill -USR1 $(cat /usr/local/nginx/nginx.pid)

Add it to crontab:


0 0 * * * /bin/bash /usr/local/nginx/sbin/cut-log.sh



Second, set the Nginx log format

Open the nginx.conf configuration file and add the following in the server section:


log_format access '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" $http_x_forwarded_for';
access_log /usr/local/nginx/logs/access.log access;
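
With this format, every request is written as one line that starts with the client IP. An illustrative, made-up line looks like:

1.2.3.4 - - [10/Oct/2015:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36" -

That leading IP field is what the Python script in step three matches with its regular expression.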



After saving the configuration, reload Nginx:


./nginx -s reload



Third, write Python code that counts per-IP accesses in access.log before the daily rotation and stores the statistics in MongoDB

Download PyMongo, upload it to the server, and install it:


# tar zxvf pymongo-1.11.tar.gz
# cd pymongo-1.11
# python setup.py install



A sample of connecting to MongoDB from Python:


$ cat conn_mongodb.py
#!/usr/bin/python

import pymongo
import random

conn = pymongo.Connection("127.0.0.1", 27017)
db = conn.tage
# connect to the tage database
db.authenticate("tage", "123")
# authenticate the user
db.user.drop()
# drop the user collection
db.user.save({'id': 1, 'name': 'kaka', 'sex': 'male'})
# insert a single document
for id in range(2, 10):
    name = random.choice(['steve', 'koby', 'owen', 'tody', 'rony'])
    sex = random.choice(['male', 'female'])
    db.user.insert({'id': id, 'name': name, 'sex': sex})
# insert a batch of documents in a loop
content = db.user.find()
# print all documents
for i in content:
    print i



Write the Python statistics script:


#encoding=utf8

import re

zuidaima_nginx_log_path = "/usr/local/nginx/logs/www.zuidaima.com.access.log"
pattern = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def stat_ip_views(log_path):
    # count the number of requests (views) per client IP
    ret = {}
    f = open(log_path, "r")
    for line in f:
        match = pattern.match(line)
        if match:
            ip = match.group(0)
            if ip in ret:
                views = ret[ip]
            else:
                views = 0
            views = views + 1
            ret[ip] = views
    return ret

def run():
    ip_views = stat_ip_views(zuidaima_nginx_log_path)
    # track the single IP with the most views
    max_ip_view = {}
    for ip in ip_views:
        views = ip_views[ip]
        if len(max_ip_view) == 0:
            max_ip_view[ip] = views
        else:
            _ip = max_ip_view.keys()[0]
            _views = max_ip_view[_ip]
            if views > _views:
                max_ip_view[ip] = views
                max_ip_view.pop(_ip)

        print "ip:", ip, ", views:", views
    # how many distinct IPs in total
    print "total:", len(ip_views)
    # the most active IP
    print "max_ip_view:", max_ip_view

run()



The output of running the program above:


ip: 221.221.155.53 , views: 1
ip: 221.221.155.54 , views: 2
total: 2
max_ip_view: {'221.221.155.54': 2}
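
The script above only prints its results; to complete step three, the counts still need to be written to MongoDB. A minimal sketch, reusing the old pymongo.Connection API from the connection sample above (the collection and field names here are my own, not from the source):

#!/usr/bin/python
# Hypothetical sketch: persist the per-IP counts produced by stat_ip_views().
# Database name, credentials, collection, and field names are illustrative.
import time
import pymongo

def save_ip_views(ip_views):
    conn = pymongo.Connection("127.0.0.1", 27017)
    db = conn.tage
    db.authenticate("tage", "123")
    day = time.strftime("%Y-%m-%d")
    for ip, views in ip_views.items():
        db.ip_stats.insert({'day': day, 'ip': ip, 'views': views})

# usage, e.g. at the end of run():
#     save_ip_views(stat_ip_views(zuidaima_nginx_log_path))

Step four, the web page that queries the statistics, then reduces to a simple find() on this collection sorted by views.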
