This came from seeing @Shenzhou, one of the experts in my group, share browser statistics. I wanted to practice it myself, so I stayed in all day working on it (apart from meals, of course).
The approach
First, write a scheduled task that runs a script at 23:55 every night; the script rotates the log and turns it into the required JSON data.
Then expose an HTTP interface that serves the parsed JSON data, such as browser model and system model.
Finally, draw a pie chart from the interface data.
Scheduled task: parse the log
Write a shell script that runs at 23:55 every day, roughly as follows:
#!/bin/bash
# set the current log aside for parsing
cp -f /home/access.log /home/last.log
# rotate it into a log named after today's date
mv /home/access.log /home/access/$(date +%y%m%d).log
# trigger the Node script that analyzes last.log
node '/home/parsejson.js'
# make Nginx generate a new log file
kill -USR1 $(cat /var/run/nginx.pid)
The content of parsejson.js essentially uses the nginxparser package to parse each log line and saves the result as a Y-m-d.json file.
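The parsing script itself isn't shown in the original; as an illustration, the tallying step can be sketched in Python. This is a minimal sketch, assuming a combined-style log format, and the crude browser tokens are my own choice, not what the nginxparser package does:

```python
import json
import re
from collections import defaultdict

# Combined log format: ip - user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Very crude browser detection, ordered so that Chrome (whose user agent
# also contains "Safari") is matched before Safari.
BROWSER_TOKENS = ["Edge", "Chrome", "Firefox", "Safari", "MSIE", "Opera"]

def classify_agent(agent):
    for token in BROWSER_TOKENS:
        if token in agent:
            return token
    return "Other"

def parse_log_lines(lines):
    """Tally browser and status-code counts from iterable log lines."""
    stats = {"browser": defaultdict(int), "http_status": defaultdict(int)}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stats["browser"][classify_agent(m.group("agent"))] += 1
        stats["http_status"][m.group("status")] += 1
    return {k: dict(v) for k, v in stats.items()}

if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [01/Jan/2016:23:55:00 +0800] "GET / HTTP/1.1" 200 612 '
        '"-" "Mozilla/5.0 (Windows NT 6.1) Chrome/45.0 Safari/537.36"',
    ]
    print(json.dumps(parse_log_lines(sample), indent=2))
```

The resulting dict can be dumped with `json.dump` into the Y-m-d.json file the interface below reads.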
The log data interface
Set up an HTTP server that, on each request, serves the JSON data generated above. You can break the data down according to your own needs, such as browser version, browser model, system version, and so on, and of course you can add caching. The parsed data looks roughly like this (abridged):
{
  "errcode": 0,
  "browser": {
    "Chrome":  {"count": 5047, "version": {"21": 366, "31": 198, "38": 101, "39": 89, ...}},
    "IE":      {"count": 6717, "version": {"8": 717, "9": 2169, ...}},
    "Firefox": {"count": 1368, "version": {"6": 839, "26": 190, ...}},
    "Baidu":   {"count": 1372, "version": {"spider": 1372}},
    ...
  },
  "os": {
    "Mac OS":  {"count": 2753, ...},
    "Windows": {"count": 10642, "version": {"XP": 4814, "Vista": 514, "8": 380, "10": 184, ...}},
    ...
  },
  "http_status": {"200": 6030, "301": 7385, "302": 137, "304": 1451, "403": 814, "404": 1672, "405": 2},
  "robot": {"Googlebot": 2553, "Yahoo": 1489, "Bingbot": 5702, "Baiduspider": 1372, "blogtrottr": 631,
            "feedly": 222, "Haosouspider": 833, "Mj12bot": 2261, "Sogou": 3250, ...},
  "http_bot": {"bot": 18609, "all": 38010}
}
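The server code itself isn't shown in the original; here is a minimal sketch of such an interface using Python's standard library. The data directory and the Y-m-d.json file naming are assumptions mirroring the cron step above:

```python
import datetime
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

DATA_DIR = "/home"  # hypothetical: where the parse step writes its Y-m-d.json files

def load_stats(date=None, data_dir=DATA_DIR):
    """Load the pre-parsed stats JSON for the given date (default: today)."""
    date = date or datetime.date.today().strftime("%Y-%m-%d")
    with open("%s/%s.json" % (data_dir, date)) as f:
        return json.load(f)

class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            body = json.dumps(load_stats()).encode("utf-8")
            status = 200
        except (IOError, ValueError):
            # no stats generated yet for today
            body = json.dumps({"errcode": 1, "msg": "no data"}).encode("utf-8")
            status = 404
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# to serve: HTTPServer(("0.0.0.0", 8080), StatsHandler).serve_forever()
```

The chart page then only needs to fetch this endpoint and feed the `browser`/`os` maps into a pie chart.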
The full Nginx log statistics workflow
This article mainly presents an Nginx log statistics workflow in Python. The motivation is that recently the system has often been hit by unknown programs madly pulling data; despite the anti-crawling mechanism, I still needed to find out which IPs access the site most often. The idea is to analyze the Nginx log and rank those IPs. The concrete scheme involves:
Daily rotation of the Nginx log;
Setting the Nginx log format;
Writing Python code that counts IP accesses in access.log before each daily rotation and writes the results into MongoDB;
Writing a web page that queries MongoDB for the statistics.
Each step is described in detail below.
First, daily rotation of the Nginx log
This is done with a shell script, which is then scheduled to run periodically via crontab.
The shell script is as follows:
#!/bin/bash
## executed at 0:00 every day
## directory where the Nginx log files live
logs_path=/usr/local/nginx/logs
## yesterday's date as YYYY-MM-DD
yesterday=$(date -d "yesterday" +%Y-%m-%d)
## move the file
mv ${logs_path}/access.log ${logs_path}/access_${yesterday}.log
## send a USR1 signal to the Nginx master process; USR1 makes it reopen the log file
kill -USR1 $(cat /usr/local/nginx/nginx.pid)
Add it to crontab:
0 0 * * * /bin/bash /usr/local/nginx/sbin/cut-log.sh
Second, set the Nginx log format
Open the nginx.conf configuration file and add the following (log_format belongs in the http block; access_log can go in the server block):
log_format access '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
access_log /usr/local/nginx/logs/access.log access;
After saving, tell Nginx to reload its configuration:
./nginx -s reload
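Before writing the statistics code, it helps to confirm that a line of the new log actually matches the format. A quick regex check with one named group per field of the `access` format above (the sample line is made up):

```python
import re

# one named group per field of the "access" log_format defined above
ACCESS_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)

# a hypothetical line in the format configured above
line = ('221.221.155.53 - - [27/Dec/2015:00:00:01 +0800] "GET /index.html HTTP/1.1" '
        '200 612 "-" "Mozilla/5.0" "-"')
m = ACCESS_RE.match(line)
print(m.group("remote_addr"), m.group("status"))
```

If the match fails, the format in nginx.conf and the parsing code have drifted apart, and the IP counts below would silently come out empty.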
Third, write the Python code that counts IP accesses in access.log before the daily rotation and stores the results in MongoDB
Download pymongo, upload it to the server, and install it:
# tar zxvf pymongo-1.11.tar.gz
# cd pymongo-1.11
# python setup.py install
A sample Python connection to MongoDB:
$ cat conn_mongodb.py
#!/usr/bin/python
import pymongo
import random

conn = pymongo.Connection("127.0.0.1", 27017)
# connect to the tage database
db = conn.tage
# authenticate
db.authenticate("tage", "123")
# drop the user collection
db.user.drop()
# insert one document
db.user.save({'id': 1, 'name': 'kaka', 'sex': 'male'})
# insert a batch of documents in a loop
for id in range(2, 10):
    name = random.choice(['steve', 'koby', 'owen', 'tody', 'rony'])
    sex = random.choice(['male', 'female'])
    db.user.insert({'id': id, 'name': name, 'sex': sex})
# print all documents
content = db.user.find()
for i in content:
    print i
Then write the statistics script:
#encoding=utf8
import re

zuidaima_nginx_log_path = "/usr/local/nginx/logs/www.zuidaima.com.access.log"
pattern = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def stat_ip_views(log_path):
    ret = {}
    f = open(log_path, "r")
    for line in f:
        match = pattern.match(line)
        if match:
            ip = match.group(0)
            if ip in ret:
                views = ret[ip]
            else:
                views = 0
            views = views + 1
            ret[ip] = views
    f.close()
    return ret

def run():
    ip_views = stat_ip_views(zuidaima_nginx_log_path)
    max_ip_view = {}
    for ip in ip_views:
        views = ip_views[ip]
        if len(max_ip_view) == 0:
            max_ip_view[ip] = views
        else:
            _ip = max_ip_view.keys()[0]
            _views = max_ip_view[_ip]
            if views > _views:
                max_ip_view[ip] = views
                max_ip_view.pop(_ip)
        print "ip:", ip, ", views:", views
    # total number of distinct IPs
    print "total:", len(ip_views)
    # the IP with the most visits
    print "max_ip_view:", max_ip_view

run()
Running the above program produces:
ip: 221.221.155.53 , views: 1
ip: 221.221.155.54 , views: 2
total: 2
max_ip_view: {'221.221.155.54': 2}
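The script above only prints the counts; step 3 also calls for storing them in MongoDB. A minimal sketch of that step, reusing the connection details from the conn_mongodb.py sample above; the `ip_stats` collection name, the `date` field, and the `top_ips` helper are my own assumptions:

```python
import datetime

def build_ip_docs(ip_views, day=None):
    """Turn a {ip: views} dict into documents ready for MongoDB insertion,
    most-visited IP first."""
    day = day or datetime.date.today().strftime("%Y-%m-%d")
    return [{"ip": ip, "views": views, "date": day}
            for ip, views in sorted(ip_views.items(), key=lambda kv: -kv[1])]

def save_to_mongodb(ip_views):
    # connection details mirror conn_mongodb.py; pymongo 1.x API
    # (newer pymongo uses MongoClient and insert_many instead)
    import pymongo
    conn = pymongo.Connection("127.0.0.1", 27017)
    db = conn.tage
    db.authenticate("tage", "123")
    db.ip_stats.insert(build_ip_docs(ip_views))

def top_ips(db, day, limit=10):
    # what the web page from step 4 would query
    return list(db.ip_stats.find({"date": day}).sort("views", -1).limit(limit))

if __name__ == "__main__":
    docs = build_ip_docs({"221.221.155.53": 1, "221.221.155.54": 2}, "2015-12-27")
    print(docs[0]["ip"])
```

With the counts in MongoDB keyed by date, the web page from step 4 reduces to a `find` on the day's documents sorted by `views` descending.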