Professional analytics services such as Baidu Statistics, Google Analytics, and CNZZ give webmasters the common traffic metrics: UV, PV, time online, IP counts, and so on. Because of network issues, though, I found that Google Analytics reports several hundred more IPs than Baidu Statistics, so I wanted to write my own script to find out the actual number of visits. Counts taken from the Nginx access log come out higher than those in the analytics dashboards, because many spider visits are counted, as are requests for static files. An improved algorithm could filter out these useless entries entirely. Today I'll share the most basic version of the statistics with everyone, which is also a chance to learn and review the Python language.
For example, suppose the server's nginx access log contains the following entries:
221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 8482 "http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" 0.020
221.221.155.53 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 8482 "http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" 0.020
221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 8482 "http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" 0.020
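The only field the script that follows relies on is the client address at the start of each line. As a quick check, here is a minimal sketch that applies the same anchored regular expression to a shortened sample line. The helper name `extract_ip` is made up for illustration, and the sketch assumes the usual nginx layout in which the address is written first.

```python
import re

# Matches a dotted-quad IPv4 address at the very start of a line.
# It does not check that each octet is <= 255, and it ignores IPv6.
IP_PATTERN = re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def extract_ip(line):
    """Return the leading IPv4 address of a log line, or None if absent."""
    match = IP_PATTERN.match(line)
    return match.group(0) if match else None


# Shortened sample line (illustrative only).
sample = '221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1"'
print(extract_ip(sample))            # the leading address
print(extract_ip("not a log line"))  # None, because nothing matches
```

Lines that do not start with an address yield `None`, so a caller can skip them instead of failing.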
The statistics script is as follows:
stat_ip.py
# -*- coding: utf-8 -*-
import re

zuidaima_nginx_log_path = "/usr/local/nginx/logs/www.zuidaima.com.access.log"
# IPv4 address at the start of a log line
pattern = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')


def stat_ip_views(log_path):
    """Return a dict mapping each client IP to its number of requests."""
    ret = {}
    # "with" closes the file even if an error occurs while reading
    with open(log_path, "r") as f:
        for line in f:
            match = pattern.match(line)
            if match:
                ip = match.group(0)
                if ip in ret:
                    views = ret[ip]
                else:
                    views = 0
                views = views + 1
                ret[ip] = views
    return ret


def run():
    ip_views = stat_ip_views(zuidaima_nginx_log_path)
    max_ip_view = {}
    # sorted() keeps the output order stable between runs
    for ip in sorted(ip_views):
        views = ip_views[ip]
        if len(max_ip_view) == 0:
            max_ip_view[ip] = views
        else:
            # max_ip_view holds exactly one entry: the current leader
            _ip = list(max_ip_view.keys())[0]
            _views = max_ip_view[_ip]
            if views > _views:
                max_ip_view[ip] = views
                max_ip_view.pop(_ip)
        print("ip:", ip, ", views:", views)
    # total number of distinct IPs
    print("total:", len(ip_views))
    # the IP with the most views
    print("max_ip_view:", max_ip_view)


run()
Running the script produces the following output:
ip: 221.221.155.53 , views: 1
ip: 221.221.155.54 , views: 2
total: 2
max_ip_view: {'221.221.155.54': 2}
This gives the number of visits from every client IP, along with the IP that made the most requests.
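As noted at the start, raw log counts include crawler visits and static-file requests. One possible way to filter them out is sketched here, with `collections.Counter` replacing the hand-written dictionary bookkeeping. The bot keywords, the static-file suffixes, the names `is_noise` and `count_page_views`, and the shortened sample lines are all assumptions made for illustration; they are not part of the original script and would need tuning against a real log.

```python
import re
from collections import Counter

# Leading client IP, then the request path inside the quoted request line.
LINE_PATTERN = re.compile(
    r'^(\d{1,3}(?:\.\d{1,3}){3}) .*?"[A-Z]+ (\S+) [^"]*"'
)
# Assumed markers; adjust to the crawlers and assets seen in your own logs.
BOT_KEYWORDS = ("spider", "bot", "crawler")
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def is_noise(path, line):
    """True for static-file requests and for lines that name a crawler."""
    lowered = line.lower()
    static = path.split("?", 1)[0].lower().endswith(STATIC_SUFFIXES)
    # Crude substring test: any line containing a keyword is dropped.
    return static or any(word in lowered for word in BOT_KEYWORDS)


def count_page_views(lines):
    """Count views per IP, skipping unparseable lines and noise."""
    views = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match and not is_noise(match.group(2), line):
            views[match.group(1)] += 1
    return views


# Shortened, made-up sample lines (illustrative only).
sample_lines = [
    '221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" "Mozilla/5.0"',
    '221.221.155.53 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" "Mozilla/5.0"',
    '221.221.155.54 - - [02/Aug/2014:15:16:12 +0800] "GET / HTTP/1.1" "Mozilla/5.0"',
    '221.221.155.54 - - [02/Aug/2014:15:16:12 +0800] "GET /style.css HTTP/1.1" "Mozilla/5.0"',
    '10.0.0.1 - - [02/Aug/2014:15:16:13 +0800] "GET / HTTP/1.1" "Baiduspider"',
]
views = count_page_views(sample_lines)
print(views.most_common(1))  # busiest IP with its view count
print(len(views))            # number of distinct IPs after filtering
```

With these sample lines the stylesheet request and the crawler line are dropped, leaving two IPs, of which 221.221.155.54 has two page views. To process a real log, an open file object can be passed in place of `sample_lines`.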