Python for nginx log analysis (finding IPs that abnormally scrape a single API interface)

Source: Internet
Author: User
Tags: flush

We noticed a large number of IPs scraping data from our API, so I wrote this script to find IPs that access only a single interface and no others. Such behavior is generally abnormal.
The log format of the front-end load-balancing Nginx being analyzed is as follows:

114.249.4.96 - - [15/Jan/2016:23:59:47 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
222.128.172.215 - - [15/Jan/2016:23:59:47 +0800] "POST /api2/button_log/ HTTP/1.1" 200 48 "-" "-" "-"
110.72.182.177 - - [15/Jan/2016:23:59:47 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
58.63.7.92 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/getgoodsdetail/ HTTP/1.1" 200 877 "-" "-" "-"
117.177.160.218 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
117.177.160.218 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
163.142.55.76 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/getuserinfo/ HTTP/1.1" 200 546 "-" "-" "-"
114.112.89.34 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 9532 "-" "-" "-"
58.61.225.110 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
114.244.195.163 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 47834 "-" "-" "-"
114.244.195.163 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 47834 "-" "-" "-"
114.112.89.34 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 9532 "-" "-" "-"
125.39.170.239 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 30 "-" "-" "-"
110.84.169.57 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
42.81.46.142 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
110.84.169.57 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
117.136.40.148 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 1024 "-" "-" "-"
117.12.243.251 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
117.12.243.251 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
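
To see why the script reads the IP from field 0 and the URL from field 6, it can help to split one sample line on whitespace. This is only a minimal sketch: it assumes the fields are separated by spaces as in the standard nginx access log layout, and the helper name `parse_line` is hypothetical, not part of the original script.

```python
def parse_line(line):
    """Return (ip, url) from one access-log line, or None if the line is too short."""
    fields = line.split()
    if len(fields) < 7:
        return None
    return fields[0], fields[6]


sample = ('114.249.4.96 - - [15/Jan/2016:23:59:47 +0800] '
          '"POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"')

# Print the first seven whitespace-separated fields with their indexes
for index, field in enumerate(sample.split()[:7]):
    print(index, field)

print(parse_line(sample))
```

Because the timestamp contains a space, it occupies fields 3 and 4, which pushes the request method to field 5 and the URL to field 6.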

Python analysis code:
#!/usr/bin/env python
# coding: utf8
"""
Check the nginx access log for IPs that look like programs crawling interface data.

Rule: flag IPs that access the getgoodslist interface but hardly any other interface.
"""
__author__ = 'Dairu'

log_path = '/home/logs/nginx/access.log'

# Nested dict holding how many times each IP accessed each URL,
# such as {'10.0.0.1': {'/api2/getgoodslist/': 15}}
ip_info = {}
with open(log_path, 'r') as f:
    for line in f:
        fields = line.split()
        # Skip blank or truncated lines that have no URL field
        if len(fields) < 7:
            continue
        # Get the IP address
        ip = fields[0]
        # Get the URL of the interface accessed
        url = fields[6]
        # If the IP is not in the dict yet, add it as a key whose value is a second-level dict {url: 1}
        # If the IP is present but the URL is not in its second-level dict, add the URL with a count of 1
        # If the IP is present and the URL is already in its second-level dict, add 1 to the count
        if ip not in ip_info:
            ip_info[ip] = {url: 1}
        else:
            if url not in ip_info[ip]:
                ip_info[ip][url] = 1
            else:
                ip_info[ip][url] += 1

# Walk the results and print every IP that accessed fewer than 3 interfaces
# and accessed the getgoodslist interface more than 100 times

for ip, value in ip_info.items():
    if len(value) < 3 and value.get('/api2/getgoodslist/', 0) > 100:
        print("ip:%s url-count:%s" % (ip, value))
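
The nested if/else counting above can also be written with `collections.Counter` and `defaultdict` from the standard library. The following is a sketch of that variant, not the author's code: the function names `count_urls_per_ip` and `suspicious_ips` and the parameters `target`, `max_urls` and `min_hits` are hypothetical, and the input lines are synthetic, made up only to exercise the rule.

```python
from collections import Counter, defaultdict


def count_urls_per_ip(lines):
    """Build {ip: Counter({url: hits})} from an iterable of log lines."""
    ip_info = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        ip_info[fields[0]][fields[6]] += 1
    return ip_info


def suspicious_ips(ip_info, target='/api2/getgoodslist/', max_urls=3, min_hits=100):
    """Keep IPs with fewer than max_urls distinct URLs and more than min_hits on target."""
    return {
        ip: dict(urls)
        for ip, urls in ip_info.items()
        if len(urls) < max_urls and urls.get(target, 0) > min_hits
    }


# Synthetic input: one IP hits only getgoodslist, the other only realtimetrack
line = ('10.0.0.1 - - [15/Jan/2016:23:59:47 +0800] '
        '"POST /api2/getgoodslist/ HTTP/1.1" 200 48 "-" "-" "-"')
other = ('10.0.0.2 - - [15/Jan/2016:23:59:47 +0800] '
         '"POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"')
info = count_urls_per_ip([line] * 150 + [other] * 150)
print(suspicious_ips(info))
```

Keeping the thresholds as parameters makes it easy to try a stricter rule, for example `max_urls=2` to require that an IP touched exactly one interface.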

Analysis results:

ip:58.63.7.92 url-count:{'/api2/getgoodsdetail/': 3383, '/api2/getgoodslist/': 550}
ip:58.63.4.71 url-count:{'/api2/getgoodsdetail/': 4499, '/api2/getgoodslist/': 275}
ip:118.122.120.146 url-count:{'/api2/getgoodslist/': 443}
ip:114.244.195.163 url-count:{'/api2/getgoodslist/': 568}
ip:124.72.23.174 url-count:{'/api2/getgoodslist/': 132}
ip:183.30.79.59 url-count:{'/api2/getgoodslist/': 322, '/api2/realtimetrack/': 6}
ip:61.140.50.120 url-count:{'/api2/getgoodslist/': 1402}
ip:171.221.25.108 url-count:{'/api2/getgoodslist/': 1136}
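
Not every flagged IP looks the same: some hit getgoodslist exclusively, while others spend most of their requests on getgoodsdetail. One possible way to triage the output is to rank the IPs by getgoodslist hits and show what share of their traffic that interface represents. The sketch below reuses the counts printed above; the names `results`, `TARGET` and `target_share` are hypothetical and not part of the original script.

```python
# Counts copied from the analysis results above
results = {
    '58.63.7.92': {'/api2/getgoodsdetail/': 3383, '/api2/getgoodslist/': 550},
    '58.63.4.71': {'/api2/getgoodsdetail/': 4499, '/api2/getgoodslist/': 275},
    '118.122.120.146': {'/api2/getgoodslist/': 443},
    '114.244.195.163': {'/api2/getgoodslist/': 568},
    '124.72.23.174': {'/api2/getgoodslist/': 132},
    '183.30.79.59': {'/api2/getgoodslist/': 322, '/api2/realtimetrack/': 6},
    '61.140.50.120': {'/api2/getgoodslist/': 1402},
    '171.221.25.108': {'/api2/getgoodslist/': 1136},
}

TARGET = '/api2/getgoodslist/'


def target_share(urls, target=TARGET):
    """Fraction of an IP's requests that went to the target URL."""
    total = sum(urls.values())
    return urls.get(target, 0) / float(total) if total else 0.0


# Sort by number of getgoodslist hits, highest first
ranked = sorted(results.items(), key=lambda item: item[1].get(TARGET, 0), reverse=True)
for ip, urls in ranked:
    print("%-16s hits=%-5d share=%.2f" % (ip, urls[TARGET], target_share(urls)))
```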
