Parsing Web Access Logs with Python
Common Log Format
127.0.0.1 - - [14/May/2017:12:45:29 +0800] "GET /index.html HTTP/1.1" 200 4286
Fields: remote host IP, -, -, request time, time zone, method, resource, protocol, status code, bytes sent
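A minimal sketch of how one Common Log Format line might be split into the fields above with a regular expression. The pattern, the group names and the parse_clf helper are illustrative assumptions, not the author's code. Note that %b expects English month names, so this assumes the default C/English locale.

```python
import re
from datetime import datetime

# Hypothetical pattern for one Common Log Format line; group names are our own.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Return a dict of fields for one line, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # %z turns the "+0800" offset into a timezone-aware datetime.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # A "-" size means no body was sent; treat it as 0 bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

sample = '127.0.0.1 - - [14/May/2017:12:45:29 +0800] "GET /index.html HTTP/1.1" 200 4286'
record = parse_clf(sample)
print(record["host"], record["time"].date(), record["status"], record["size"])
```

Lines that do not match (for example, malformed requests) come back as None, so a caller can count or skip them instead of crashing.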
- Combined Log Format
127.0.0.1 - - [14/May/2017:12:51:13 +0800] "GET /index.html HTTP/1.1" 200 4286 "http://127.0.0.1/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"
Fields: remote host IP, -, -, request time, time zone, method, resource, protocol, status code, bytes sent, referer string, browser information (User-Agent)
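The combined format adds two quoted fields, the referer and the browser information, after the byte count. One possible sketch of a pattern that captures them is shown below. COMBINED_PATTERN, parse_combined and the group names are hypothetical, and the sample line, including its 200 status code, is only illustrative.

```python
import re

# Hypothetical pattern: Common Log Format fields plus referer and User-Agent.
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_combined(line):
    """Return the raw string fields of one line, or None if it does not match."""
    m = COMBINED_PATTERN.match(line)
    return m.groupdict() if m else None

sample = ('127.0.0.1 - - [14/May/2017:12:51:13 +0800] '
          '"GET /index.html HTTP/1.1" 200 4286 "http://127.0.0.1/" '
          '"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"')
fields = parse_combined(sample)
print(fields["referer"])
print(fields["user_agent"])
```

A plain Common Log Format line has no trailing quoted fields, so this pattern rejects it. That makes it possible to tell the two formats apart by trying the combined pattern first.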
Web Access Log Example
Analysis
- Statistics by day
o Number of log lines per day
o Number of visits per IP per day
o Number of visitors per day = size of the set of IPs that appear that day
o Number of occurrences of each status code per day
o Total traffic per day
- Overall statistics
o Total log lines = sum of the number of log lines per day
o Total number of visitors = size of the set of all IPs that appear
- Geographical distribution
o Sort all IPs by number of accesses and take the top 20
o Look up a location based on IP
Code
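As a rough sketch, and not the original article's code, the statistics listed under Analysis could be computed from already parsed records as follows. The record layout (day, ip, status, bytes_sent), the sample data and all variable names are assumptions made for illustration. The IP-to-location lookup is left out because it depends on an external IP database.

```python
from collections import Counter, defaultdict

# Hypothetical pre-parsed records: (day, ip, status, bytes_sent).
records = [
    ("2017-05-14", "127.0.0.1", 200, 4286),
    ("2017-05-14", "127.0.0.1", 404, 512),
    ("2017-05-14", "10.0.0.2", 200, 4286),
    ("2017-05-15", "10.0.0.2", 200, 1024),
]

def new_day():
    """Empty statistics bucket for one day."""
    return {"lines": 0, "visits_per_ip": Counter(), "status": Counter(), "traffic": 0}

daily = defaultdict(new_day)
for day, ip, status, size in records:
    d = daily[day]
    d["lines"] += 1              # log lines per day
    d["visits_per_ip"][ip] += 1  # visits per IP per day
    d["status"][status] += 1     # status code occurrences per day
    d["traffic"] += size         # total bytes sent per day

# Visitors per day = number of distinct IPs seen that day.
visitors_per_day = {day: len(d["visits_per_ip"]) for day, d in daily.items()}

# Overall statistics are derived from the daily ones.
total_lines = sum(d["lines"] for d in daily.values())
ip_totals = Counter()
for d in daily.values():
    ip_totals.update(d["visits_per_ip"])
total_visitors = len(ip_totals)

# IPs sorted by number of accesses, keeping at most the top 20.
top20 = ip_totals.most_common(20)

print(visitors_per_day, total_lines, total_visitors, top20)
```

Keeping a Counter per day means the totals can be rebuilt by summing the daily buckets, which matches the relationship "total log lines = sum of the daily log lines" in the list above.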