Recently, while analyzing my company's Nginx logs, one requirement was to extract, for each day of the current month, the top 10 most visited pages and their visit counts.
The first step is to clean the log so that only valid page requests remain. I used an exclusion approach, dropping requests for .js, .css, and other static resources. At first, though, I did not know the full set of suffixes that had to be removed.
After repeated rounds of cleaning and sampling, I ended up filtering out URIs containing the following suffixes:
.js .css .gif .jpeg .jpg .png .ico .txt .swf .xml .JPEG .PNG .JPG
# Python code (requires "import re"; runs inside the loop over parsed log lines):
if re.search(r"(\.js|\.css|\.gif|\.jpe?g|\.png|\.ico|\.txt|\.swf|\.JPE?G|\.PNG|\.xml)", request[1]): continue
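The filter above can be wrapped in a small helper for testing against sampled URIs. The following is a minimal sketch, not the code used in the original analysis: the name `is_page_request` and the sample URIs are hypothetical, and it makes two changes that are assumptions on my part. It anchors the suffix to the end of the URI path (so `/api/data.json` is not mistaken for a `.js` file), and it matches case-insensitively instead of listing upper-case variants.

```python
import re
from urllib.parse import urlsplit

# Suffixes taken from the list above; matching is case-insensitive
# and anchored to the end of the path (an assumption, see lead-in).
STATIC_SUFFIX = re.compile(
    r"\.(js|css|gif|jpe?g|png|ico|txt|swf|xml)$", re.IGNORECASE
)

def is_page_request(uri: str) -> bool:
    """Return True when the URI path does not end in a static-file suffix."""
    path = urlsplit(uri).path  # drop the query string and fragment
    return STATIC_SUFFIX.search(path) is None

# Hypothetical sample URIs for a quick check.
sample_uris = [
    "/index.html",
    "/static/app.js?v=3",
    "/img/logo.PNG",
    "/article/42",
    "/api/data.json",
]
pages = [u for u in sample_uris if is_page_request(u)]
print(pages)
```

Whether anchoring is appropriate depends on the log being analyzed; the unanchored pattern in the original one-liner is more aggressive and may suit logs where static files appear with trailing path segments.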
Logs from different companies may contain their own special cases, so you still need to sample and analyze your own data.
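Once the URIs are cleaned, the daily top 10 requirement from the start of this post could be computed roughly as follows. This is only a sketch under stated assumptions: it assumes the default Nginx "combined" log layout (a bracketed timestamp followed by the quoted request line), and the names `daily_top`, `LINE`, `STATIC`, and the sample log lines are all made up for illustration. A custom `log_format` would need a different pattern.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: Nginx "combined" format, e.g.
# 1.2.3.4 - - [10/Oct/2015:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 512
LINE = re.compile(r'\[(?P<day>[^:]+):[^\]]*\] "(?P<method>\S+) (?P<uri>\S+)')
# Static-resource suffix, followed by a query string or the end of the URI.
STATIC = re.compile(r"\.(js|css|gif|jpe?g|png|ico|txt|swf|xml)(\?|$)", re.IGNORECASE)

def daily_top(lines, n=10):
    """Map each day to its n most requested page URIs with hit counts."""
    per_day = defaultdict(Counter)
    for line in lines:
        m = LINE.search(line)
        if m is None or STATIC.search(m.group("uri")):
            continue  # skip unparsable lines and static resources
        per_day[m.group("day")][m.group("uri")] += 1
    return {day: counts.most_common(n) for day, counts in per_day.items()}

# Hypothetical log lines for a quick check.
sample = [
    '1.2.3.4 - - [10/Oct/2015:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 512',
    '1.2.3.4 - - [10/Oct/2015:13:55:37 +0800] "GET /app.js HTTP/1.1" 200 90',
    '5.6.7.8 - - [10/Oct/2015:14:01:02 +0800] "GET /index.html HTTP/1.1" 200 512',
    '5.6.7.8 - - [11/Oct/2015:09:00:00 +0800] "GET /about HTTP/1.1" 200 300',
]
result = daily_top(sample)
for day, top in result.items():
    print(day, top)
```

Note that this sketch counts URIs with different query strings as different pages; stripping the query string before counting is a reasonable variation if that is not wanted.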
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.