Access log files and the definitions of IP, UV, and PV
- An access log is a log generated by web servers such as Apache and nginx. It records one entry per request and carries a large amount of information. Analyzing the access log gives you a general picture of how the website is running, and when a problem occurs you can often narrow it down from the access log data. Engineers responsible for website operations and maintenance (O&M) and architecture should be familiar with the access log. Engineers working on strategy and performance can also analyze it to obtain user behavior data.
- Access log analysis can also be used to estimate user visits, peak access periods, and access by region, which provides a useful data reference for performance testing and server capacity expansion.
Access log composition
101.226.166.254 - - [21/Oct/2013:20:34:28 +0800] "GET /movie_cat.php?year=2013 HTTP/1.1" 200 5209 "http://www.baidu.com" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; Tablet PC 2.0); 360Spider"
The fields of this record mean the following:
- 101.226.166.254: the client (user) IP address
- [21/Oct/2013:20:34:28 +0800]: the access time, including the time zone offset
- GET: the HTTP request method; GET and POST are the most common
- /movie_cat.php?year=2013: the requested URL. This is a dynamic page: movie_cat.php is the back-end interface being requested, and year=2013 is the parameter passed to that interface
- 200: the response status code. 200 means success. Other common codes: 301 is a permanent redirect, 4XX indicates a request error, and 5XX indicates an internal server error
- 5209: the size of the response sent to the client, in bytes
- "http://www.baidu.com": the referer field, that is, the page from which the current page was reached
- "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; Tablet PC 2.0); 360Spider": the user agent field, which records information such as the operating system, browser version, and browser engine
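The field breakdown above can be sketched as a small shell function. This is a minimal sketch, not a definitive parser: the function name parse_access_line is hypothetical, and it assumes the log uses the same layout as the sample record and that the user agent contains no embedded double quotes.

```shell
# parse_access_line (hypothetical helper): read access log lines in the
# sample layout on stdin and print one "key=value" line per field.
# Splitting on double quotes keeps the request, referer, and user agent
# intact even though they contain spaces.
parse_access_line() {
  awk -F'"' '{
    split($1, head, " ")       # ip, two placeholder fields, [time, zone]
    split($2, req, " ")        # method, url, protocol
    split($3, resp, " ")       # status, bytes
    time = head[4] " " head[5]
    gsub(/\[|\]/, "", time)    # strip the surrounding brackets
    print "ip=" head[1]
    print "time=" time
    print "method=" req[1]
    print "url=" req[2]
    print "status=" resp[1]
    print "bytes=" resp[2]
    print "referer=" $4
    print "agent=" $6
  }'
}

# Usage (assumes a file named access_log in the current directory):
#   head -n 1 access_log | parse_access_line
```

Splitting on the quote character first and on spaces second is a design choice: a plain space split would break the user agent into many fields.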
Data Statistics and Analysis
- API request frequency: Collect statistics by interface, by day, and by hour to understand the website's running status, how often each interface is requested, and users' behavior habits.
- Response time: The average response time per day, per interface, and per interface per hour. Requests with long response times may indicate a performance defect in the service that needs to be optimized.
- Exception analysis: Requests whose status code is not 200, and requests whose response time exceeds a given threshold. A large number of 404 responses hurts SEO and should be avoided as far as possible.
- Specific parameter statistics: For dynamic pages, an interface usually takes several parameters. If one or more of them are particularly important, the statistics can be refined further to produce per-parameter results for that interface.
- IP source statistics: Collect statistics on the source IP addresses of page requests, then geolocate the addresses to obtain website access by region. The same statistics can also reveal possible attacks or other malicious behavior.
- Spider crawl analysis: Search engine spiders usually identify themselves in the user agent field. By analyzing that field you can find out how many times a day the website is crawled by Baidu, Google, and other search engines, and which pages the spiders crawl most often. This is also the foundation of SEO.
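Several of the statistics listed above can be sketched with awk. The function names below are hypothetical, and the code assumes the log layout of the sample record shown earlier; treat it as an illustration under those assumptions rather than a complete analysis tool.

```shell
# requests_per_hour: count requests for each hour of each day.
# Output lines look like "<count> <dd/Mon/yyyy:hh>", sorted by the time key as text.
requests_per_hour() {
  awk '{
    key = substr($4, 2, 14)    # "[21/Oct/2013:20:34:28" -> "21/Oct/2013:20"
    count[key]++
  }
  END { for (k in count) print count[k], k }' | sort -k2
}

# status_counts: count responses per status code, most frequent first.
status_counts() {
  awk -F'"' '{ split($3, resp, " "); count[resp[1]]++ }
  END { for (s in count) print count[s], s }' | sort -k1,1nr -k2,2
}

# crawler_hits NAME: count requests whose user agent contains NAME
# (case-insensitive), for example "Baiduspider" or "Googlebot".
crawler_hits() {
  awk -F'"' -v name="$1" '
    index(tolower($6), tolower(name)) { n++ }
    END { print n + 0 }'
}

# Usage (assumes a file named access_log in the current directory):
#   requests_per_hour < access_log
#   status_counts < access_log
#   crawler_hits Baiduspider < access_log
```

The time key is sorted as text, so hours within a day come out in order, but days from different months are not in calendar order.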
Analyzing the access log with shell commands
- Count the established TCP connections on port 80 (note that the pattern ":80" also matches ports that begin with 80, such as 8080)
netstat -tan | grep "ESTABLISHED" | grep ":80" | wc -l
- The remote addresses with the most connections on the current web server (the "servers)" and "Address" entries in the output come from the netstat header lines, not from real connections)
netstat -ntu | awk '{print $5}' | sort | uniq -c | sort -n -r
Sample output:
231 ::ffff:127.0.0.1:8095
23 ::ffff:192.168.50.201:5432
2 ::ffff:192.168.50.203:80
1 servers)
1 ::ffff:192.168.50.56:43314
1 ::ffff:192.168.50.21:2996
1 ::ffff:192.168.50.21:2989
1 ::ffff:192.168.50.200:8060
1 ::ffff:192.168.50.12:1300
1 ::ffff:192.168.50.12:1299
1 ::ffff:192.168.50.12:1298
1 ::ffff:127.0.0.1:57933
1 Address
1 192.168.50.41:65310
1 192.168.50.41:64949
1 192.168.50.41:49653
- View the 10 IP addresses with the most requests in the log
cut -d ' ' -f 1 access_log | sort | uniq -c | sort -nr | head -n 10
Sample output:
14085 121.207.252.122
13753 218.66.36.119
11069 220.162.237.6
1188 59.63.158.118
1025 ::1
728 220.231.141.28
655 114.80.126.139
397 117.25.55.100
374 222.76.112.211
348 120.6.214.70
- List the IP addresses that appear more than 100 times in the log
cut -d ' ' -f 1 access_log | sort | uniq -c | awk '{if ($1 > 100) print $0}' | sort -nr | less
Sample output:
14085 121.207.252.122
13753 218.66.36.119
11069 220.162.237.6
1188 59.63.158.118
1025 ::1
728 220.231.141.28
655 114.80.126.139
397 117.25.55.100
374 222.76.112.211
348 120.6.214.70
252 58.211.82.150
252 159.226.126.21
206 121.204.57.94
192 59.61.111.58
186 218.85.73.40
145 221.231.139.30
134 121.14.148.220
123 222.246.128.220
122 61.147.123.46
119 121.204.105.58
107 116.9.75.237
105 118.123.5.173
.....
- Count the requests for a given page on a given day
grep '12/Nov/2012' access_log | grep "******.htm" | wc -l
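- List the URLs whose response time exceeds 30 ms, with the number of occurrences (this assumes the log format appends the response time as the last field, which the sample record above does not include)
awk '($NF > 30){print $7}' access_log | sort | uniq -c | sort -nr | head -20
- List the .php URLs whose response time exceeds 60 (in the unit the log records) and count the number of occurrences
awk '($NF > 60 && $7~/\.php/){print $7}' access_log | sort | uniq -c | sort -nr | head -100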
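- Count the UV of the /index.html page, approximated here by the number of distinct client IP addresses (field 1 in the sample format above)
grep "/index.html" access_log | cut -d ' ' -f 1 | sort | uniq | wc -l
- Count the PV of the /index.html page
grep "/index.html" access_log | wc -l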
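The per-page counts and the response-time queries above can also be computed for every URL in one pass. The sketch below uses hypothetical function names. It assumes the sample log layout, approximates UV by distinct client IP, and, for the second function only, assumes the log format appends a numeric response time as the last field.

```shell
# page_pv_uv: for each requested path (query string removed), print
# "<path> <pv> <distinct_ips>". Distinct IPs only approximate UV.
page_pv_uv() {
  awk '{
    path = $7
    sub(/\?.*/, "", path)             # drop the query string
    pv[path]++
    if (!((path, $1) in seen)) {      # first request from this IP for this path
      seen[path, $1] = 1
      uv[path]++
    }
  }
  END { for (p in pv) print p, pv[p], uv[p] }' | sort
}

# avg_time_by_url: average of the last field per path.
# ASSUMPTION: the response time is logged as the final field; lines whose
# last field is not numeric are skipped.
avg_time_by_url() {
  awk '$NF ~ /^[0-9.]+$/ {
    path = $7
    sub(/\?.*/, "", path)
    sum[path] += $NF
    n[path]++
  }
  END { for (p in n) printf "%s %.3f\n", p, sum[p] / n[p] }' | sort
}

# Usage (assumes a file named access_log in the current directory):
#   page_pv_uv < access_log
#   avg_time_by_url < access_log
```

The numeric check on the last field matters because comparing a non-numeric field such as a quoted user agent token against a number silently falls back to string comparison in awk.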
Definitions of IP, UV, and PV
- IP (unique IPs): IP stands for Internet Protocol; as a metric it is the number of distinct IP addresses. The same IP address is counted only once within a statistics period (typically one day).
- PV (page views): the number of page views or clicks. Every refresh is counted once.
- UV (unique visitors): a computer client that accesses your website is one visitor. The same client is counted only once within a statistics period (typically one day).
Differences between IP, PV, and UV
- IP (unique IPs): the number of distinct IP addresses that access the website. This metric is easy to collect and hard to fake, so it is an important indicator of website traffic.
- PV (page views): PV reflects the number of pages viewed on the website, so every refresh is counted once. PV therefore tends to grow with the number of visitors, but it measures the number of page views, not the number of visitors.
- UV (unique visitors): UV can be understood as the number of computers that access the website. The site uses cookies to identify a visitor's computer. If a visitor changes IP address but does not clear cookies and then visits the same website again, the site's UV count remains unchanged.
For example:
- Three people share one computer on an ADSL connection, visit the "goto52" website, and browse two pages each. The website traffic statistics are as follows:
- IP (unique IPs): 1
- PV (page views): 6 (3 people multiplied by 2 pages)
- UV (unique visitors): 1
- If each of the three people redials the ADSL connection before visiting, so that each gets a new IP address, and again browses two pages:
- IP (unique IPs): 3
- PV (page views): 6
- UV (unique visitors): 1
- Therefore, IP (unique IPs) reflects the number of distinct network addresses, while UV (unique visitors) reflects the number of distinct client computers. Because a UV is tied to a cookie rather than to an address that can change, each UV generally corresponds to an actual viewer more accurately than each IP address does.
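The two scenarios above can be reproduced with a short awk sketch. The input format "<cookie_id> <ip> <page>" and the function name ip_pv_uv are hypothetical simplifications introduced for illustration; a real access log would need the cookie identifier to be logged explicitly.

```shell
# ip_pv_uv (hypothetical helper): read "<cookie_id> <ip> <page>" records
# on stdin and print the three metrics on one line.
ip_pv_uv() {
  awk '{
    pv++                                              # every request is one page view
    if (!($1 in visitors)) { visitors[$1] = 1; uv++ } # new cookie -> new visitor
    if (!($2 in ips))      { ips[$2] = 1; ip++ }      # new address -> new IP
  }
  END { printf "IP=%d PV=%d UV=%d\n", ip, pv, uv }'
}

# Scenario 1: one computer (one cookie), one address, 3 people x 2 pages.
#   printf 'c1 10.0.0.1 /p1\nc1 10.0.0.1 /p2\n...' | ip_pv_uv
# Scenario 2: same cookie, but a new address after each redial.
```

With six records that share one cookie and one address the function prints IP=1 PV=6 UV=1; with the same cookie spread over three addresses it prints IP=3 PV=6 UV=1, matching the example.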