What is path analysis?
Path analysis parses the directory structure of the URLs in a log file and counts the occurrences of each directory, tallying how many request paths eventually lead to an order.
There are many web servers: IIS, Apache, Nginx, and so on. First, let's be clear about what a server log actually is, because a colleague asked me today, "What is a log? Doesn't the data come from the database?" Many people who do data analysis without a technical background are unclear on this.
Server logs: every communication between a client (web page, mobile phone, other mobile terminal, etc.) and the server is recorded, including the time, client type, access source, IP address, and access status.
For example, here is a log line from an Nginx web server:
118.186.156.230 - - [11/Jul/2014:13:20:07 +0800] "POST /business/checkmemberrank HTTP/1.1" 200 "http://oppor.99114.com/oftencate/skipprosupplybasic?code=121105103&category=%25e5%25ae%25b6%25e7%2594%25a8%25e7%2594%25b5%25e5%2599%25a8%2520%253e%2520%25e5%25ae%25b6%25e7%2594%25b5%2520%253e%2520%25e7%25a9%25ba%25e6%25b0%2594%25e5%2587%2580%25e5%258c%2596%25e5%2599%25a8%25e3%2580%2581%25e6%25b0%25a7%25e6%25b0%2594%25e6%259c%25ba" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)" -
It looks messy, but it actually follows a pattern, defined by the log_format directive:
log_format access '$remote_addr - $remote_user [$time_local] "$request" "$status $body_bytes_sent" "$http_referer" "$http_user_agent" $http_x_forwarded_for';
Among them, the meaning of each field is as follows:
1. $remote_addr and $http_x_forwarded_for: record the client's IP address;
2. $remote_user: records the client user name;
3. $time_local: records the access time and time zone;
4. $request: records the requested URL and the HTTP protocol;
5. $status: records the status of the request; 200 means success;
6. $body_bytes_sent: records the size of the response body sent to the client;
7. $http_referer: records which page the request was linked from;
8. $http_user_agent: records information about the client's browser;
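To see how these fields map onto a real line, here is a minimal sketch (not from the original article) that pulls the URL, status, and referer out of a simplified log line with awk. The field numbers assume whitespace splitting of a common combined-style layout where status and byte count are unquoted; the referer and user agent below are shortened for readability:

```shell
# A simplified Nginx log line (referer and user agent shortened).
line='118.186.156.230 - - [11/Jul/2014:13:20:07 +0800] "POST /business/checkmemberrank HTTP/1.1" 200 154 "http://oppor.99114.com/" "Mozilla/4.0" -'

# With whitespace splitting: $7 is the URL, $9 the status, $11 the referer.
echo "$line" | awk '{print "url="$7, "status="$9, "referer="$11}'
# -> url=/business/checkmemberrank status=200 referer="http://oppor.99114.com/"
```

Note that if status and byte count are wrapped together in quotes, as in the log_format above, $9 will carry a leading quote character, which is one reason the cleanup script later strips quotes and brackets.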
Since the log file itself is regular, the question becomes: how exactly do we analyze it?
There are many ways to do this, but I use awk and sed: awk splits the data vertically (into columns), while sed edits it horizontally (within each line). Here is the core segment of a script I wrote:
for ii in $(ls /data/logs/$i/ -u1 | sed -n 1p)
do
    echo /data/logs/$i/$ii
    # the field number printed here was lost in the original; $1 (the part before the first dot) is assumed
    name=$(echo $ii | awk -F '.' '{print $1}')
    date=$(echo $ii | awk -F '.' '{print $4}')
    tar xvf /data/logs/$i/$ii -C /root/outdata/
    mv /root/outdata/* /root/outdata/data.log
    # keep time ($4), URL ($7) and referer ($11); drop 444/404 responses and static assets
    awk '$9 != 444 && $9 != 404 {print $4" "$7" "$11}' /root/outdata/data.log \
        | grep -v -E '\.js|\.gif|\.ico|\.css|\.jpg|\.png' > out.log
    sed -i 's/20[0-9][0-9]:/2014/g' out.log
    sed -i 's/"//g' out.log
    sed -i 's/\[//g' out.log

    month=$(date +%b --date='-1 day')
    day=$(date +%d --date='-1 day')
    # request url: tally each URL (the field number was lost in the original; $2 is assumed)
    awk -F ' ' '{print $2}' out.log | sort | uniq -c | sort -rn \
        | sed 's/^[ \t]*//' > /root/shell/request/$name-$date.txt
done
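The awk/sed division of labor used in the script can be seen in a toy example (illustrative data, not the actual logs): awk picks out columns, while sed rewrites text within each line:

```shell
# awk splits vertically: keep only the second column of each line.
printf 'a 1\nb 2\n' | awk '{print $2}'
# -> 1
#    2

# sed edits horizontally: strip the leading "[" that Nginx puts before the timestamp.
echo '[11/Jul/2014:13:20:07' | sed 's/\[//'
# -> 11/Jul/2014:13:20:07
```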
awk handles the splitting but can also do the counting, so you can compute how many times each file path occurs in each log; it always comes down to loops of one kind or another. The end result is a set of path statistics.
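That counting step can be sketched with a sort | uniq -c pipeline (made-up URLs, not the author's data), tallying each first-level directory the same way the script tallies full URLs:

```shell
# Count how often each first-level directory appears in a list of request URLs.
printf '/business/check\n/business/rank\n/order/new\n' \
    | awk -F '/' '{print $2}' \
    | sort | uniq -c | sort -rn | sed 's/^[ \t]*//'
# -> 2 business
#    1 order
```

sort groups identical directories together, uniq -c prefixes each with its count, and sort -rn puts the most frequent paths first.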
For easier viewing, I present the results as a table and two kinds of charts; you can refer to the demo at www.webmapdata.com. As for data visualization, it is also worth doing well: the user experience matters, and so do the results of the analysis. I'll talk about visualization next time. The end.
- This article is from: Linux Learning Network