What is path analysis?
Path analysis parses the directory structure of the URLs in a log file and counts the occurrences of each directory, tallying how many request paths eventually lead to an order.
There are many web servers: IIS, Apache, Nginx, and so on. First, let's be clear about what a server log actually is, because a colleague asked me today, "What is a log? Doesn't the data come from the database?" Many people who do data analysis without a technical background are unclear on this.
Server logs: every communication between a client (web page, mobile phone, other mobile terminal, etc.) and the server is recorded, including the time, client type, access source, IP address, and access status.
For example, here is a log line from an Nginx web server:
118.186.156.230 - - [11/Jul/2014:13:20:07 +0800] "POST /business/checkmemberrank HTTP/1.1" 200 "http://oppor.99114.com/oftencate/skipprosupplybasic?code=121105103&category=%25e5%25ae%25b6%25e7%2594%25a8%25e7%2594%25b5%25e5%2599%25a8%2520%253e%2520%25e5%25ae%25b6%25e7%2594%25b5%2520%253e%2520%25e7%25a9%25ba%25e6%25b0%2594%25e5%2587%2580%25e5%258c%2596%25e5%2599%25a8%25e3%2580%2581%25e6%25b0%25a7%25e6%25b0%2594%25e6%259c%25ba" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)" -
It looks messy, but it actually follows a pattern, defined by the log_format directive:
log_format access '$remote_addr - $remote_user [$time_local] "$request" "$status $body_bytes_sent" "$http_referer" "$http_user_agent" $http_x_forwarded_for';
Among them, the meaning of each field is as follows:
1. $remote_addr and $http_x_forwarded_for: record the client's IP address;
2. $remote_user: records the client user name;
3. $time_local: records the access time and time zone;
4. $request: records the requested URL and the HTTP protocol;
5. $status: records the status of the request; 200 means success;
6. $body_bytes_sent: records the size of the response body sent to the client;
7. $http_referer: records which page the request was linked from;
8. $http_user_agent: records information about the client's browser;
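To see how these fields map onto a real line, here is a minimal sketch (not from the original article) that pulls the URL, status, and referer out of a simplified log line with awk. The field numbers assume whitespace splitting of a common combined-style layout where status and byte count are unquoted; the referer and user agent below are shortened for readability:

```shell
# A simplified Nginx log line (referer and user agent shortened).
line='118.186.156.230 - - [11/Jul/2014:13:20:07 +0800] "POST /business/checkmemberrank HTTP/1.1" 200 154 "http://oppor.99114.com/" "Mozilla/4.0" -'

# With whitespace splitting: $7 is the URL, $9 the status, $11 the referer.
echo "$line" | awk '{print "url="$7, "status="$9, "referer="$11}'
# -> url=/business/checkmemberrank status=200 referer="http://oppor.99114.com/"
```

Note that if status and byte count are wrapped together in quotes, as in the log_format above, $9 will carry a leading quote character, which is one reason the cleanup script later strips quotes and brackets.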
Since the log file itself is regular, the question becomes: how exactly do we analyze it?
There are many ways to do this, but I use awk and sed: awk splits the data vertically (into columns), while sed edits it horizontally (within each line). Here is the core segment of a script I wrote:
for ii in $(ls /data/logs/$i/ -u1 | sed -n 1p)
do
    echo /data/logs/$i/$ii
    # the field number printed here was lost in the original; $1 (the part before the first dot) is assumed
    name=$(echo $ii | awk -F '.' '{print $1}')
    date=$(echo $ii | awk -F '.' '{print $4}')
    tar xvf /data/logs/$i/$ii -C /root/outdata/
    mv /root/outdata/* /root/outdata/data.log
    # keep time ($4), URL ($7) and referer ($11); drop 444/404 responses and static assets
    awk '$9 != 444 && $9 != 404 {print $4" "$7" "$11}' /root/outdata/data.log \
        | grep -v -E '\.js|\.gif|\.ico|\.css|\.jpg|\.png' > out.log
    sed -i 's/20[0-9][0-9]:/2014/g' out.log
    sed -i 's/"//g' out.log
    sed -i 's/\[//g' out.log

    month=$(date +%b --date='-1 day')
    day=$(date +%d --date='-1 day')
    # request url: tally each URL (the field number was lost in the original; $2 is assumed)
    awk -F ' ' '{print $2}' out.log | sort | uniq -c | sort -rn \
        | sed 's/^[ \t]*//' > /root/shell/request/$name-$date.txt
done
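The awk/sed division of labor used in the script can be seen in a toy example (illustrative data, not the actual logs): awk picks out columns, while sed rewrites text within each line:

```shell
# awk splits vertically: keep only the second column of each line.
printf 'a 1\nb 2\n' | awk '{print $2}'
# -> 1
#    2

# sed edits horizontally: strip the leading "[" that Nginx puts before the timestamp.
echo '[11/Jul/2014:13:20:07' | sed 's/\[//'
# -> 11/Jul/2014:13:20:07
```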
awk handles the splitting but can also do the counting, so you can compute how many times each file path occurs in each log; it always comes down to loops of one kind or another. The end result is a set of path statistics.
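That counting step can be sketched with a sort | uniq -c pipeline (made-up URLs, not the author's data), tallying each first-level directory the same way the script tallies full URLs:

```shell
# Count how often each first-level directory appears in a list of request URLs.
printf '/business/check\n/business/rank\n/order/new\n' \
    | awk -F '/' '{print $2}' \
    | sort | uniq -c | sort -rn | sed 's/^[ \t]*//'
# -> 2 business
#    1 order
```

sort groups identical directories together, uniq -c prefixes each with its count, and sort -rn puts the most frequent paths first.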
For easier viewing, I present the results as a table and two kinds of charts; you can refer to the demo at www.webmapdata.com. As for data visualization, it is also worth doing well: the user experience matters, and so do the results of the analysis. I'll talk about visualization next time. The end.
- This article is from: Linux Learning Network