Path Analysis for server logs

Source: Internet
Author: User

What is path analysis:

Path analysis examines the directory structure of the requests recorded in a log file, counts how many times each directory occurs, and from those counts tallies the paths that ultimately lead to an order.

There are many web servers: IIS, Apache, nginx, and so on. First, though, we need to be clear about what a server log is, because a friend asked me today what a log is and whether it comes from the database. I suspect many non-technical people are unclear on this; he works in data analysis and, naturally, does not know the technical side.

Server logs: every exchange between a client (a web page, a mobile phone, another mobile device, and so on) and the server is recorded, including the time, client type, referring source, IP address, and response status.

For example, here is an entry from an nginx web server log:

118.186.156.230 - - [11/Jul/2014:13:20:07 +0800] "POST /business/checkmemberrank HTTP/1.1" 200 "http://oppor.99114.com/oftencate/skipprosupplybasic?code=121105103&category=%25e5%25ae%25b6%25e7%2594%25a8%25e7%2594%25b5%25e5%2599%25a8%2520%253e%2520%25e5%25ae%25b6%25e7%2594%25b5%2520%253e%2520%25e7%25a9%25ba%25e6%25b0%2594%25e5%2587%2580%25e5%258c%2596%25e5%2599%25a8%25e3%2580%2581%25e6%25b0%25a7%25e6%25b0%2594%25e6%259c%25ba" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)" -

It looks messy, but it actually follows a defined format:

log_format access '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" $http_x_forwarded_for';

The meaning of each field is as follows:

1. $remote_addr and $http_x_forwarded_for: record the client's IP address;

2. $remote_user: records the client user name;

3. $time_local: records the access time and time zone;

4. $request: records the request line, that is, the method, the URL, and the HTTP protocol;

5. $status: records the status of the request; a successful request is 200;

6. $body_bytes_sent: records the size of the response body sent to the client;

7. $http_referer: records the page from which the request was linked;

8. $http_user_agent: records information about the client's browser.
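As a rough illustration of how these fields map onto a log line, the sketch below uses awk to pull a few of them out by position. It assumes the whitespace-separated layout of the format above; the helper name parse_line and the sample line (including its IP address and byte count) are made up for the example.

```shell
#!/bin/sh
# parse_line (hypothetical helper): read access-log lines on stdin and
# print IP, timestamp, method, URL and status for each one.
# Assumes the field order of the log_format shown earlier.
parse_line() {
    awk '{
        ip     = $1                # $remote_addr
        ts     = substr($4, 2)     # $time_local without the leading "["
        method = substr($6, 2)     # request method without the leading quote
        url    = $7                # request path
        status = $9                # $status
        print ip, ts, method, url, status
    }'
}

# Made-up sample line, for illustration only.
echo '10.0.0.1 - - [11/Jul/2014:13:20:07 +0800] "GET /business/list HTTP/1.1" 200 512 "-" "Mozilla/4.0" -' | parse_line
# prints: 10.0.0.1 11/Jul/2014:13:20:07 GET /business/list 200
```

Splitting on whitespace is only safe for the leading fields; the user agent contains spaces, so anything after the referer would need a different approach.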

Since the log file follows a regular format, the question becomes: how do we actually analyze it?

There are many ways to do this; I use the shell tools awk and sed. awk splits the data vertically (by column), while sed works horizontally (line by line). Here is a piece of code I wrote; it is the core segment.

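    # $i is the log directory name, set by an enclosing loop that is not shown here.
    for ii in $(ls -u1 /data/logs/$i/ | sed -n 1p)
    do
        echo /data/logs/$i/$ii
        # name: assumed to be the first dot-separated part of the archive name; date: the fourth.
        name=`echo $ii | awk -F '.' '{print $1}'`
        date=`echo $ii | awk -F '.' '{print $4}'`
        # Unpack the archive and give the extracted log a fixed name.
        tar xvf /data/logs/$i/$ii -C /root/outdata/
        mv /root/outdata/* /root/outdata/data.log
        # Skip 444 and 404 responses; keep time ($4), URL ($7) and referer ($11); drop static files.
        awk '$9 != 444 && $9 != 404 {print $4 " " $7 " " $11}' /root/outdata/data.log | grep -v -E '\.(js|gif|ico|css|jpg|png)' > out.log
        sed -i 's/20[0-9][0-9]:/2014/g' out.log
        sed -i 's/"//g' out.log
        sed -i 's/\[//g' out.log
        month=`date +%b --date='-1 day'`
        day=`date +%d --date='-1 day'`
        # Request URL: assumed to be column 2 of out.log. Count occurrences, most frequent first.
        awk -F ' ' '{print $2}' out.log | sort | uniq -c | sort -rn | sed 's/^[ \t]*//' > /root/shell/request/$name-$date.txt
    done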

awk handles the splitting, but it can also aggregate, so it can compute how heavily each file path is used, that is, how many times each one is requested; in practice this comes down to a variety of loops. The end result is a set of path statistics.
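The aggregation step can be sketched with an awk associative array. This is a minimal example rather than the script used for the results here: it assumes one request path per line on stdin, and the helper name count_dirs and the sample paths are invented for illustration.

```shell
#!/bin/sh
# count_dirs (hypothetical helper): read request paths, one per line, and
# print "count directory" pairs, most frequent directory first.
count_dirs() {
    awk '{
        path = $1
        sub("[?].*", "", path)        # drop the query string, if any
        sub("/[^/]*$", "", path)      # drop the last segment, keeping the directory
        if (path == "") path = "/"    # files directly under the root
        count[path]++
    }
    END { for (d in count) print count[d], d }' | LC_ALL=C sort -k1,1nr -k2,2
}

# Made-up request paths, for illustration only.
printf '%s\n' \
    /business/checkmemberrank \
    '/business/list?x=1' \
    /oftencate/skipprosupplybasic \
    /index.html | count_dirs
# prints:
# 2 /business
# 1 /
# 1 /oftencate
```

Counting inside awk avoids the separate sort and uniq -c pass, at the cost of holding one counter per distinct directory in memory.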

To make the results easier to read, I present them as a table and as two kinds of chart; see the demo version at www.webmapdata.com. The data visualization also has to be done well: the user experience matters a great deal, and so do the results of the analysis itself. I will cover visualization next time.

    • This article is from: Linux Learning Network
