A case study of how Web logs are analyzed

Source: Internet
Author: User

Before the Dragon Boat Festival, I ran an experiment and wrote a report on it, "An experiment exploring whether a search engine's failure to index site content is unrelated to external link factors." I will not repeat the details of that experiment here. The conclusions I applied to leyuanbaby.com did not achieve the desired effect, so the question stayed on my mind, and I used the site log to check whether spiders were crawling the links that had not been indexed. Along the way I gained some experience in analyzing web logs, which I share with you here.


The site log shows clearly how users and search engine spiders behave when they visit the site. Turned into data, it tells us how the search engine regards the site and how healthy the site is. Many indicators can be read from the log, for example: visit count, time on site, crawl volume, per-directory crawl statistics, per-page crawl statistics, spider IP addresses, HTTP status codes, spider active hours, and spider crawl paths.
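As a rough illustration of a few of these indicators, the Python sketch below computes crawl volume, per-page and per-directory crawl counts, and spider IPs. It assumes the log lines have already been parsed into dictionaries keyed by IIS field names (c-ip, cs-uri-stem, cs(User-Agent)). The entries, the IP addresses, and the is_spider helper are made up for illustration. Matching on the user agent string is only a heuristic, since that string can be spoofed.

```python
from collections import Counter
import posixpath

# Hypothetical, already-parsed log entries (documentation IP ranges).
BAIDU_UA = ("Mozilla/5.0+(compatible;+Baiduspider/2.0;"
            "++http://www.baidu.com/search/spider.html)")
entries = [
    {"c-ip": "203.0.113.5", "cs-uri-stem": "/index.html", "cs(User-Agent)": BAIDU_UA},
    {"c-ip": "203.0.113.5", "cs-uri-stem": "/news/a.html", "cs(User-Agent)": BAIDU_UA},
    {"c-ip": "203.0.113.6", "cs-uri-stem": "/news/b.html", "cs(User-Agent)": BAIDU_UA},
    {"c-ip": "198.51.100.9", "cs-uri-stem": "/index.html",
     "cs(User-Agent)": "Mozilla/5.0+(Windows+NT+6.1)"},
]


def is_spider(entry, token="baiduspider"):
    """Heuristic: treat a request as a spider visit if the UA contains token."""
    return token in entry["cs(User-Agent)"].lower()


spider_hits = [e for e in entries if is_spider(e)]

crawl_amount = len(spider_hits)                               # crawl volume
page_counts = Counter(e["cs-uri-stem"] for e in spider_hits)  # per-page crawls
dir_counts = Counter(                                         # per-directory crawls
    posixpath.dirname(e["cs-uri-stem"]) or "/" for e in spider_hits
)
spider_ips = Counter(e["c-ip"] for e in spider_hits)          # spider IP addresses

print("crawl amount:", crawl_amount)
print("pages:", page_counts.most_common())
print("directories:", dir_counts.most_common())
print("spider IPs:", spider_ips.most_common())
```

Counting with Counter keeps each indicator to a single pass over the spider requests, which is usually enough for a daily log file.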

So let's look at an example of how a web log is analyzed:

#Software: Microsoft Internet Information Services 6.0

#Version: 1.0
#Date: 2013-05-27 16:44:28
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
2013-05-27 16:44:27 W3SVC195483716 61.152.94.150 GET /index.html - 80 - Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 0
2013-05-27 16:45:15 W3SVC195483716 GET /index.html - 80 - Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 200 0 64

The above is an excerpt from the May 27 log of my site, leyuanbaby.com. We will use it to walk through how to read a site log.

1. #Software is the name of the server software, and #Version is the version of the log file format. These two need no further explanation.

2. date: the date of the visit, that is, when Baiduspider came to crawl your site; time gives the time of day of the request;

3. s-sitename: identifies your virtual host, that is, the site instance on the server;

4. s-ip: the server's IP address;

5. cs-method: the request method. Two are common: GET, which is what happens when we open a URL, and POST, which is what happens when a form is submitted;

6. cs-uri-stem: the file or page that the visitor requested;

7. cs-uri-query: the parameters attached to the requested address, for example the string id=12 after the "?" in an ASP URL; if there are no parameters, "-" is recorded;

8. s-port: the server port that was accessed;

9. cs-username: the visitor's user name; if there is none, "-" is recorded;

10. c-ip: the visitor's IP address;

11. cs(User-Agent): the visitor's user agent string, which identifies the browser or the search engine spider;

12. sc-status: the HTTP status code. 200 means success, 403 means access is forbidden, 404 means the page was not found, and 500 indicates a server-side program error;
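To make the field list concrete, here is a minimal Python sketch that reads the #Fields directive and turns each request line into a dictionary keyed by field name. The two sample lines, their IP addresses, and the parse_w3c_log function name are made up for illustration, and the sketch assumes fields are separated by whitespace, with spaces inside a value logged as "+", as in the excerpt above.

```python
# Hypothetical sample in the same layout as the log excerpt above.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2013-05-27 16:44:28
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
2013-05-27 16:44:27 W3SVC1 192.0.2.10 GET /index.html - 80 - 203.0.113.5 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 200 0 0
2013-05-27 16:45:15 W3SVC1 192.0.2.10 GET /list.asp id=12 80 - 203.0.113.7 Mozilla/5.0+(Windows+NT+6.1) 404 0 2
"""


def parse_w3c_log(text):
    """Return one dict per request line, keyed by the #Fields names."""
    fields = []
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive can reappear later in a file, so refresh the names.
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#"):
            continue  # other directives such as #Software, #Version, #Date
        elif fields:
            values = line.split()
            # Skip damaged lines whose value count does not match the fields.
            if len(values) == len(fields):
                entries.append(dict(zip(fields, values)))
    return entries


entries = parse_w3c_log(SAMPLE_LOG)
for e in entries:
    print(e["date"], e["time"], e["cs-method"], e["cs-uri-stem"], e["sc-status"])
```

Reading the field names from the log itself, rather than hard-coding positions, keeps the sketch usable when a server is configured to log a different set of fields.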

From the example above, we now know what each piece of data in the site log means. So what can analyzing the log do for us, and what does it tell us? In my view, the site log offers six insights:

1. A clear picture of how Baiduspider crawls our pages.

2. From the spider's crawling pattern, we can infer the principles by which the search engine indexes pages.

3. Whether search engine spiders are crawling the site normally, which tells us whether our optimization methods are healthy.

4. Which pages the spiders crawl most, why they favor those pages, and whether other pages should learn from them.

5. Pages that spiders rarely visit are a hint that they may contain something search engines dislike, so we should review and revise them.

6. Whether search engines still access the content we have blocked in robots.txt.
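The last point in the list can be checked mechanically. The sketch below uses Python's standard urllib.robotparser module to flag crawled paths that robots.txt disallows. The robots.txt rules and the list of crawled paths are hypothetical; in practice the paths would come from the cs-uri-stem field of the spider's requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""

# Hypothetical paths that the spider requested, taken from the log.
crawled_paths = ["/index.html", "/admin/login.asp", "/news/a.html"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A path the spider fetched although robots.txt disallows it is worth a look.
violations = [
    path for path in crawled_paths
    if not parser.can_fetch("Baiduspider", path)
]
print(violations)
```

An empty result suggests the spider is respecting the rules. A non-empty one may mean the rules were changed recently or the visitor only claims to be the spider.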

At the same time, we need to pay special attention to HTTP status codes. The status code tells us more precisely what problem a page has and how the spider judged it. There are many HTTP status codes, and every webmaster needs to understand and remember them, so I will not explain them further here.
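One simple way to keep an eye on status codes is to tally them for the spider's requests and list the pages that returned errors. The sketch below does this in Python. The (URL, status) pairs are made up for illustration and would normally come from the cs-uri-stem and sc-status fields.

```python
from collections import Counter, defaultdict

# Hypothetical (cs-uri-stem, sc-status) pairs for spider requests.
spider_requests = [
    ("/index.html", "200"),
    ("/old-page.html", "404"),
    ("/news/a.html", "200"),
    ("/search.asp", "500"),
    ("/old-page.html", "404"),
]

# How often each status code was returned to the spider.
status_counts = Counter(status for _, status in spider_requests)

# Pages that returned a 4xx (client error) or 5xx (server error) code.
problem_pages = defaultdict(set)
for url, status in spider_requests:
    if status.startswith(("4", "5")):
        problem_pages[status].add(url)

for status, count in sorted(status_counts.items()):
    print(status, count)
for status, urls in sorted(problem_pages.items()):
    print(status, sorted(urls))
```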

Both the site's health and the spiders' crawling patterns are observed through the site log. You could say the site log is a channel of communication between spiders and the webmaster, so learning to read it plays a very important role in website optimization.

This article is an original work by Alan. When reprinting, please credit the source: http://edu.leyuanbaby.com/
