How to retrieve and analyze log files

Source: Internet
Author: User

Abstract: As SEOs, we use a variety of tools to diagnose technical issues: website analytics, crawl diagnostics, Baidu Webmaster Tools, and so on. All of these tools are useful, but none of them match website log data for analyzing how search engine spiders crawl a site.

As SEOs, we use a variety of tools to diagnose technical issues: web analytics, crawl diagnostics, Baidu Webmaster Tools, and so on. All of these tools are useful, but none of them match website log data for analyzing how search engine spiders crawl your site. When a crawler such as Googlebot visits your site, it leaves a real record on your server: the web server log. Logs are a powerful data source that is often underutilized, and they help you verify that search engines are crawling your site completely.

A server log is a file in which the server records each request in detail. In the case of a web server, it gives you a lot of useful information. Below I cover how to retrieve and analyze log files and how to identify problems from the server's response codes (404, 302, 500, and so on), broken into two parts, each highlighting different issues that can be found in your web server log.

First, get the log file

When search engines crawl a website, they leave a record on the server, and that record lives in the website's log file. Through the log we can understand how search engines access the site. Generally you enable logging through your hosting service, then access the site's root directory via FTP; in the root directory you will see a log (or weblog) folder containing the log files. Download a log file and open it with Notepad (or a browser) to see the contents of the website log. So what is hidden in this diary? Log files are like an airplane's black box: we can learn a great deal of information from them. What exactly does the log tell us? A brief description follows.
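Retrieving the file can also be scripted instead of done by hand. Below is a minimal sketch, assuming hypothetical FTP credentials, a "weblog" folder in the site root, and a log file named "access.log" (all of these are placeholders for illustration; your host's values will differ):

```python
from ftplib import FTP

# All connection details below are placeholders for illustration;
# replace them with the values provided by your hosting service.
FTP_HOST = "ftp.example.com"
FTP_USER = "username"
FTP_PASS = "password"
LOG_DIR = "weblog"              # some hosts name this folder "log" instead
LOG_FILE = "access.log"         # the log file name varies by host

ftp = FTP(FTP_HOST)
ftp.login(FTP_USER, FTP_PASS)
ftp.cwd(LOG_DIR)                # move into the log folder under the site root

# Download the log file to the current local directory.
with open(LOG_FILE, "wb") as local_file:
    ftp.retrbinary("RETR " + LOG_FILE, local_file.write)

ftp.quit()

# The downloaded file is plain text and can be opened with Notepad
# or read directly in Python.
with open(LOG_FILE, encoding="utf-8", errors="replace") as f:
    print(f.readline())         # show the first log entry
```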

Date: lets you analyze the day-by-day trend in search engine crawl rate.

Crawled files: tells you which directories and files were crawled; seeing which sections or types of content are affected helps pinpoint problems.

Status code: (only the common codes that directly reflect site problems are listed)

200 status code: the request succeeded, and the response headers or data body expected by the request are returned with the response.

302 status code: the requested resource temporarily responds to requests from a different URI.

404 status code: the request failed; the requested resource was not found on the server.

500 status code: the server encountered an unexpected condition that prevented it from completing the request.

Together, these codes give you a list of which pages the crawlers are requesting and what problems they are running into.

From where (referrer): not necessarily useful for analyzing search engine robots, but very valuable for other traffic analysis.

What kind of crawler (user agent): tells you which search engine crawler is requesting your pages (see the parsing sketch after this list).
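To make these fields concrete, here is a minimal sketch that parses one line in the common "combined" log format used by servers such as Apache and Nginx, pulling out the date, the crawled file, the status code, the referrer, and the user agent. The sample line is invented for illustration:

```python
import re

# A made-up example line in the "combined" log format.
line = ('66.249.66.1 - - [10/Oct/2023:13:55:36 +0800] '
        '"GET /category/page.html HTTP/1.1" 404 1024 '
        '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# Groups: IP, date, request, status, bytes, referrer, user agent.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

match = pattern.match(line)
if match:
    print("Date:        ", match.group("date"))      # crawl time
    print("Crawled file:", match.group("request"))   # method + URL
    print("Status code: ", match.group("status"))    # 200, 302, 404, 500 ...
    print("From where:  ", match.group("referrer"))  # referrer
    print("Crawler:     ", match.group("agent"))     # which spider
```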

Second, analyzing the website log files

Now you need a log analysis tool, because once your site's log data runs to a few MB, dozens of MB, or even hundreds of MB, you cannot read it by eye. And even when the log data is small, eyeballing it is not a rigorous approach. Here I use the Lightyear SEO log analysis tool as an example.

1. Import the log files into your analysis software.

2. Analyze the website log to find problems promptly.

The quickest way to see how a search engine crawls your site is to look at the server response codes in the log. A 404 (page not found) may mean that precious crawl resources are being wasted; a 302 means the requested resource is temporarily responding to requests from a different URI; a 500 means the server encountered an unexpected condition that prevented it from completing the request, which points you toward server problems to investigate. Webmaster tools provide some of this information, but these errors can have a very large impact on your site.
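If you want a quick status-code breakdown without a dedicated tool, a minimal sketch along these lines can tally the response codes straight from the raw log (the file name "access.log" and the combined log format are assumptions):

```python
import re
from collections import Counter

status_counts = Counter()

# Count the status code on each request line of the raw log.
# "access.log" is an assumed file name; use your own log file.
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # The status code is the first 3-digit field after the quoted request.
        match = re.search(r'" (\d{3}) ', line)
        if match:
            status_counts[match.group(1)] += 1

total = sum(status_counts.values())
for status, count in status_counts.most_common():
    print(f"{status}: {count} ({count / total:.1%})")
```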

The first step in the analysis is to generate a data table from your log data using the Lightyear SEO log analysis tool. At the most basic level, let's look at which search engine crawlers crawl this site:

From the report we want to ask a few questions:

A. The Yahoo spider accounts for 47.12% of the total crawl volume, yet the traffic statistics show no traffic coming from the Yahoo search engine. Should we forbid this spider from visiting again?

B. What do the Baidu spider's number of visits, dwell time, and total crawl volume tell us?

C. What explains the other search engine spiders' number of visits, dwell time, and total crawl volume? Is there any way to improve them? (A rough counting sketch follows these questions.)
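A dedicated tool answers these questions directly, but a rough version of the per-spider breakdown can be computed from the raw log by counting requests and crawl volume for each crawler. The user-agent substrings below are common ones chosen for illustration, not an exhaustive or authoritative list, and "access.log" is again an assumed file name:

```python
import re
from collections import defaultdict

# Common spider user-agent substrings (illustrative, not exhaustive).
SPIDERS = {
    "Baiduspider": "Baidu",
    "Googlebot": "Google",
    "Yahoo! Slurp": "Yahoo",
    "bingbot": "Bing",
}

visits = defaultdict(int)       # number of requests per spider
crawl_bytes = defaultdict(int)  # total bytes transferred per spider

# Status, response size, and the trailing quoted user agent.
pattern = re.compile(r'" (\d{3}) (\d+|-) .*"([^"]*)"$')

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = pattern.search(line)
        if not match:
            continue
        status, size, agent = match.groups()
        for marker, name in SPIDERS.items():
            if marker in agent:
                visits[name] += 1
                if size.isdigit():
                    crawl_bytes[name] += int(size)
                break

total_visits = sum(visits.values()) or 1
for name in sorted(visits, key=visits.get, reverse=True):
    share = visits[name] / total_visits
    print(f"{name}: {visits[name]} requests ({share:.1%}), "
          f"{crawl_bytes[name] / 1024:.0f} KB crawled")
```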

Next, let's look at what we care about most: the spider status code analysis.

This table shows only the problematic status codes in this log; normal 200 responses are not analyzed. Let's take a closer look at it. Overall, the ratio of good to bad responses looks healthy, but there are a few individual issues, so let's try to figure out what is going on.

The number of 302s is acceptable, but that does not mean they can be left unhandled; we should have a better way to deal with them, perhaps using robots.txt directives to exclude these pages from being crawled.

The number of 404s reached 109. Against a crawl volume in the tens of thousands, this is tolerable for the site, but it still needs to be resolved: identify the potential problems by isolating the 404 directories, or annotate these links with rel="nofollow". Of course, the site must also have a proper 404 page.
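Before deciding how to handle them, it helps to know exactly which URLs are returning 404. Here is a minimal sketch, again assuming an "access.log" in the combined format, that lists the offending URLs with their hit counts so the most frequent ones can be fixed first:

```python
import re
from collections import Counter

not_found = Counter()

# Collect every URL that returned a 404, with how often it was requested.
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = re.search(r'"(?:GET|POST|HEAD) (\S+) [^"]*" 404 ', line)
        if match:
            not_found[match.group(1)] += 1

# The most frequently hit 404 URLs are the ones worth fixing first.
for url, count in not_found.most_common(20):
    print(f"{count:5d}  {url}")
```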

Conclusion

Baidu's webmaster tools give you crawl error information, but in many cases the data is limited. As SEOs, we should make use of all available data; after all, a data source you own is the one you can really rely on. Logs do not lie!
