Abstract: A webmaster needs to know more than how to write original content and build external links; reading the site's logs matters just as much, because the logs show how search engines actually treat the site. What important information can you get from the web logs?
A webmaster needs to know more than how to write original content and build external links; you also need to learn to analyze the site's logs. Once you can read them, you can see how search engine spiders actually behave on your site. The important information a web log provides is described below:
First, check how the site is being crawled
1. When a new site has just gone live, see whether search engines have come to crawl it yet;
2. When indexing looks abnormal, or the site has been penalized (dropped from the index), the log shows whether search engine spiders are still visiting;
3. To troubleshoot problems with the site, reading the log is a must.
Second, where is the site log?
It is usually in a folder named logs on the FTP server. The exact folder or file name varies from server to server, but it will contain the keyword "logs", as shown in the following figure:
Third, how do you open the log?
Download the file and decompress it, then open it in a text editor. If the content appears garbled, try a web page editor such as EditPlus or Dreamweaver instead.
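The download, decompress, and open steps can also be done in a script. The following is only a sketch: the function name `read_log_lines` and the encodings it tries (UTF-8 first, then GBK) are assumptions made for illustration, and your host may compress or encode its logs differently.

```python
import gzip
from pathlib import Path


def read_log_lines(path, encodings=("utf-8", "gbk")):
    """Return the lines of a plain or gzip-compressed log file as text."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip files start with these two magic bytes
        raw = gzip.decompress(raw)
    for encoding in encodings:
        try:
            return raw.decode(encoding).splitlines()
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    # Last resort: decode anyway and mark undecodable bytes with U+FFFD
    return raw.decode("utf-8", errors="replace").splitlines()
```

Trying several encodings in turn is the scripted counterpart of switching editors when the text looks garbled.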
Fourth, the names of the major search engine spiders
After opening the log you will see the spider names below; each one tells you which search engine visited your site.
Baidu: Baiduspider
Google: Googlebot
MSN: msnbot
Yahoo: Slurp
Youdao: YoudaoBot
Sogou: Sogou+web+spider
360: 360Spider
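To see which of these spiders appear in a log, you can count the lines that mention each name. The sketch below is one possible approach, not a standard tool: the names `SPIDERS`, `identify_spider`, and `count_spider_hits` are made up for this example, and matching is a case-insensitive substring search. Sogou is matched on the shorter token "sogou" because the separators in its name differ between log formats. A user-agent string can be forged, so a match only shows that the visitor claims to be that spider.

```python
from collections import Counter

# Lowercase tokens based on the spider names listed above.
SPIDERS = {
    "Baidu": "baiduspider",
    "Google": "googlebot",
    "MSN": "msnbot",
    "Yahoo": "slurp",
    "Youdao": "youdaobot",
    "Sogou": "sogou",
    "360": "360spider",
}


def identify_spider(line):
    """Return the search engine whose spider token appears in the line, or None."""
    lowered = line.lower()
    for engine, token in SPIDERS.items():
        if token in lowered:
            return engine
    return None


def count_spider_hits(lines):
    """Count how many log lines mention each known spider."""
    hits = Counter()
    for line in lines:
        engine = identify_spider(line)
        if engine is not None:
            hits[engine] += 1
    return hits
```

Calling `count_spider_hits` on the lines of a day's log gives a quick per-engine tally; an engine that is missing from the result did not show up in that file.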
Fifth, breaking down a log entry
Searching the log for the spider names above shows clearly whether a spider has visited your site and which pages it crawled.
123.125.71.33 - - [19/Apr/2013:00:47:39 +0800] "GET /page/contact/contact.php HTTP/1.1" 200 21978 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
The specific analysis is as follows:
123.125.71.33: the visitor's IP address.
[19/Apr/2013:00:47:39 +0800]: the access time and time zone.
GET /page/contact/contact.php HTTP/1.1: the page /page/contact/contact.php (under the site's domain name) was fetched over the HTTP/1.1 protocol. GET is the HTTP request method.
200: the status code the server returned.
21978: the size of the response; 21,978 bytes were sent to the spider.
Mozilla/5.0 (compatible; Baiduspider/2.0; ...): the visitor's user-agent string. Mozilla/5.0 is a generic compatibility token that most browsers and crawlers send, so it does not mean Firefox; the part that identifies the visitor here is Baiduspider/2.0.
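The same breakdown can be done with a regular expression. This is a minimal sketch that assumes the log uses the common "combined" layout (IP, two placeholder fields, timestamp, quoted request, status, size, quoted referrer, quoted user agent); `LOG_PATTERN` and `parse_line` are names invented for this example, and a server configured with a different layout would need a different pattern.

```python
import re

# One named group per field of a combined-format log line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into a dict of fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Some servers write "-" instead of a byte count when no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


line = (
    '123.125.71.33 - - [19/Apr/2013:00:47:39 +0800] '
    '"GET /page/contact/contact.php HTTP/1.1" 200 21978 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; '
    '+http://www.baidu.com/search/spider.html)"'
)
entry = parse_line(line)
print(entry["ip"], entry["path"], entry["status"], entry["size"])
```

Returning None for lines that do not match lets a caller skip malformed records instead of stopping on the first one.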
Sixth, SEO information in the site log
Different servers and virtual hosts are configured to record different fields.
Some logs contain a sequence such as 200 0 33834 237 953. By comparing several records and looking for the pattern, you can determine that the third number is the number of bytes.
Others contain 200 0 0 or 200 0 64, which means the number of bytes fetched was not recorded. Note: neither 200 0 0 nor 200 0 64 indicates a problem. The claim that 200 0 64 means the site has been penalized has no basis; most sites show the 64 code in their logs.
The HTTP status codes that appear most often in the log are 200 (OK), 304 (Not Modified), and 404 (Not Found, a dead link).
304 means the content has not changed since the last crawl. Image files on a site often return this code.
404 means the requested link is a dead link. Dead links arise in two ways: either the page once existed and was later deleted, or the page never existed but someone else's external link points to that address.
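To see how often each status code occurs and which addresses produce 404s, the log lines can be tallied. The sketch below again assumes a layout in which the quoted request is followed directly by the status code; `REQUEST_STATUS` and `summarize_statuses` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Matches the quoted request and the three-digit status code that follows it.
REQUEST_STATUS = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def summarize_statuses(lines):
    """Return (count per status code, count per path that returned 404)."""
    statuses = Counter()
    dead_links = Counter()
    for line in lines:
        match = REQUEST_STATUS.search(line)
        if match is None:
            continue  # skip lines that do not look like requests
        status = int(match.group("status"))
        statuses[status] += 1
        if status == 404:
            dead_links[match.group("path")] += 1
    return statuses, dead_links
```

Calling `dead_links.most_common()` on the second result lists the most frequently requested dead links first, which is a reasonable order in which to fix or redirect them.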