Using a website's access log to see when search engine spiders arrive

Source: Internet
Author: User
Keywords: search engine, SEO


Search engines can bring considerable traffic to a site, so getting a site indexed by them is very important; that goes without saying. But we usually do not know exactly when a search engine's spider first visited our site, or how often it has come back since.

From the search engine's front end, a page's snapshot (cache) shows when the engine captured that particular page, but it gives no good overall picture of how the engine crawls the whole site. Is there no way to find out? Of course there is: the site's detailed access log offers some clues. Take the Apache server's access log as an example:

65.55.106.108 - - [21/Nov/2009:15:01:10 +0800] "GET /robots.txt HTTP/1.1" --- Log 1

65.55.106.108 - - [21/Nov/2009:15:02:09 +0800] "GET / HTTP/1.1" 4888 --- Log 2

At present, almost every search engine on the market follows one rule: it reads the robots.txt file in the site's root directory to decide which pages it may crawl and which it may not. We can therefore search the log file for "robots.txt" to determine roughly when a search engine arrived. Only roughly, because the same search engine may read the file more than once; the earliest log record is the one that marks its first visit. "Log 1" shows that a search engine came to crawl the Library Bar network for the first time on November 21. Entering the IP address 65.55.106.108 into IP138 or another IP lookup service shows that it belongs to "Microsoft Corporation, United States", so we can treat this as the first visit by the spider of Microsoft's Bing search engine. Once the search engine knows from robots.txt what it is allowed to crawl and what it is not, it starts crawling the site. "Log 2" shows that Bing first crawled the site's home page (the slash / means the home page).
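The robots.txt search described above can be automated. The following is a minimal sketch, not the author's own tooling: it assumes log lines in Apache's common layout (client IP, two placeholder fields, bracketed timestamp, quoted request), and the function name `first_robots_requests` is hypothetical. The first two sample lines reuse the IP, timestamps, and requests of Log 1 and Log 2; the third is invented purely to show that the earliest record wins.

```python
import re
from datetime import datetime

# Leading fields of an Apache access-log line:
# client IP, identity, user, [timestamp], "METHOD path protocol".
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"'
)


def first_robots_requests(lines):
    """Return {ip: earliest datetime at which that IP requested /robots.txt}."""
    earliest = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("path") != "/robots.txt":
            continue
        # %b expects English month abbreviations (the default C locale).
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        ip = m.group("ip")
        if ip not in earliest or ts < earliest[ip]:
            earliest[ip] = ts
    return earliest


sample = [
    '65.55.106.108 - - [21/Nov/2009:15:01:10 +0800] "GET /robots.txt HTTP/1.1"',
    '65.55.106.108 - - [21/Nov/2009:15:02:09 +0800] "GET / HTTP/1.1"',
    # Hypothetical repeat fetch, not from the article.
    '65.55.106.108 - - [22/Nov/2009:08:00:00 +0800] "GET /robots.txt HTTP/1.1"',
]

for ip, ts in first_robots_requests(sample).items():
    print(ip, "first requested robots.txt at", ts.isoformat())
```

For a real log, the same function could be fed an open file object instead of the sample list, since it only iterates over lines.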

203.208.60.197 - - [17/Nov/2009:13:28:04 +0800] "GET /icof/102104/102104124/4b2b6b30242458d2012424d38cd77283.html HTTP/1.1" 200 5813 --- Log 3

203.208.60.194 - - [13/Nov/2009:09:02:46 +0800] "GET /login/ HTTP/1.1" 200 8191 --- Log 4

66.249.67.50 - - [13/Nov/2009:22:44:12 +0800] "GET /icof/102104/102104112/4b2b6b30242458d2012424c8733a67f6.html HTTP/1.1" 200 5731 --- Log 5

Entering 203.208.60.194 and 203.208.60.197 into the IP138 lookup shows that both belong to "Google (China)". This tells us that Google (China) runs its spider on more than one server, and that a block of IP addresses belongs to Google China's search engine. Interestingly, the IP in "Log 5", 66.249.67.50, is attributed to "Google, Mountain View, California, USA". "Log 4" and "Log 5" show that on November 13 both Google's Chinese spider and its US spider came to crawl the site. That is presumably how a big company's spiders operate: they work together.

202.160.178.146 - - [17/Nov/2009:13:29:44 +0800] "GET /catalogofyongle/402881872323df84012323e0f0be00ab.html HTTP/1.0" 200 45002 --- Log 6

Looking up the IP address in "Log 6", 202.160.178.146, the lookup service reports it directly as "Yahoo China Chinese spider", so the Yahoo Chinese search engine's spider has also come to crawl the site.
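Once a few crawler addresses have been looked up by hand, the results can be kept in a small table so that later log entries are labeled automatically. The sketch below is only an illustration: the /24 network widths are an assumption extrapolated from the single addresses in the article, not published crawler ranges, and `identify_crawler` is a hypothetical helper name.

```python
import ipaddress

# Assumption: each /24 prefix is guessed from one or two addresses seen in the
# article's log excerpts. Real crawler ranges may be wider, narrower, or have
# changed since; confirm them with an IP lookup service before relying on them.
KNOWN_CRAWLERS = {
    "65.55.106.0/24": "Bing (Microsoft)",
    "203.208.60.0/24": "Google (China)",
    "66.249.67.0/24": "Google (US)",
    "202.160.178.0/24": "Yahoo China",
}

NETWORKS = [(ipaddress.ip_network(cidr), name) for cidr, name in KNOWN_CRAWLERS.items()]


def identify_crawler(ip):
    """Return the crawler name for an IP string, or None if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    for network, name in NETWORKS:
        if addr in network:
            return name
    return None


for ip in ["203.208.60.194", "203.208.60.197", "66.249.67.50", "202.160.178.146"]:
    print(ip, "->", identify_crawler(ip))
```

Addresses that return None are the ones still worth entering into a lookup service by hand.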

Overall, a website's access log records the details of each search engine's crawl. The more you study the log file and the more familiar you become with each engine's IP ranges, the better you can gauge how each engine is indexing your site. Most important of all, content is king: search engines prefer sites with more original content and faster, fresher updates. So publish more of that kind of content, and search engine spiders will visit your site often, perhaps often enough to fill up your log file :-)
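To get a feel for how often each spider returns, the log can be tallied per IP and per day. This is a rough sketch under the same assumption as before, namely that each line starts with the client IP followed by a bracketed Apache timestamp; `hits_per_ip_per_day` is a hypothetical name, and the sample lines are Log 3 to Log 5 from this article.

```python
import re
from collections import Counter

# Captures the client IP and the date part (dd/Mon/yyyy) of the timestamp.
ENTRY_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:\]]+):')


def hits_per_ip_per_day(lines):
    """Count requests for each (ip, day) pair; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = ENTRY_RE.match(line)
        if m:
            counts[(m.group("ip"), m.group("day"))] += 1
    return counts


sample_log = [
    '203.208.60.197 - - [17/Nov/2009:13:28:04 +0800] "GET /icof/102104/102104124/4b2b6b30242458d2012424d38cd77283.html HTTP/1.1" 200 5813',
    '203.208.60.194 - - [13/Nov/2009:09:02:46 +0800] "GET /login/ HTTP/1.1" 200 8191',
    '66.249.67.50 - - [13/Nov/2009:22:44:12 +0800] "GET /icof/102104/102104112/4b2b6b30242458d2012424c8733a67f6.html HTTP/1.1" 200 5731',
]

for (ip, day), n in sorted(hits_per_ip_per_day(sample_log).items()):
    print(day, ip, n)
```

On a full log, a spider whose daily count keeps rising is visiting more frequently, which is the trend the article suggests watching for.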

This article was first published on the Library Bar network by its webmaster, Tiandong. All rights reserved: http://html.libzone.cn/blog/2009/11/21/125879090551788.html
