The big three search engines usually fight each other fiercely, but occasionally they also cooperate. Last year Google, Yahoo, and Microsoft agreed to support a unified sitemaps standard. In the past couple of days the big three also announced that they will follow a common robots.txt standard. Google, Yahoo, and Microsoft each published a post on their official blog, listing the robots.txt directives and meta tags that all three support, as well as some that are specific to each engine. Here's a summary.
The robots.txt directives supported by all three include:
Disallow - tells the spider not to crawl certain files or directories. The following code prevents spiders from crawling any file on the site:
User-agent: *
Disallow: /
Allow - tells the spider that it may crawl certain files. Used together, Allow and Disallow can tell the spider to skip most of a directory and crawl only part of it. The following code keeps the spider out of the other files in the /ab/ directory and lets it crawl only the files under cd:
User-agent: *
Disallow: /ab/
Allow: /ab/cd
$ wildcard - matches the end of the URL. The following code allows spiders to access URLs that end with the .htm suffix:
User-agent: *
Allow: /*.htm$
* wildcard - tells the spider to match any sequence of characters. The following code prevents spiders from crawling all .htm files:
User-agent: *
Disallow: /*.htm
Sitemap location - tells the spider where your sitemap is, in the format:
Sitemap: <sitemap_location>
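The Allow, Disallow, and wildcard rules listed above can be sketched in Python roughly as follows. This is only an illustration, not any engine's actual matcher: the function names pattern_to_regex and is_allowed are made up for this example, and the precedence rule (the longest matching pattern wins, Allow wins a tie) is an assumption that the announcements summarized here do not spell out.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end
    of the URL. Without '$' the pattern is treated as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list) -> bool:
    """Decide whether a path may be crawled.

    rules is a list of ("allow" | "disallow", pattern) pairs.
    Assumption: the longest matching pattern wins, Allow wins a tie.
    """
    best_len, allowed = -1, True  # no matching rule: crawling is permitted
    for kind, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# The Disallow/Allow pair from the example above.
rules = [("disallow", "/ab/"), ("allow", "/ab/cd")]
in_cd = is_allowed("/ab/cd/page.html", rules)    # matched by the longer Allow rule
elsewhere = is_allowed("/ab/other.html", rules)  # matched only by Disallow
```

Under these assumptions, a path under /ab/cd is crawlable while the rest of /ab/ is not, and a pattern such as /*.htm$ matches /page.htm but not /page.html.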
The meta tags supported by all three include:
NOINDEX - tells the spider not to index the page.
NOFOLLOW - tells the spider not to follow the links on the page.
NOSNIPPET - tells the spider not to show a descriptive snippet in the search results.
NOARCHIVE - tells the spider not to show a cached snapshot of the page.
NOODP - tells the spider not to use the title and description from the Open Directory.
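As a rough illustration of how a crawler might read these values, the sketch below uses Python's standard html.parser module to collect the directives from a page's robots meta tag. The class name RobotsMetaParser, the helper robots_directives, and the sample page are invented for this example; real spiders may treat edge cases differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that opts out of indexing and link following.
page = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head><body></body></html>'
directives = robots_directives(page)
may_index = "noindex" not in directives
may_follow = "nofollow" not in directives
```

The values are compared in lowercase here, so NOINDEX and noindex are treated the same.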
All three engines now support the directives and tags above. The wildcards, it seems, were not previously supported by Yahoo and Microsoft. Baidu now also supports Disallow, Allow, and the two wildcards. As for the meta tags, I have not found an official statement on whether Baidu supports them.
Meta tags supported only by Google:
UNAVAILABLE_AFTER - tells the spider when the page expires. After this date, the page should no longer appear in the search results.
NOIMAGEINDEX - tells the spider not to index the images on the page.
NOTRANSLATE - tells the spider not to translate the page content.
Yahoo additionally supports:
Crawl-delay - asks the spider to wait between requests, lowering the crawl frequency (written in robots.txt rather than as a meta tag).
NOYDIR - similar to NOODP, but refers to the Yahoo directory rather than the Open Directory.
Robots-nocontent - tells the spider that the marked part of the HTML is not part of the page's main content; in other words, it tells the spider which part of the page is the real content (the content you want to be retrievable).
MSN additionally supports:
Crawl-delay
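Crawl-delay is written in robots.txt alongside the other directives. As a small sketch, Python's standard urllib.robotparser module can read it; the 5-second value and the rules in this sample file are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string instead of a URL.
robots_txt = """\
User-agent: *
Crawl-delay: 5
Disallow: /ab/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Delay, in seconds, requested for the given user agent.
delay = parser.crawl_delay("*")

# The same parser also answers plain Allow/Disallow questions.
blocked = not parser.can_fetch("*", "/ab/page.html")
```

A polite crawler would then sleep for that many seconds between requests to the site.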
Also note that the robots.txt file does not have to exist. If the request for it returns a 404 error, the spider takes that to mean it may crawl everything. But if fetching robots.txt fails with some other error, such as a timeout, the search engine may not include the site at all, because the spider cannot tell whether a robots.txt file exists or what it contains. That is not the same as confirming that the file does not exist.
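The distinction can be written down as a small decision function. This is only a sketch of the behavior described above: the name robots_fetch_policy and the three return labels are invented for this example, and treating every outcome other than 200 and 404 as "defer" is a simplifying assumption.

```python
from typing import Optional


def robots_fetch_policy(status: Optional[int]) -> str:
    """Map the outcome of fetching robots.txt to a crawl decision.

    status is the HTTP status code, or None when the request failed
    without any response (for example, a timeout).
    """
    if status == 200:
        return "parse_rules"  # file exists: obey what it says
    if status == 404:
        return "allow_all"    # file confirmed absent: everything may be crawled
    # Anything else: the spider cannot know what the file says,
    # so it holds off rather than guess.
    return "defer"


missing = robots_fetch_policy(404)
timed_out = robots_fetch_policy(None)
```

The practical point for site owners is that a server which times out or errors on /robots.txt is riskier than one that cleanly returns 404.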
Author: Zac @ SEO Every Day