A website may or may not have a robots.txt file. But if it has one, the file must follow the standard. Below, drawing on personal experience, is how to write a robots.txt file.
robots.txt file commands include:
Disallow - tells the spider not to crawl certain files or directories. The following code prevents spiders from crawling any file on the site:
User-agent: *
Disallow: /
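As a quick check, the block-everything rule above can be fed to Python's standard-library urllib.robotparser. This is only a minimal sketch: the bot name ExampleBot and the example.com URLs are placeholders for illustration, not part of the original article.

```python
from urllib.robotparser import RobotFileParser

# The block-everything rule set described above.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" and example.com are hypothetical names used for illustration.
home_allowed = parser.can_fetch("ExampleBot", "http://example.com/")
page_allowed = parser.can_fetch("ExampleBot", "http://example.com/news/page.html")
print(home_allowed, page_allowed)
```

Both checks should report that crawling is not permitted, since every path begins with "/".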
Allow - tells the spider that it may crawl certain files. Used together, Allow and Disallow can tell spiders to skip most of a directory and crawl only part of it. The following code stops the spider from crawling the other files in the /ab/ directory while still letting it crawl the files under /ab/cd (paths are case-sensitive, so the case must match):
User-agent: *
Disallow: /ab/
Allow: /ab/cd
$ wildcard - matches the end of the URL. The following code allows spiders to access URLs that end in .htm:
User-agent: *
Allow: /*.htm$
* wildcard - matches any sequence of characters. The following code prevents spiders from crawling all .htm files:
User-agent: *
Disallow: /*.htm
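To make the matching rules above concrete, here is a rough Python sketch of how a crawler might evaluate Allow and Disallow patterns containing * and $. The function names are hypothetical, and the precedence rule (the longest matching pattern wins, with Allow winning a tie) is an assumption modeled on Google's documented behavior; other crawlers may resolve conflicts differently. Python's own urllib.robotparser does plain prefix matching and does not handle these wildcards, which is why the sketch builds regular expressions itself.

```python
import re
from typing import List, Tuple


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of
    the URL. Everything else is matched literally, as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Decide whether `path` may be crawled under (directive, pattern) rules.

    Assumed precedence: the longest matching pattern wins; Allow wins a tie.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue  # empty patterns and non-matching patterns are ignored
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


# Rule sets mirroring the examples in the text.
ab_rules = [("Disallow", "/ab/"), ("Allow", "/ab/cd")]
htm_rules = [("Disallow", "/*.htm")]
end_rules = [("Disallow", "/"), ("Allow", "/*.htm$")]

print(is_allowed("/ab/cd/page.html", ab_rules))  # under /ab/cd
print(is_allowed("/ab/other.html", ab_rules))    # elsewhere in /ab/
print(is_allowed("/docs/a.htm", htm_rules))
print(is_allowed("/page.htm", end_rules))
print(is_allowed("/page.html", end_rules))
```

Note that without the $ anchor, a pattern such as /*.htm also matches URLs ending in .html, because the pattern only has to match a prefix of the URL.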
Sitemap location - tells the spider where your sitemap is, in the format:
Sitemap:
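For illustration, a crawler could collect Sitemap lines with a few lines of Python. This is a sketch only: the function name sitemap_urls and the URL https://example.com/sitemap.xml are made up for the example and do not come from the original article.

```python
def sitemap_urls(robots_txt: str) -> list:
    """Return the values of all Sitemap lines, in file order."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content; the sitemap URL is a placeholder.
example = """\
User-agent: *
Disallow: /ab/
Sitemap: https://example.com/sitemap.xml
"""

found = sitemap_urls(example)
print(found)
```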
The META tags supported by all three search engines (Google, Yahoo, and MSN) include:
NOINDEX - tells the spider not to index a page.
NOFOLLOW - tells the spider not to follow the links on the page.
NOSNIPPET - tells the spider not to display descriptive text (the snippet) in the search results.
NOARCHIVE - tells the spider not to show a cached snapshot of the page.
NOODP - tells the spider not to use the title and description from the Open Directory.
All three engines now support the directives and tags above. The wildcards, it seems, were previously not supported by Yahoo and Microsoft. Baidu now also supports Disallow, Allow, and both wildcards. I have not found any official statement on whether Baidu supports the meta tags.
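These tags are normally written as a robots meta element in the page head, for example <meta name="robots" content="NOINDEX, NOFOLLOW">. The sketch below shows one way a crawler might read them with Python's standard html.parser; the class name RobotsMetaParser and the sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # Directives are comma-separated and case-insensitive.
            for item in attributes.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


# Hypothetical page used only to exercise the parser.
page = (
    '<html><head><meta name="robots" content="NOINDEX, NOFOLLOW">'
    "</head><body></body></html>"
)

meta = RobotsMetaParser()
meta.feed(page)
may_index = "noindex" not in meta.directives
may_follow = "nofollow" not in meta.directives
print(sorted(meta.directives), may_index, may_follow)
```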
Meta tags supported only by Google:
UNAVAILABLE_AFTER - tells the spider when the page expires. After this date, the page should no longer appear in the search results.
NOIMAGEINDEX - tells the spider not to index the images on the page.
NOTRANSLATE - tells the spider not to translate the page content.
Yahoo also supports:
Crawl-delay - a robots.txt directive (not a meta tag) that sets how long the spider should wait between successive requests.
NOYDIR - similar to NOODP, but refers to the Yahoo Directory rather than the Open Directory.
robots-nocontent - a class attribute value that marks a part of the HTML as not belonging to the content of the page; in other words, it tells the spider which part is the main content of the page (the content you want retrieved).
MSN also supports:
Crawl-delay
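As a rough illustration of how a polite crawler might honor Crawl-delay, the sketch below reads the value with Python's urllib.robotparser. The bot name ExampleBot, the /private/ path, the helper wait_between_requests, and the one-second fallback are all assumptions for the example; it also assumes the value is a number of seconds, which is the common reading but not one every engine shares.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a bot called "ExampleBot".
rules = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns None when no delay is set for the agent.
delay = parser.crawl_delay("ExampleBot")


def wait_between_requests(robots, agent, default=1.0):
    """Seconds to pause between requests (the default is an assumed fallback)."""
    value = robots.crawl_delay(agent)
    return float(value) if value is not None else default


pause = wait_between_requests(parser, "ExampleBot")
print(delay, pause)
```

A crawler would then call time.sleep(pause) between successive requests to the same site.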
One more reminder: when a request for robots.txt returns a 404 error, the spider takes that to mean it may crawl everything. But when fetching robots.txt fails with some other error, such as a timeout, the search engine may decline to index the site at all, because the spider cannot tell whether a robots.txt file exists or what it contains. That is not the same as confirming that the file does not exist.
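The distinction can be sketched as a small decision function. This is only an illustration of the reasoning above: the function name, the three labels, and the handling of status codes other than 404 are assumptions, and real search engines differ in the details.

```python
from typing import Optional


def robots_fetch_policy(status: Optional[int]) -> str:
    """Map the outcome of fetching robots.txt to a crawl decision.

    `status` is the HTTP status code, or None when no response arrived
    at all (for example, after a timeout).
    """
    if status is None:
        return "defer"      # file state unknown: hold off on crawling
    if 200 <= status < 300:
        return "parse"      # file exists: read it and follow its rules
    if status == 404:
        return "allow_all"  # file confirmed absent: everything may be crawled
    return "defer"          # other errors: existence and contents unknown


for outcome in (200, 404, 503, None):
    print(outcome, robots_fetch_policy(outcome))
```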