Sometimes we run into a problem like this: pages we never wanted search engines to index get "ruthlessly" indexed anyway. Type a query such as "admin site:www.***.com" into Google and your site's back-end admin address is laid bare, and site security goes out the window. When this happens, how do we stop search engines from indexing files we want kept private?
Generally there are two methods: edit the robots.txt file, or add a meta name="robots" tag to the head of each page you do not want indexed.
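For the second method, the tag looks like this (a minimal sketch; place it inside the page's head section):

```html
<head>
  <!-- Tell all compliant spiders: do not index this page and do not follow its links -->
  <meta name="robots" content="noindex,nofollow">
</head>
```

The rest of this article focuses on the first method, the robots.txt file.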
The robots.txt file is the first file a search engine looks for when it visits your site, and it is where you set the rules for how the engine may index the site. Through this file you tell the search engine which files on your site may be indexed and which must be excluded.
On many sites, webmasters simply ignore robots.txt. They reason that their site has nothing to hide, that they are not fluent in robots.txt syntax, and that a mistake in the file would cause more trouble than it prevents, so it is simpler to do without one.
In fact, this reasoning is flawed. As we saw in a previous article, when a site returns a large number of "file not found" errors (404), search engines may reduce the site's weight. Since robots.txt is the first file a spider requests from a site, a missing robots.txt means the search engine records yet another 404 on its index servers.
Baidu's help documentation does say: "Please note that you only need a robots.txt file if your site contains content you do not want search engines to index. If you want search engines to index everything on your site, do not create a robots.txt file." Even so, I personally think creating robots.txt is still worthwhile, even if it is just an empty text file. Our site will be crawled not only by Baidu but by other search engines as well, so uploading a robots.txt file does no harm.
How to write a reasonable robots.txt file?
First we need to understand some basic syntax for the robots.txt file.
Below, each desired effect is followed by the directives that achieve it.
Allow all search engines to access every part of the site (or simply create an empty text file named robots.txt):
User-agent: *
Disallow:
Or:
User-agent: *
Allow: /
Prohibit all search engines from accessing any part of the site:
User-agent: *
Disallow: /
Prohibit Baidu from indexing your site:
User-agent: Baiduspider
Disallow: /
Prohibit Google from indexing your site:
User-agent: Googlebot
Disallow: /
Prohibit all search engines except Google from indexing your site:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
Prohibit all search engines except Baidu from indexing your site:
User-agent: Baiduspider
Disallow:
User-agent: *
Disallow: /
Prohibit spiders from accessing certain directories (for example, keep /admin/, /css/, and /images/ out of the index):
User-agent: *
Disallow: /css/
Disallow: /admin/
Disallow: /images/
Allow access to certain specific URLs inside otherwise blocked directories (note that the Allow lines come before the Disallow lines):
User-agent: *
Allow: /css/my
Allow: /admin/html
Allow: /images/index
Disallow: /css/
Disallow: /admin/
Disallow: /images/
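You can sanity-check how such Allow/Disallow combinations behave with Python's standard urllib.robotparser module (the domain example.com and the file names below are illustrative, not from the original text):

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above: allow specific path prefixes
# inside otherwise-blocked directories. Allow lines come first,
# because many parsers apply the first rule that matches.
rules = """\
User-agent: *
Allow: /css/my
Allow: /admin/html
Allow: /images/index
Disallow: /css/
Disallow: /admin/
Disallow: /images/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# /css/mystyle.css matches the Allow: /css/my prefix -> fetchable
print(rp.can_fetch("*", "http://example.com/css/mystyle.css"))
# /css/other.css only matches Disallow: /css/ -> blocked
print(rp.can_fetch("*", "http://example.com/css/other.css"))
```

This prints True for the allowed URL and False for the blocked one; URLs that match no rule at all (for example the home page) remain fetchable.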
Use "*" to block URLs with a certain suffix
(for example, block every .htm file under the /admin/ directory):
User-agent: *
Disallow: /admin/*.htm
Use "$" to allow access only to URLs with a certain suffix
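The source text breaks off here without its example. The conventional pattern for this rule (my reconstruction, not from the original) anchors the suffix with "$" and disallows everything else, for instance permitting only .htm pages:

```
User-agent: *
Allow: /*.htm$
Disallow: /
```

Note that "*" and "$" wildcards are extensions honored by major engines such as Google and Baidu, not part of the original robots.txt standard, so not every spider will respect them.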