SEO Optimization Manual (4): Controlling the Spider's Tracks

Source: Internet
Author: User


Sometimes we run into this problem: pages we never wanted indexed get "ruthlessly" included by search engines anyway. Type a query like "admin site:www.***.com" into Google and your back-end management address is exposed for all to see, and site security becomes impossible to guarantee. When this happens, how do we stop search engines from indexing the files we want to keep private?

Generally there are two methods: one is to edit the robots.txt file; the other is to add a meta name="robots" tag to the head of each page you do not want indexed.
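For the second method, a minimal example of the meta tag, placed inside the page's <head> section, looks like this:

```html
<!-- Tells spiders not to index this page and not to follow its links -->
<meta name="robots" content="noindex, nofollow">
```

"noindex" asks the search engine to keep the page out of its index, and "nofollow" asks it not to follow the links on the page; either value can also be used on its own.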

The robots.txt file is the first file a search engine looks for when it visits your site; it is the set of rules you give search engines for how to index the site. Through this file, you can tell a search engine which files on your site may be indexed and which must be excluded.

On many sites, webmasters simply ignore the robots.txt file. Their reasoning is that the site has no secrets to hide, that they are not familiar with robots.txt syntax, and that a mistake in the file could cause more trouble than it prevents, so it seems simpler to go without one.

In fact, this approach is wrong. As we saw in the previous article, when a site returns a large number of "file not found" (404) errors, search engines lower the site's weight. Since robots.txt is the first file a spider requests from your site, a missing robots.txt means the search engine records yet another 404 against your site on its index server.

Baidu's help documentation does say: "Please note that you only need a robots.txt file if your site contains content you do not want search engines to index. If you want search engines to index everything on your site, do not create a robots.txt file." Personally, though, I think creating robots.txt is still worthwhile, even if it is just a blank text file. Our sites are indexed not only by Baidu but by other search engines as well, so uploading a robots.txt file does no harm.

So how do we write a reasonable robots.txt file?

First we need to understand some basic syntax for the robots.txt file.

Each entry below gives the desired effect first, followed by the robots.txt syntax that achieves it.

Allow all search engines to access all parts of the site

(or simply upload a blank text file named robots.txt)

User-agent: *
Disallow:

or

User-agent: *
Allow: /

Prohibit all search engines from accessing any part of the site

User-agent: *
Disallow: /

Prohibit Baidu from indexing your site

User-agent: Baiduspider
Disallow: /

Prohibit Google from indexing your site

User-agent: Googlebot
Disallow: /

Prohibit all search engines except Google from indexing your site

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Prohibit all search engines except Baidu from indexing your site

User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /

Prohibit spiders from accessing certain directories

(for example, prevent the admin, css, and images directories from being indexed)

User-agent: *
Disallow: /css/
Disallow: /admin/
Disallow: /images/

Allow access to specific URLs within otherwise blocked directories

User-agent: *
Allow: /css/my
Allow: /admin/html
Allow: /images/index
Disallow: /css/
Disallow: /admin/
Disallow: /images/

Use "*" to block URLs matching a wildcard pattern

For example, to prevent indexing of all URLs ending in ".htm" under the admin directory:

User-agent: *
Disallow: /admin/*.htm

Use "$" to allow access only to URLs ending with a given suffix

For example, to allow spiders to fetch only ".htm" pages:

User-agent: *
Allow: /*.htm$
Disallow: /
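Before uploading your robots.txt, you can check how a crawler will interpret it. As a sketch, Python's standard urllib.robotparser module can parse a rule set and answer allow/deny queries; the rules below mirror the "allow specific URLs" example above, and the tested paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the "allow specific URLs within blocked directories" example
rules = [
    "User-agent: *",
    "Allow: /css/my",
    "Disallow: /css/",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)

# The Allow line matches first, so paths starting with /css/my stay fetchable
print(rp.can_fetch("*", "/css/my-style.css"))  # True
print(rp.can_fetch("*", "/css/other.css"))     # False
print(rp.can_fetch("*", "/admin/login.php"))   # False
print(rp.can_fetch("*", "/index.html"))        # True (no rule matches)
```

Note that urllib.robotparser applies the first matching rule, which is why the Allow lines are listed before the Disallow lines; behavior can differ between crawlers, so keep more specific rules first.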
