Prevent search engines from crawling pages


A robots.txt file restricts access by the search engine robots (known as bots or crawlers) that crawl the web. These bots are automated: before they access a page on your site, they check for a robots.txt file that restricts their access to particular pages. Robots.txt is a simple and effective tool if you want to keep certain content on your site out of search engine indexes. Here is a brief introduction to how to use it.

Where to place the robots.txt file

Robots.txt is a plain text file. It must sit in the root directory of the domain and be named "robots.txt". A robots.txt file placed in a subdirectory has no effect, because robots look for the file only in the root of the domain. For example, http://www.1520cc.cn/robots.txt is a valid location, while http://www.1520cc.cn/mysite/robots.txt is not.
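As a quick way to see this rule in action, here is a small Python sketch that derives the single robots.txt location a crawler will consult for any page on a host (the helper name and sample URLs are only illustrative):

from urllib.parse import urlsplit, urlunsplit

def robots_txt_location(page_url):
    # Crawlers ignore robots.txt files in subdirectories; they consult
    # only the file at the root of the host that serves the page.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Both pages map to the same root-level file:
print(robots_txt_location("http://www.1520cc.cn/index.html"))        # http://www.1520cc.cn/robots.txt
print(robots_txt_location("http://www.1520cc.cn/mysite/page.html"))  # http://www.1520cc.cn/robots.txt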

Here's a robots.txt example:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~name/

Block or remove an entire Web site using the robots.txt file

To remove your site from search engines and prevent all bots from crawling it in the future, place the following robots.txt file in your server's root directory:

User-agent: *
Disallow: /

To remove your site from Google only, and to prevent only Googlebot from crawling your site in the future, place the following robots.txt file in your server's root directory:

User-agent: Googlebot
Disallow: /

Each port should have its own robots.txt file. In particular, if you serve content over both HTTP and HTTPS, each of these protocols needs its own robots.txt file. For example, to let Googlebot index all HTTP pages but none of the HTTPS pages, use the following robots.txt files (a quick way to check their combined effect is sketched after the two examples).

For the HTTP protocol (http://yourserver.com/robots.txt):

User-agent: *
Allow: /

For the HTTPS protocol (https://yourserver.com/robots.txt):

User-agent: *
Disallow: /
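The following Python sketch uses urllib.robotparser from the standard library to parse the two rule sets above and ask whether Googlebot may fetch the same page over each protocol (yourserver.com and /somepage.html are just placeholders):

from urllib.robotparser import RobotFileParser

# The two files above, one per protocol; crawlers fetch robots.txt
# separately for HTTP and HTTPS (and for each port).
http_rules  = ["User-agent: *", "Allow: /"]      # http://yourserver.com/robots.txt
https_rules = ["User-agent: *", "Disallow: /"]   # https://yourserver.com/robots.txt

http_parser = RobotFileParser()
http_parser.parse(http_rules)

https_parser = RobotFileParser()
https_parser.parse(https_rules)

page = "/somepage.html"
print(http_parser.can_fetch("Googlebot", "http://yourserver.com" + page))   # True: crawlable
print(https_parser.can_fetch("Googlebot", "https://yourserver.com" + page)) # False: blocked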

Allow all bots to access your Web page

User-agent: *
Disallow:

(Another method: create an empty "/robots.txt" file, or simply do not use a robots.txt file at all.)

Block or remove individual Web pages using the robots.txt file

You can use a robots.txt file to prevent Googlebot from crawling pages on your site. For example, if you are manually creating a robots.txt file to block Googlebot from crawling all pages in a particular directory (for example, private), you can use the following robots.txt entry (a quick way to test it against sample URLs is sketched after the entry):

User-agent: Googlebot
Disallow: /private
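As a rough local check of this kind of rule, the Python sketch below feeds the entry above to urllib.robotparser and asks whether Googlebot may fetch two sample URLs (the example.com addresses are hypothetical; in practice the same parser can load a live file via set_url() and read()):

from urllib.robotparser import RobotFileParser

# The entry above, parsed from memory rather than fetched from a server.
rules = ["User-agent: Googlebot", "Disallow: /private"]

parser = RobotFileParser()
parser.parse(rules)

for url in ("http://www.example.com/private/report.html",
            "http://www.example.com/index.html"):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
# The /private page is reported as blocked, the home page as allowed.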

To prevent Googlebot from crawling all files of a particular file type (for example, .gif), use the following robots.txt entry:

User-agent: Googlebot
Disallow: /*.gif$

To prevent Googlebot from crawling any URL that contains a question mark (more specifically, any URL that begins with your domain name, followed by any string, then a question mark, then any string), you can use the following entry (a sketch of how these wildcard patterns match URLs follows the entry):

User-agent: Googlebot
Disallow: /*?
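The * and $ characters in the last two entries are wildcard extensions to the basic robots.txt syntax. The Python sketch below translates such patterns into regular expressions purely to illustrate which URLs they cover; it is only an approximation for illustration, not Googlebot's actual matcher:

import re

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the
    # pattern to the end of the URL; everything else is matched literally.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body + ("$" if anchored else ""))

gif_rule   = pattern_to_regex("/*.gif$")  # from: Disallow: /*.gif$
query_rule = pattern_to_regex("/*?")      # from: Disallow: /*?

print(bool(gif_rule.match("/images/logo.gif")))    # True  -> blocked
print(bool(gif_rule.match("/images/logo.png")))    # False -> not matched
print(bool(query_rule.match("/search?q=robots")))  # True  -> blocked
print(bool(query_rule.match("/about.html")))       # False -> not matched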

Although Google does not crawl or index the content of pages blocked by robots.txt, it may still index the URLs if it finds them on other pages on the web. As a result, the URL of a page, and potentially other publicly available information such as the anchor text in links to your site, can appear in Google search results. The content of the blocked pages themselves, however, will not be crawled, indexed, or displayed.

As part of Webmaster Tools, Google provides a robots.txt analysis tool. It reads your file the same way Googlebot does and reports the results for Google user-agents such as Googlebot. We strongly recommend that you use it. Before creating a robots.txt file, consider which content should be findable by searchers and which should not. Used sensibly, robots.txt lets search engines bring users to your site while keeping private information out of their indexes.
