Use HTML tags to restrict search engine crawling of websites

Source: Internet
Author: User

Sometimes a web page is not finished yet, or contains private content that should not be public, and you need to stop search engines from crawling or indexing it.

The first method: restricting page snapshots

Prevent all search engines from keeping a snapshot (cached copy) of the page: <meta name="robots" content="noarchive">

Prevent only Baidu's search engine from keeping a snapshot of the page: <meta name="Baiduspider" content="noarchive">

The second method: prevent search engines from indexing this page or following its links

<meta name="robots" content="noindex,follow">

Here, name="robots" applies to all search engines. The name can also target a specific search engine's crawler.

For example: name="Googlebot", name="Baiduspider", and so on.

The content attribute accepts four directives: index, noindex, follow, and nofollow. Multiple directives are separated by an ASCII comma (",").

index: tells the search engine that it may index this page.

follow: tells the search engine that it may follow the links on this page and continue crawling from them.

noindex: tells the search engine not to index this page.

nofollow: tells the search engine not to follow the links on this page.

These directives combine in four ways:

<meta name="robots" content="index,follow">: this page may be indexed, and the crawler may follow its links to other pages;

<meta name="robots" content="noindex,follow">: this page may not be indexed, but the crawler may follow its links to other pages;

<meta name="robots" content="index,nofollow">: this page may be indexed, but the crawler may not follow its links;

<meta name="robots" content="noindex,nofollow">: this page may not be indexed, and the crawler may not follow its links.
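To make the four combinations concrete, here is a minimal Python sketch of how a crawler might read these directives from a page. The names RobotsMetaParser and robots_policy are hypothetical, invented for this illustration; the sketch also assumes that a missing directive means "allowed", and real search engines may handle more directives and edge cases than this does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags (or a named crawler's tags)."""

    def __init__(self, crawler="robots"):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when the attribute has no value.
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != self.crawler:
            return
        # Directives are comma-separated and treated as case-insensitive.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html, crawler="robots"):
    """Return (may_index, may_follow); assumes absent directives mean 'allowed'."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(robots_policy(page))  # (False, True)
```

Passing crawler="Baiduspider" or crawler="Googlebot" checks a tag aimed at one specific crawler instead of the generic robots tag.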

The third method: using robots.txt

The robots.txt file is the first file a search engine crawler looks for when it visits your site. It holds the rules you set for how search engines may access the site: from this file, a search engine learns which files on your site may be crawled and which are off limits.
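As a rough illustration of how such rules are read, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and asks whether a given crawler may fetch a URL. The file contents, the paths, and the example.com URLs are assumptions made up for this example, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and rules are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/

User-agent: Baiduspider
Disallow: /
"""


def build_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# Any crawler may fetch pages outside the disallowed directories.
print(rules.can_fetch("Googlebot", "https://example.com/index.html"))      # True
# Pages under /private/ are off limits to every crawler.
print(rules.can_fetch("Googlebot", "https://example.com/private/a.html"))  # False
# Baiduspider has its own group that blocks the whole site.
print(rules.can_fetch("Baiduspider", "https://example.com/index.html"))    # False
```

Note that robots.txt only asks crawlers to stay away; it relies on the crawler choosing to obey the rules.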
