Sometimes a site is still in the debugging phase, or is intended only for internal or limited use, and we do not want search engines to crawl it. Below are several ways to block search engine crawlers:
1. Create a robots.txt file at the root of the website with the following content:

User-agent: *
Disallow: /

To block a specific search engine, such as Baidu, name its spider instead:

User-agent: Baiduspider
Disallow: /
The spider names of the major search engines:
Google spider: Googlebot
Baidu spider: Baiduspider
Yahoo spider: Slurp
Alexa spider: ia_archiver
MSN spider: msnbot
Youdao spiders: YodaoBot and OutfoxBot
Sogou spider: Sogou web spider
Soso spider: Sosospider
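Before deploying a robots.txt, you can check that it blocks exactly the crawlers you intend by feeding the rules to Python's standard urllib.robotparser module. A minimal sketch (the rules and example.com URLs below are placeholders):

```python
from urllib import robotparser

# The robots.txt rules to test: block Baiduspider entirely,
# leave every other crawler unrestricted.
rules = [
    "User-agent: Baiduspider",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow:",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Baiduspider is blocked from every URL; Googlebot is not.
print(rp.can_fetch("Baiduspider", "https://example.com/page.html"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/page.html"))    # True
```

Note that robots.txt is advisory: well-behaved spiders obey it, but nothing forces a crawler to comply.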
2. Add a robots meta tag between the <head></head> tags of individual pages:

<meta name="robots" content="index,follow">
content="index,follow": the page may be indexed, and links on the page may be followed.
content="noindex,follow": the page may not be indexed, but links on the page may still be followed.
content="index,nofollow": the page may be indexed, but links on the page may not be followed.
content="noindex,nofollow": the page may not be indexed, and links on the page may not be followed.
You can also add a noarchive directive to prevent a search engine from keeping a cached snapshot of the page:

<meta name="robots" content="index,follow,noarchive">
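To verify which robots directives a page actually carries, you can extract the meta tag with Python's standard html.parser. A small sketch (the sample HTML fragment is illustrative):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

parser = RobotsMetaParser()
parser.feed('<head><meta name="robots" content="noindex,nofollow"></head>')
print(parser.directives)  # ['noindex', 'nofollow']
```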
3. Create an .htaccess file at the root of the website (Apache with mod_rewrite enabled). This is stricter than robots.txt, because the server refuses the request instead of relying on the crawler to obey the rules. File content:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC]
RewriteRule .* - [F]
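The RewriteCond matches any User-Agent header that begins with "Baiduspider", case-insensitively ([NC]), and the RewriteRule answers with 403 Forbidden ([F]). The matching logic can be sketched in Python to check which agents would be blocked (the user-agent strings below are illustrative):

```python
import re

# Same pattern as the RewriteCond: anchored at the start, case-insensitive.
BLOCKED_AGENT = re.compile(r"^Baiduspider", re.IGNORECASE)

def status_for(user_agent: str) -> int:
    """Return the HTTP status the rewrite rule would produce."""
    return 403 if BLOCKED_AGENT.match(user_agent) else 200

print(status_for("Baiduspider+(+http://www.baidu.com/search/spider.htm)"))  # 403
print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))                # 200
```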