Last time I shared a case study of a new site being hit hard by a Baidu K (de-indexing), and quite a few readers added my QQ to ask for advice. Honestly, I'm new to SEO myself and not very experienced, and I don't even work in the web industry; it's just a hobby. I keep learning from Lou, Mou Changqing and other well-known promotion blogs and websites, and with enough time and patience of my own to run tests, I pick up experience from practice!
All right, let's move on. Today's topic is robots.txt. robots.txt is the first file a search engine looks at when visiting a website. When a search spider accesses a site, it first checks whether a robots.txt file exists in the site's root directory; if it does, the spider determines the scope of its crawl according to the file's contents. If the file does not exist, search spiders will crawl every page on the site that is not password-protected!
That introduction already makes robots.txt clear enough, so here I want to explain why it matters for your site. Many webmasters never add this file to their site's root directory and configure it. You can look up its standard format in the search engines' documentation, or generate one with Google Webmaster Tools.
Use robots.txt to tell spiders how your site's weight is distributed
Keep in mind that a site's weight is limited, especially for a grassroots site. Giving the whole site equal weight is unscientific, and it also wastes server resources (search spiders consume more server resources than normal visitors: CPU, IIS connections, bandwidth, and so on). Think about it: just as with an unclear site structure, without a clear statement of weight a spider cannot tell which content on your site is important and which is your main content.
Blocking spiders from indexing back-end files and other standardized page code: I won't explain this in detail here. Taking my own site, fuckegg.cc, as an example, I think the cache, include, js, update, skins and similar directories can all be blocked. So as not to stupidly tell everyone where the admin directory is, I won't write the admin directory here. (A sample file is shown after the directive list below.) The basic directives are:
User-agent: which crawler the rules below apply to; generally "*"
Disallow: pages to block; usually written before Allow
Allow: pages not to block; generally "/"
Sitemap: the URL of your sitemap
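As a minimal sketch (assuming the directory names mentioned above sit directly under the site root; adjust the paths to your own program), a robots.txt that blocks those directories for all spiders could look like this:
User-agent: *
Disallow: /cache/
Disallow: /include/
Disallow: /js/
Disallow: /update/
Disallow: /skins/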
Some people ask whether rules can be customized for individual spiders, for example to block a particular one. Yes: just write a rule targeting that spider's user agent, like this:
User-agent: Baiduspider
Disallow: /
Use robots.txt to limit indexing by junk search engines and reduce the load on your site. Check your traffic statistics to see which search engines your visitors actually come from, and completely block the spiders that send you no traffic. A buddy of mine runs a virtual hosting business, so I know how badly junk spiders can affect a site's stability; he told me he has seen sites with only a few dozen visitor IPs a day whose bandwidth consumption matched that of roughly 1,000 IPs of normal traffic. The following example assumes only the Baidu and Google spiders are allowed and all others are blocked:
User-agent: Baiduspider
Disallow:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
Sitemap:
Use robots.txt to tell spiders where your sitemap is. The Sitemap directive tells spiders which file is your map file; use an absolute URL. For Google's spider, it is recommended to also submit the sitemap in Google Webmaster Tools. For more advanced robots.txt usage, you can look up the documentation yourself.
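For instance (example.com is just a placeholder domain here), a Sitemap line with an absolute URL would look like:
Sitemap: http://www.example.com/sitemap.xml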
Resources:
http://baike.baidu.com/view/1011742.htm
User-agent names of some common search spiders:
Each Baidu product uses a different user agent:
Wireless Search Baiduspider-mobile
Image Search Baiduspider-image
Video Search Baiduspider-video
News search Baiduspider-news
Baidu Search Tibet Baiduspider-favo
Baidu Alliance Baiduspider-cpro
Web pages and other search Baiduspider
Soso's spider user agents:
Sosospider
Sosoimagespider
Google's spiders:
Googlebot
Googlebot-image
Googlebot-mobile
Mediapartners (or Mediabot) deserves a special note: it is Google's advertising crawler, used to match ads. If you run Google ads and block all spiders, you're in trouble; if you don't run Google ads, this crawler won't visit at all (this is where Google plays hardball, using a dedicated spider to serve ads).
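As a sketch, if you block everything with a wildcard rule but still want the ad crawler to match ads on your pages, you could allow it explicitly (this assumes the full user-agent string is Mediapartners-Google):
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /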
I won't go through the others; you can look them up yourself. My writing is rough, so if anything here is lacking, I hope the experts will point it out!
This article is reproduced from fuckegg.cc (www.fuckegg.cc); if you repost it, please keep this link and respect the original Zhao design (www.zhaofeng.org)!