In an earlier post, "The difference between the Robots meta tag and robots.txt," I gave only a brief introduction to robots.txt. Here, Wuhan SEO Idler explains robots.txt writing and its points of attention in detail.
I. What is robots.txt used for?
A website contains many kinds of files: backend program files, front-end template files, images, and so on. Some of these we do not want Baidu's spider to crawl. What can we do? Search engines take this into account: before crawling a site's pages, the spider first visits the robots.txt file in the site's root directory. If the file exists, the spider crawls only within the scope that robots.txt allows; if it does not exist, the default is to crawl everything.
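As a minimal sketch (example.com is only a placeholder domain, not a real site), a robots.txt placed at the site root that allows everything behaves the same as having no file at all, while changing a single line shuts the whole site off:

    # http://www.example.com/robots.txt – let every spider crawl everything
    User-agent: *
    Disallow:

    # Changing the last line to "Disallow: /" would instead block the entire site.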
II. The role of robots.txt in SEO
In an earlier article, "Baidu ignores the existence of the robots.txt file," we saw that Baidu can crawl the same page under two URLs, which splits the page's weight. A well-written robots.txt can avoid this. In SEO, the role of robots.txt is to screen out pages that do not need to be crawled, so that the effective pages get more chances to be fetched by the spider. Blocking unnecessary crawling concentrates page weight and saves server resources; finally, we can put the sitemap inside robots.txt to make it easier for spiders to crawl the site's pages.
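As an illustration (the rule and domain below are placeholders, not taken from this blog's file), blocking the dynamic form of a URL keeps the spider on one canonical version of each page, and a Sitemap line points it at the pages we do want crawled:

    User-agent: *
    Disallow: /*?*        # block dynamic URLs such as /post.php?id=12 so only one form of the page is crawled
    Sitemap: http://www.example.com/sitemap.xml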
III. Which files can be blocked with robots.txt
Template files, style sheet files, and some backend files of a website bring no benefit even if search engines crawl them; crawling them only wastes server resources, so such files can be blocked. Specific pages such as "Contact us," or internal company photos that do not need to be public, can also be blocked according to the actual situation.
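As a sketch of this idea (the paths below are typical WordPress locations and are assumptions, not necessarily this blog's real layout), such files could be blocked like this:

    User-agent: *
    Disallow: /wp-admin/              # backend program files
    Disallow: /wp-content/themes/     # front-end template files
    Disallow: /*.css$                 # style sheet files
    Disallow: /contact/               # a specific page such as "Contact us"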
IV. robots.txt writing details and precautions
Taking this blog as an example, its robots.txt file is as follows (the text after each "#" is a comment, which robots.txt permits):
    User-agent: *                                           # the rules apply to all search engine spiders
    Disallow: /wp-                                          # do not crawl URLs containing /wp-
    Allow: /wp-content/uploads/                             # but allow files under the /wp-content/uploads/ directory
    Disallow: /*?*                                          # do not crawl dynamic URLs containing "?"
    Disallow: /feed
    Disallow: /trackback
    Disallow: /index.php?
    Disallow: /*.php$                                       # do not crawl URLs ending in .php
    Disallow: /*.css$
    Disallow: /date/
    Sitemap: http://www.chenhaoseo.com/sitemap.xml          # sitemaps
    Sitemap: http://www.chenhaoseo.com/sitemap_baidu.xml
User-agent: * – there can be only one such record per group of rules. The * stands for all search engine spiders; if the rules are meant for one particular engine, write, for example, User-agent: Baiduspider, which means the rules that follow apply only to Baidu's spider.
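A brief sketch of this (the blocked paths are placeholders), with one group of rules for Baidu's spider and another for every other spider:

    User-agent: Baiduspider      # the rules below apply only to Baidu's spider
    Disallow: /private/

    User-agent: *                # the rules below apply to all other spiders
    Disallow: /private/
    Disallow: /drafts/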
Disallow: describes URLs or directories that should not be crawled. For example, Disallow: /wp- blocks any URL that begins with /wp-. Note that Disallow: /date/ and Disallow: /date are not the same: the former blocks only URLs under the /date/ directory (including its subdirectories), but not a file such as /date.html; the latter blocks any URL that begins with /date, which covers /date.html as well as everything under the /date/ directory, subfolders included.
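A small sketch of the difference (the URLs are only examples):

    Disallow: /date/    # blocks /date/2013/index.html, but /date.html can still be crawled
    Disallow: /date     # blocks /date.html as well as everything under /date/, subfolders included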
Allow: describes URLs or directories that are allowed to be crawled; its function is the opposite of Disallow. Pay special attention to the order of Disallow and Allow lines, because it is meaningful: the robot decides whether to access a URL based on the first Allow or Disallow line that matches it.
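Under the first-match rule just described, the order below lets the spider into /wp-content/uploads/ while keeping it out of everything else that begins with /wp-; swapping the two lines would block the uploads directory as well, because Disallow: /wp- would then match first:

    User-agent: *
    Allow: /wp-content/uploads/    # matched first, so files in this directory may be crawled
    Disallow: /wp-                 # every other URL beginning with /wp- is blocked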
Use "*" and "$": Baiduspider supports using wildcard "*" and "$" to blur matching URLs. "$" matches the line terminator. "*" matches 0 or more arbitrary characters.
That concludes this detailed explanation of robots.txt writing and its points of attention. If you still have doubts after writing your file, you can test it with Google Webmaster Tools to make sure the robots.txt is written correctly and will actually take effect. This article was originally written by Wuhan SEO Idler, http://www.chenhaoseo.com. SEO technical exchange QQ: 94775541.