Today, while checking Radish Home network with Baidu's site: query, Radish Fish found that at least 50% of the content Baidu had indexed was comment content, and much of it was indexed repeatedly, very likely because of improper robots.txt settings. Radish Fish looked up some information on the problem and will share the relevant tips for setting up robots.txt on WordPress.
Robots.txt is the first file a search engine spider looks at when it arrives at a site. Some site content should not be crawled by search engines, such as template files, CSS files, JS files, the admin interface, and so on; robots.txt is where you set up the rules the spiders are asked to follow.
The WordPress robots.txt file sits in the site's root directory; if you have not set one up, WP generates a dynamic robots.txt file. Viewing a site's robots.txt is also very simple: just enter the following path in the address bar: your-domain/robots.txt. Note that the filename must be all lowercase. For example, the robots.txt of Radish Home network is currently dynamically generated and is not friendly to search engines:
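For reference, the dynamic file WordPress serves is usually only a few lines, along these lines (the exact output depends on your WP version and privacy settings, so treat this as a typical example rather than a guarantee):

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

It blocks nothing but the core directories, which is why the comment and feed pages described below still get crawled.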
Below, Radish Fish explains how to write robots.txt and what to pay attention to:
1. robots.txt must be uploaded to the site's root directory; it is invalid in a subdirectory;
2. Pay attention to case: the filename robots.txt must be all lowercase, and directives such as User-agent and Disallow should keep the capitalization shown here;
3. The colon after User-agent, Disallow, and so on must be an English (half-width) colon; the colon may be followed by a space or no space. Some people online claim there must be a space after the colon, but in fact it also works without one; see how Google's Chinese Webmaster blog sets it up: http://www.googlechinawebmaster.com/robots.txt;
4. User-agent states which search engine the rules are for: an asterisk "*" means the rules below apply to every search engine; Google's spider is "Googlebot" and Baidu's is "Baiduspider";
5. Disallow: directories the search engine is not allowed to visit and index;
6. Allow: directories the search engine is allowed to visit and index. A minimal example of these directives working together follows this list.
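Put together, a minimal robots.txt using all three directives might look like this (the /private/ paths are made-up placeholders for illustration):

User-agent: *
Disallow: /private/
Allow: /private/readme.html

This tells every spider to stay out of /private/ except for the one page explicitly allowed.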
Now, Radish Fish will cover the specific usage of robots.txt in a WordPress blog:
1. Prevent search engines from indexing WordPress blog comments and comment pages, so you can avoid the same problem Radish Home network ran into.
You only need to add the following lines to the robots.txt file:
Disallow: /comments        # limit crawling of comments
Disallow: /*?replytocom=   # limit crawling of every comment reply page
These two lines stop search engines from indexing your blog's comments and comment pages.
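As an illustration, once these rules are in place, URLs of the following shapes would no longer be crawled (the post slug and comment ID here are invented):

your-domain/comments
your-domain/my-post/?replytocom=123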
2. Prevent search engines from indexing the WordPress blog's Feed pages, to keep duplicate pages out of the search engine's index. A blog has a site-wide feed, every category has a feed, and every article has a feed too; a feed page's content is basically the same as the corresponding site page, so if the feed pages are not blocked, you can imagine the volume of duplicate pages this produces. Some blogger friends report that after disallowing the feed, the traffic from Baidu increased noticeably. Note that Disallow:/feed and Disallow:/feed/ with a trailing slash are completely different. Disallow:/feed can block URLs in all three forms abc.com/feed, abc.com/feed.html, and abc.com/feed/abc, while Disallow:/feed/ only keeps search engines away from directory-style URLs like abc.com/feed/abc, so we should use Disallow:/feed. (Reference: Faraway's blog, thanks.)
Disallow: /feed       # limit crawling of the site-wide feed
Disallow: /*/*/feed   # limit crawling of category and individual article feeds
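Under these two rules, feed URLs of the following shapes would be blocked (the category and post names are invented for illustration):

your-domain/feed
your-domain/category/news/feed
your-domain/2013/05/my-post/feed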
3. Prevent search engines from indexing the WordPress blog's admin pages and template files; these should not be served up for search engines to index anyway. You only need to add the following lines to the robots.txt file:
Disallow: /wp-admin             # limit crawling of the admin login pages
Disallow: /wp-content/plugins   # limit crawling of plugin files
Disallow: /wp-content/themes    # limit crawling of template files
Disallow: /wp-includes          # limit crawling of core JS and include files
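For instance, with these rules a spider would skip URLs of the following shapes (the theme and plugin names are placeholders):

your-domain/wp-admin/
your-domain/wp-content/plugins/some-plugin/
your-domain/wp-content/themes/some-theme/style.css
your-domain/wp-includes/js/jquery/jquery.js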
4. Provide a sitemap for WordPress; you can add the following line at the end:
Sitemap: http://your-domain/sitemap.xml
Adding the sitemap link to the robots.txt file is important, and it is something search engines like. As for how to generate a WordPress sitemap, you can search for the relevant plug-ins; WordPress has many plug-ins that can generate a site map for you.
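Putting all of the rules from this article together, the finished robots.txt would look roughly like this (replace your-domain with your actual domain; this is just the combination of the rules above, not the only valid setup):

User-agent: *
Disallow: /comments
Disallow: /*?replytocom=
Disallow: /feed
Disallow: /*/*/feed
Disallow: /wp-admin
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
Disallow: /wp-includes
Sitemap: http://your-domain/sitemap.xml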
5. If you really cannot write the robots.txt yourself, you can use the robots tool in Baidu Webmaster Tools to generate the file automatically; the tool lets you build the file visually.
Well, after setting up the robots.txt file, please remember to test that it is correct. Radish Fish recommends the "Test robots.txt" feature in Google Webmaster Tools, which is very practical. This article was originally published by Radish Home Network http://www.luoboju.com; please note the source when reprinting, thank you.