Tips for Setting Up robots.txt on a WordPress Blog


Today, while checking in Baidu how the author's site, Radish Home Network, had been indexed, Radish Fish found that at least 50% of the indexed content was comment content, much of it duplicated, very likely the result of improper robots.txt settings. Radish Fish dug up some material and here shares tips on configuring robots.txt for WordPress.

robots.txt is the first file a search engine spider looks at when it arrives at a site. Some site content should not be crawled by search engines, such as template files, CSS files, JS files, and the management interface, so a robots.txt file is set up and spiders are expected to follow this agreement.

The WordPress robots.txt file lives in the site's root directory; if you have not set one up, WordPress generates a dynamic robots.txt. Viewing a website's robots.txt file is very simple: just enter the following path in the address bar: your domain/robots.txt. Note that the file name must be all lowercase. For example, the robots.txt of Radish Home Network is dynamically generated and is not friendly to search engines.
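
For reference, when no physical file exists, recent WordPress versions serve a virtual robots.txt that looks roughly like the sketch below; the exact output varies by WordPress version, and this is not the actual file from luoboju.com:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Because it only covers /wp-admin/, this default leaves comment pages, feeds, and template files open to crawling, which is why writing your own file is worthwhile.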

Below, Radish Fish explains how to write robots.txt and what to watch out for:

1. robots.txt must be uploaded to the website's root directory; it has no effect in a subdirectory;

2. Pay attention to the spelling and case of robots.txt, Disallow, and so on; do not change them;

3. The colon after User-agent, Disallow, and so on must be an English (half-width) colon; it may be followed by a space or by no space at all. Some people online claim there must be a space after the colon, but in fact it works without one; see how the Google Chinese Webmaster Blog sets it: http://www.googlechinawebmaster.com/robots.txt;

4. User-agent specifies which search engine the rules below it apply to: an asterisk "*" means the rules apply to every search engine; Google's spider is "Googlebot" and Baidu's is "Baiduspider";

5. Disallow: specifies directories that search engines are not allowed to access and index;

6. Allow: specifies directories that search engines are allowed to access and index, as in the example below.
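
To make rules 3 to 6 concrete, here is a minimal sketch of a valid robots.txt; the directory and file names are invented for illustration, not a recommendation for your site:

# Rules for every search engine (the asterisk matches all spiders)
User-agent: *
# Block the whole /private/ directory ...
Disallow: /private/
# ... but still allow one file inside it
Allow: /private/readme.html

# Rules that apply only to Baidu's spider
User-agent: Baiduspider
Disallow: /cgi-bin/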

Below, Radish Fish describes the specific usage of robots.txt on a WordPress blog:

1. Prevent search engines from indexing the WordPress blog's comments and comment pages, so you avoid the same problem as Radish Home Network.

You only need to add the following lines to the robots.txt file:

Disallow: /comments          # limit crawling of comments

Disallow: /*?replytocom=     # limit crawling of every comment reply page

These two lines prevent search engines from indexing your blog's comments and comment pages.
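
As a sketch of what the wildcard rule matches, a WordPress comment reply link generally has this shape; the post slug and comment ID below are invented for illustration:

http://your domain/some-post/?replytocom=123

Any URL carrying a replytocom query parameter falls under Disallow: /*?replytocom= and will not be crawled.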

2. Prevent search engines from indexing the WordPress blog's feed pages, so that duplicate pages do not appear in the index. A blog has a site-wide feed, and every category and every article has a feed of its own; a feed page's content is basically the same as the corresponding site page, so if the feed pages are not blocked, it is easy to see that a large number of duplicate pages will be produced. Some blogger friends report that after disallowing feeds, the traffic coming from the Baidu search engine increased noticeably. Note that Disallow: /feed and Disallow: /feed/ with the trailing slash are completely different. Disallow: /feed can block URLs in all three forms abc.com/feed, abc.com/feed.html, and abc.com/feed/abc, while Disallow: /feed/ only prevents search engines from accessing directory-style URLs such as abc.com/feed/abc, so we should use Disallow: /feed. (Reference: Faraway's blog, thank you.)

Disallow: /feed              # limit crawling of feed content

Disallow: /*/*/feed          # limit crawling of category and per-article feeds
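
As a sketch of how the second rule works, Disallow: /*/*/feed matches feed URLs nested at least two path segments deep; the category and date paths below are hypothetical:

abc.com/category/seo/feed        # a category feed, blocked
abc.com/2013/05/some-post/feed   # a per-article feed, blocked

The plain Disallow: /feed rule already covers the site-wide feed at abc.com/feed.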

3. Prevent search engines from indexing the WordPress blog's management pages and template files; these should not be served up for search engines to index. You only need to add the following lines to the robots.txt file:

Disallow: /wp-admin           # limit crawling of the admin login page

Disallow: /wp-content/plugins # limit crawling of plugin files

Disallow: /wp-content/themes  # limit crawling of template files

Disallow: /wp-includes        # limit crawling of JS files

4. Provide a sitemap for WordPress; you can add the following line at the end:

Sitemap: http://your domain/sitemap.xml

Adding the sitemap link to the robots.txt file is important, and search engines like it. To generate a WordPress sitemap, search for the relevant plugins; WordPress has many plugins that can generate a site map for you.
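
Putting together every rule from this article, a complete robots.txt for a WordPress blog would look roughly like this sketch; replace your domain with your actual domain:

User-agent: *
Disallow: /comments
Disallow: /*?replytocom=
Disallow: /feed
Disallow: /*/*/feed
Disallow: /wp-admin
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
Disallow: /wp-includes

Sitemap: http://your domain/sitemap.xml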

5. If you really cannot write robots.txt yourself, you can use the tool in Baidu Webmaster Tools to generate your robots file fully automatically; the tool lets you compose the file visually.

Well, after setting up the robots.txt file, please remember to test that it is correct. Radish Fish recommends the "Test robots.txt" feature in Google Webmaster Tools; it is very practical. This article was originally published by Radish Home Network http://www.luoboju.com; please credit the source when reprinting, thank you.
