I recently noticed that the 55.la site has added an online robots.txt generator to its home page. Open robots.55.la, enter the parts of the site that search engines should not access after "Disallow:", click "Generate Robots.txt File", and the file is generated immediately. It is a practical and quick webmaster tool.
Many webmasters are probably already very familiar with robots.txt, but newcomers may find it a little fuzzy, so below I go over some of the basics.
1. What is robots.txt, and what does it do?
Search engines use a program called a robot (also known as a spider) to automatically visit web pages on the Internet and collect their information.
You can create a plain text file named robots.txt on your website and declare in it the parts of the site that you do not want robots to visit. That way, some or all of the site's content can be kept out of search engines, or a designated search engine can be limited to specified content. In short, robots.txt lets you control what search engines (SE) include by telling spiders which files and directories may be crawled and which may not.
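As a rough illustration of limiting a designated search engine to specified content, the sketch below feeds a small rule set to Python's standard urllib.robotparser module. The rule set, the /private/ directory, the "OtherBot" name, and the can_visit helper are all assumptions made up for this example; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Baiduspider may crawl everything except /private/,
# while every other robot is kept out of the whole site.
RULES = """\
User-agent: Baiduspider
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_visit(agent: str, path: str) -> bool:
    """Report whether the named robot may fetch path under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


for agent, path in [
    ("Baiduspider", "/index.html"),
    ("Baiduspider", "/private/a.html"),
    ("OtherBot", "/index.html"),
]:
    print(agent, path, can_visit(agent, path))
```

Only the first of the three checks should come back as allowed: the named spider is restricted to the non-private pages, and the unnamed robot falls under the catch-all record.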
2. Why set up robots.txt?
A properly configured robots.txt makes the web server easier to maintain and improves the overall performance of the site.
① Related research is said to show that if a site has no robots.txt and uses a custom 404 error page, a spider may treat that error page as the robots.txt file, even though it is not a plain text file. This causes the spider a great deal of trouble when indexing the site and hurts how search engines include the site's pages.
② robots.txt can stop unwanted search engines, such as image crawlers, from hogging the server's valuable bandwidth. For most sites that are not image-focused, being crawled by them makes little sense but consumes a lot of bandwidth.
③ robots.txt can stop search engines from crawling and indexing private pages.
④ For sites with rich content and a large number of pages, configuring robots.txt can prevent a flood of spider requests, which, if left unchecked, can even affect normal access to the site.
From an SEO point of view, robots.txt is worth setting up because:
① Sites often have different links pointing to similar content, which goes against the SEO principle that the content of each page should be distinct. A robots.txt file can block the secondary links.
② After a site redesign or URL rewriting, links that are not search-engine friendly should all be blocked. Using robots.txt to drop the old links keeps the site search-engine friendly.
③ Pages without any keywords are better blocked.
④ In general, on-site search result pages are better blocked.
3. Points that need attention:
① The file name robots.txt must be lowercase, and the file must be placed in the root directory of the site.
For example, when a robot visits a website (such as http://www.55.la), it first checks whether http://www.55.la/robots.txt exists. If the robot finds the file, it determines the scope of its access from the file's contents.
② Disallow:
The value of this item describes a URL that you do not want visited. It can be a complete path or a path prefix, and any URL that starts with the Disallow value will not be accessed by the robot. For example, "Disallow: /help" blocks both /help.html and /help/index.html, while "Disallow: /help/" lets the robot access /help.html but not /help/index.html.
③ An empty Disallow record means that all parts of the site may be accessed. The "/robots.txt" file must contain at least one Disallow record. If "/robots.txt" is an empty file, the site is open to all search robots and all of its content can be crawled.
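The three points above can be checked with a short sketch built on Python's standard library. The helper names robots_url and is_allowed and the sample page path are assumptions for illustration only; the matching shown is the simple prefix rule implemented by urllib.robotparser, and individual search engines may extend it.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # The file always sits in the site root under a lowercase name.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(rules: str, path: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


NO_SLASH = "User-agent: *\nDisallow: /help"
WITH_SLASH = "User-agent: *\nDisallow: /help/"
EMPTY_RULE = "User-agent: *\nDisallow:"

print(robots_url("http://www.55.la/tools/page.html"))
for name, rules in [("no slash", NO_SLASH),
                    ("with slash", WITH_SLASH),
                    ("empty rule", EMPTY_RULE),
                    ("empty file", "")]:
    for path in ("/help.html", "/help/index.html"):
        print(name, path, is_allowed(rules, path))
```

Running it shows the difference the trailing slash makes: only the "with slash" rule lets /help.html through while still blocking /help/index.html, and both the empty Disallow record and the empty file leave every path open.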
4. Examples for a few of the most common cases:
① Allow all SE to include the site: leave robots.txt empty and write nothing in it.
② Block all SE from certain directories of the site:
User-agent: *
Disallow: /directory-name-1/
Disallow: /directory-name-2/
Disallow: /directory-name-3/
③ Block a single SE from the site, for example Baidu:
User-agent: Baiduspider
Disallow: /
④ Block all SE from the site:
User-agent: *
Disallow: /
⑤ Add the Sitemap.xml path, for example:
Sitemap: http://www.seotest.cn/sitemap.xml
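For readers who would rather script these cases than type them by hand, here is a minimal sketch of a generator. The build_robots_txt function and its parameters are hypothetical names invented for this example; this is not how the robots.55.la tool is implemented.

```python
from typing import Iterable, Optional


def build_robots_txt(disallow: Iterable[str],
                     agent: str = "*",
                     sitemap: Optional[str] = None) -> str:
    """Build robots.txt text with one record for the given user agent."""
    lines = [f"User-agent: {agent}"]
    # With no paths given, emit an empty Disallow so nothing is blocked.
    lines += [f"Disallow: {path}" for path in disallow] or ["Disallow:"]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


# Case 3 above: keep Baidu out of the whole site.
print(build_robots_txt(["/"], agent="Baiduspider"))

# Case 2 plus case 5: block some directories and point to the sitemap.
print(build_robots_txt(["/directory-name-1/", "/directory-name-2/"],
                       sitemap="http://www.seotest.cn/sitemap.xml"))
```

Save the returned text as robots.txt in the root directory of the site.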