Optimizing Robots.txt: Playing to Its Strengths and Avoiding Its Weaknesses

Source: Internet
Author: User

Robots.txt is a simple text file, but webmasters and SEOers who focus on site building and optimization know how important it is: it can keep search engines from crawling pages you do not want indexed, and it can also act like a map that guides the spider around the site. When a spider arrives at a site, it first checks whether a Robots.txt file exists and, if so, follows the instructions in it while indexing; if the file does not exist, the spider simply follows the links on each page. We can therefore use it to block directories that do not need to be indexed, or place the sitemap in Robots.txt to guide the spider's crawl. For site security, for saving server bandwidth and for guiding indexing it is very effective, which is to say it lets the site play to its strengths and avoid its weaknesses. Let us analyze the main uses one by one:

  First, using Robots.txt to save server bandwidth

In general, webmasters rarely bother with this setting, but when the server sees heavy traffic and hosts a lot of content, adding a rule can save bandwidth. Blocking a folder such as an image directory, for example, offers little practical value to search engine indexing yet wastes a great deal of bandwidth; for a picture-heavy site the consumption is even more astonishing, so Robots.txt can fully solve this problem.
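A minimal sketch of such a rule, assuming the pictures live in a folder named /images/ (the directory name is only an illustration; substitute your own path):

User-agent: *
Disallow: /images/

Every compliant spider will then skip that folder, so the bandwidth otherwise spent serving image files to crawlers is saved.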

  Second, protecting the site's sensitive directories

In general, when setting up Robots.txt, the admin directory, the database and the backup directory should be included so that spiders cannot crawl them; otherwise data can easily leak and compromise the safety of the site. Of course, there are other directories the administrator does not want spiders to index, and these can be set up the same way, so that search engines strictly follow the rules when indexing.
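A minimal sketch, assuming the admin area, database files and backups sit under /admin/, /data/ and /backup/ (these directory names are placeholders; use your site's real paths):

User-agent: *
Disallow: /admin/
Disallow: /data/
Disallow: /backup/

Keep in mind that Robots.txt is itself publicly readable and only keeps out compliant spiders, so sensitive directories still need proper access control on the server.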

  Third, preventing search engines from indexing certain pages

A site always has some pages it does not want the public to see, and Robots.txt can be set up so spiders do not index them. For example, a few days ago my slow connection caused an article update to be submitted three times, and all three copies were indexed by the search engine. What to do? Duplicate content is certainly bad for site optimization, so Robots.txt can be set to screen out the extra pages.
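A minimal sketch, assuming the duplicate copies ended up at URLs such as /archives/123-2.html and /archives/123-3.html (these paths are purely illustrative):

User-agent: *
Disallow: /archives/123-2.html
Disallow: /archives/123-3.html

Each extra Disallow line screens out one duplicate URL while the original article remains indexable.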

  Fourth, linking the sitemap in Robots.txt

Since Robots.txt is the first file a spider looks at when visiting a site, we can put the sitemap there as well, which helps the spider index the latest information and saves it a lot of detours. For example, the sitemap page of the professional website construction company Pilotage Technology is Http://www.****.net.cn/sitemap.xml; adding it to Robots.txt helps search engines index the site, and you no longer have to bother submitting the map file to the search engines every day. Isn't that convenient?
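A minimal sketch of the corresponding directive, reusing the sitemap address quoted above (replace it with your own domain):

User-agent: *
Disallow:
Sitemap: Http://www.****.net.cn/sitemap.xml

The Sitemap line is recognized by the mainstream search engines, and the empty Disallow means no page is blocked.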

  Fifth, wording and precautions

Robots.txt must be written in a standardized way; quite a few people make mistakes through carelessness. First of all, the line User-agent: * is required, where * stands for all search engines. A Disallow: line is followed by the file or directory to block (the parentheses here are not part of the syntax) and means search engines are forbidden to index it. A few examples make this concrete:

Example 1:
User-agent: *
Disallow: /
Indicates that all search engines are prohibited from indexing any page of the site.

Example 2:

User-agent: *
Disallow: /seebk
Indicates that no search engine may index the /seebk directory.

Example 3:

User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /

Indicates that only Baidu's spider is allowed to index the site: the Baiduspider group blocks nothing, while every other spider is blocked entirely. Common crawler names: Baidu: Baiduspider, Google: Googlebot, Soso: Sosospider, Alexa: ia_archiver, Yahoo: Slurp.
Example 4:

User-agent: *
Disallow: /*.jpg$

Prevents search engines from indexing .jpg images (the /*.jpg$ wildcard form is the one mainstream engines recognize), which also discourages hotlinking of your images via image search; if your bandwidth is sufficient you do not need to set this.

Afterword: Optimizing Robots.txt lets a site play to its strengths and avoid its weaknesses; a well-crafted Robots.txt makes it easier to optimize and develop the site. This article was originally written by www.joyweb.net.cn.


