Designing road signs for Web Robots on your homepage

Source: Internet
Author: User

The Internet is booming, and the World Wide Web is well known. Publishing company information and conducting e-commerce on the Internet has evolved from a novelty into standard practice. As a Web Master, you may be familiar with HTML, JavaScript, Java, and ActiveX, but do you know what a Web Robot is? Do you know the relationship between Web Robots and the homepage you are planning?
The wanderer of the Internet --- the Web Robot

Sometimes you may find that the content of your homepage has been indexed by a search engine even though you never submitted it. This is the work of a Web Robot. A Web Robot is a program that can traverse the hypertext structure of large numbers of Internet URLs, recursively retrieving all the content of a website. These programs are sometimes called Spiders, Web Wanderers, Web Worms, or Web Crawlers. Well-known search engines on the Internet run specialized Web Robot programs to collect information, including Lycos, WebCrawler, and AltaVista, as well as Chinese search engines such as Polaris, NetEase, and GOYOYO.

A Web Robot is like an uninvited guest. Whether you welcome it or not, it will loyally carry out its master's instructions, working tirelessly as it roams the World Wide Web. It will, of course, also visit your homepage, retrieve the homepage's content, and generate the record format its owner requires. Perhaps you are happy to have some of your content indexed, but there may be other content you would rather not expose. Can you do nothing but let the Robot roam your homepage freely, or can you direct and control it? The answer, of course, is yes. After reading the following content, you will be able to set up road signs, like a traffic policeman, telling the Web Robot how to retrieve your homepage: what may be retrieved and what may not be accessed.

In fact, a Web Robot can understand your instructions

Do not think that Web Robots run without organization or control. Many Web Robots support two methods by which website administrators and Web content authors can restrict where a Robot goes:

1. Robots Exclusion Protocol

The administrator of a website can create a specially formatted file on the site to indicate which parts of the site may be accessed by a Robot. This file is placed in the root directory of the site, i.e. http://.../robots.txt.

2. Robots META tag

The author of a web page can use a special HTML META tag to indicate whether the page may be indexed and analyzed, or its links followed.
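A typical Robots META tag, placed in the HEAD section of a page, looks like this; the values "noindex" and "nofollow" tell a compliant Robot not to index the page and not to follow its links:

```html
<meta name="robots" content="noindex, nofollow">
```

To allow both, the value "index, follow" (or simply omitting the tag) is used instead.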

These methods are supported by most Web Robots. However, whether they are actually implemented in any given piece of software is up to the Robot's developers, and no Robot is guaranteed to comply. If you urgently need to protect your content, consider additional protection methods, such as password protection.

Use the Robots Exclusion Protocol

When a Robot visits a Web site, such as http://www.sti.net.cn/, it first checks for the file http://www.sti.net.cn/robots.txt. If the file exists, the Robot analyzes it according to this record format:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/


to determine whether it should retrieve the site's files. These records are read only by Web Robots; ordinary browser users will never see the file. Therefore, do not put HTML statements such as image tags, or fake greetings like "How do you do? Where are you from?", into it.

Only one "/robots.txt" file may exist on a website, and every letter in the file name must be lowercase. In the Robot record format, each separate "Disallow" line indicates a URL you do not want the Robot to access. Each URL must be on its own line; multiple paths cannot be combined into one line such as "Disallow: /cgi-bin/ /tmp/". Also, empty lines must not appear within a record, because an empty line is the separator between multiple records.

The User-agent line specifies the name of the Robot or other agent. In a User-agent line, '*' has a special meaning: all Robots.
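For instance, a robots.txt file can give different instructions to different Robots by separating their records with a blank line (the Robot names below are illustrative, not real products):

```
User-agent: friendly-robot
Disallow:

User-agent: greedy-robot
Disallow: /
```

An empty "Disallow:" line means nothing is excluded for that Robot, so in this sketch friendly-robot may retrieve the whole site, while greedy-robot is excluded entirely.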

Below are several examples of records in robots.txt:

Exclude all Robots from the entire server:
User-agent: *
Disallow: /
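The way a compliant Robot interprets these records can be sketched with Python's standard-library robots.txt parser. This example feeds it the records shown earlier (in memory, so no network access is needed; the site URL is the article's example):

```python
import urllib.robotparser

# Parse the example records from the article.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
    "Disallow: /~joe/",
])

# A compliant Robot would skip the excluded paths...
print(rp.can_fetch("*", "http://www.sti.net.cn/cgi-bin/test.cgi"))  # False
# ...but is free to retrieve everything else.
print(rp.can_fetch("*", "http://www.sti.net.cn/index.html"))  # True
```

In a real Robot, `rp.set_url(".../robots.txt")` followed by `rp.read()` would fetch and parse the live file instead.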
