Three powerful functions implemented with ASP: part three of three


How do I hide a page from search engines?

The search engines we use on the Web rely on small programs, known variously as 'robots', 'bots', 'crawlers' and 'spiders', to index pages. However, when developing a site, especially one built with ASP, it is often useful to prevent certain pages from being indexed. As these search engines revise their schemes to index dynamically created Web pages such as ASP pages, this article shows you some important techniques for keeping robots away from pages that you do not want indexed.

Why should you care?

Suppose you visit XYZ Corp.'s web site and search for 'XYZ Corp.' with your favourite search engine. You would be a little worried to find the site's administration page among the results. If you run an e-commerce site, you would not want users to land on the final page of the ordering process without having gone through the pages before it. This is not only something the webmaster wants to avoid: it also annoys users when pages fail to work properly, either because they lack the appropriate permissions or because they did not reach the page in the intended order, and it does nothing for your site's reputation. The operators of commercial search engines care too, because they want to provide accurate links in order to improve their service.

So how do you prevent these programs from indexing certain pages of your site? There are two ways to choose from: include a file named robots.txt in the root directory, or use the <META> tag.

Including a robots.txt file

As early as 1994, a voluntary agreement for keeping robots away from parts of a site was put forward on a mailing list for robot authors. It is not a formal standard and offers no guarantee of enforcement, but it is respected by many robot authors.

Creating a robots.txt file is very simple: it spells out the standard of behaviour that the administrator expects robots to follow. Note that the file name must be lowercase and the file must be placed in the root folder of the site, for example http://xyzcorp/robots.txt, so that a single file can cover the entire site.

What's in a robots.txt file?

Each robots.txt file contains one or more records. A record consists of the user-agent strings of the robots you want it to apply to, followed by the instructions that apply to them. Don't worry about having to know the user-agent string of every robot roaming the Web, because you can use the wildcard * to apply the record to all robots. Here is an example of a record:

User-agent: *
Disallow: /xyzfinances.asp
Disallow: /admin
Disallow: /news/update.asp

Apart from the user-agent line (or the wildcard), you only need to include Disallow lines, and this simple example shows every possibility you will need. It says that no robot may visit xyzfinances.asp, which is expressed by this line:

Disallow: /xyzfinances.asp

It also says that no robot may enter the admin folder or any of the folders below it:

Disallow: /admin

and that the update.asp file in the news folder may not be indexed, while all the other content in the news folder may be:

Disallow: /news/update.asp

You can name as many user-agent strings as you like in a record, and you can include as many records as you like in a robots.txt file (as long as you separate the records with one or more blank lines).

Each individual record can give different instructions to one or more robots. However, it is advisable to add a wildcard record for any engine you have not named explicitly with a user-agent string. The most popular choice is to keep the whole scheme to a single record with a wildcard as the user-agent string. A list of 196 user agents is available for reference at:

http://info.webcrawler.com/mak/projects/robots/robots.html.
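
For instance, a minimal sketch of a file with two records might look like this (the robot name ExampleBot is purely hypothetical; substitute the real user-agent string of the robot you want to address):

User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /admin

The named robot is kept out of the whole site, while every other robot is only kept out of the admin folder.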

Robots are generally expected to ignore case and version numbers in the user-agent string. Keep in mind that this is what the robots of most commercial search engines do, since they do not want to upset their users with useless pages. However, although case does not matter in the directives themselves, you must be sure to type the URLs correctly: Windows NT does not care about the capitalization of file and path names, but not all platforms are so forgiving.

The only other thing you may want to include is comments. These follow the Unix Bourne shell convention: the # symbol marks the start of a comment, and everything from the whitespace before the hash to the end of the line is ignored. A line that contains only a comment is ignored completely, although it does not count as a blank line between records.

Now look at two final examples.

Example 1

# Don't come to this site
User-agent: *
Disallow: /    # disallows anything

Example 2

# robots.txt for XYZCorp
# webmaster: John Doe, contact JohnD@xyzcorp.com
User-agent: *                      # applies to all robots except next record
Disallow: /store/order/            # no robot should visit any URL starting with /store/order/
Disallow: /admin/                  # disallow any pages in the admin folder
Disallow: /world_domination.asp    # disallow world_domination.asp

Well, that's all there is to the robots.txt file.

The following describes how to use the <META> tag.

Using the <META> robots tag

Again, you cannot guarantee that a robot will fully comply with the instructions in a <META> tag, but the approach is still very effective with commercial search engines. <META> tags must be placed in the <head> section of the file. They work by telling the robot whether the page carrying the tag may be indexed, and whether the robot may follow any of the links on the page.

Again, the syntax is very simple. The first example is:

<META name="robots" content="noindex">

This line of code tells the robot not to index this page.

Next Example:

<META name="robots" content="nofollow">

This allows the robot to index the page but stipulates that it must not follow any links on it. If you want to disallow both, you can use:

<META name="robots" content="noindex, nofollow">

This means: do not index this page, and do not follow any links on it. There is, however, a shorter way of saying the same thing:

<META name="robots" content="none">

That is, neither index the page nor follow any links on it.
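
As a minimal sketch of where the tag belongs (the title here is purely illustrative), it simply sits inside the <head> section of the page:

<html>
<head>
<title>Administration</title>
<META name="robots" content="noindex, nofollow">
</head>
<body>
<!-- page content -->
</body>
</html>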

Unfortunately, there is a catch. Suppose admin.asp links to update.asp, and you use <META> tags in admin.asp to stop robots indexing admin.asp or following its links, but you forget to do the same on a different page that also links to update.asp. A robot can then still reach update.asp through that second page, because it carries no <META> tag.

In addition, you can also use the values index and follow. However, since these are the defaults anyway, spelling them out is unnecessary and achieves nothing.
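
For completeness, the explicit form of that default behaviour would simply be:

<META name="robots" content="index, follow">

but, as said, there is no need to include it.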

If you are using IIS, you could instead use a custom HTTP header to deliver the <META> tag method. In theory, a robot's response to a tag created this way should be exactly the same; it looks like this:

<META http-equiv="robots" content="noindex">

Theoretically, this would let us apply the custom headers that IIS creates to all the files in a folder or virtual path at once. So far, however, tests of this method have not been successful. None of these methods can fully guarantee that your pages stay hidden; indeed, if someone deliberately writes a robot to look for private pages, robots.txt and the <META> tags become pointers to the very pages their author wants to protect. But for the purpose of keeping pages out of commercial search engines' indexes, these methods do work, and that is the only sense in which they are meant to.
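
If you would rather set such a header per page from ASP code than through the IIS console, classic ASP's Response.AddHeader can send it. This is only a minimal sketch under the same assumption, namely that the robot treats the header like the tag above, which, as noted, is not guaranteed:

<%@ Language=VBScript %>
<%
' Send the robots instruction as an HTTP response header
' instead of (or in addition to) a <META> tag in the page body.
Response.AddHeader "robots", "noindex"
%>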



