ASP.NET Tutorial: Removing Duplicate Content Using .ashx Files


When many different URLs point to pages with the same content, this is called "duplicate content". If a site contains a lot of duplicate content, search engines will conclude that the site is of little value, so we should try to avoid duplicate content of all kinds.

On dynamic websites, duplicate content is usually caused by URL parameters, and URL rewriting makes the problem worse (rather ironically, hehe). With the original parameterized URLs, a search engine may be able to work out that the duplication is caused by URL parameters and handle it automatically; URL rewriting hides those parameters, so the search engine can no longer recognize them. For example:

Original URLs:
http://www.freeflying.com/articles.aspx?id=231&catelog=blog
http://www.freeflying.com/articles.aspx?id=231&catelog=news

URLs after rewriting:
http://www.freeflying.com/blog/231.html
http://www.freeflying.com/news/231.html

All of these URLs point to the same content: the article with id=231. The article simply appears under both the blog and news sections, and for various reasons the final URLs end up as shown above.

There are two ways to deal with this: one is to "exclude" one of the URLs using the robots protocol, the other is to permanently redirect one URL to the other with a 301 redirect.
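As a quick aside, the 301 option might look roughly like this (a minimal sketch, not part of the article's code; the class name and target URL are only illustrative):

using System;
using System.Web;

// Illustrative sketch: a generic handler that permanently redirects the
// duplicate "news" URL to the canonical "blog" URL with a 301 status.
public class CanonicalRedirectHandler : IHttpHandler {

    public void ProcessRequest(HttpContext context) {
        context.Response.StatusCode = 301;
        context.Response.AddHeader("Location", "http://www.freeflying.com/blog/231.html");
        context.Response.End();
    }

    public bool IsReusable {
        get { return false; }
    }
}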

Today we'll talk about the robots protocol. Simply put, a "robot" is a search engine crawler; Google's, for example, is called a "spider". Spiders are polite and will ask for your preferences before crawling your pages, and you communicate with them through the robots protocol. In terms of implementation, there are two ways:

1. Add a robots.txt file to the site root directory, for example:

#static content, forbid all the pages under the "Admin" folder
User-agent: *
Disallow:/admin

Lines beginning with # are comments;

User-agent refers to the search engine; * means all search engines. You can also target a specific one, such as User-agent: googlebot;

Disallow specifies a directory or page that must not be accessed. Note: 1. this file is case-sensitive; 2. the path must start with "/", which represents the site root directory;
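Putting those two notes together, a rule aimed only at Google's spider might look like this (illustrative only, not a file from the article):

#only applies to Google's spider; the path starts with "/" for the site root
User-agent: googlebot
Disallow: /news/231.html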

In keeping with the purpose of this series, we focus on ASP.NET. For more notes on robots.txt itself, see http://www.googlechinawebmaster.com/2008/03/robotstxt.html

But how do we generate this file dynamically (a surprisingly common requirement)? Our first thought is probably an I/O operation: write a txt file into the root directory... But there is another way: use a generic handler (.ashx file). The code is as follows:

<%@ WebHandler Language="C#" Class="Handler" %>

using System;
using System.Web;

public class Handler : IHttpHandler {

    public void ProcessRequest(HttpContext context) {

        HttpResponse response = context.Response;

        response.Clear();

        // Without this line the output cannot be viewed in IE6, for reasons unknown
        response.ContentType = "text/plain";

        // In real use, the following two lines would be generated dynamically from the database
        response.Write("User-agent: * \n");
        response.Write("Disallow: /news/231.html \n");

        // Pull in a static file for the content that never changes
        response.WriteFile("~/static-robots.txt");

        response.Flush();
    }

    public bool IsReusable {
        get {
            return false;
        }
    }

}
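Assuming static-robots.txt contains the static rules shown earlier, requesting this handler should return output along these lines:

User-agent: *
Disallow: /news/231.html
#static content, forbid all the pages under the "Admin" folder
User-agent: *
Disallow: /admin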

The generic handler implements IHttpHandler. In the earlier UrlRewrite section we talked about HttpModule; in fact, the ASP.NET application lifecycle has a concept called the "pipeline": an HTTP request passes through a series of HttpModules, which "filter/process" it, and finally reaches an HttpHandler, which "handles" it. The HttpModules and the HttpHandler together form the "pipeline", quite a vivid image, hehe. (Figure: the ASP.NET request pipeline.)
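To give a feel for the other half of that pipeline, a bare-bones HttpModule might look like this (a minimal sketch for illustration only; the class name and response header are made up and not from the article):

using System;
using System.Web;

// Illustrative sketch: an HttpModule hooks into the pipeline and runs for
// every request before it reaches the HttpHandler.
public class PipelineDemoModule : IHttpModule {

    public void Init(HttpApplication application) {
        // BeginRequest fires at the start of the pipeline, before any handler runs.
        application.BeginRequest += delegate(object sender, EventArgs e) {
            HttpContext context = ((HttpApplication)sender).Context;
            context.Response.AppendHeader("X-Pipeline-Demo", "passed through module");
        };
    }

    public void Dispose() {
    }
}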


