URL Normalization: SEO Basics

Source: Internet
Author: User


URL normalization is the process of standardizing URLs: converting a URL into an equivalent URL in a standard form (for example, converting http://www.cnblogs.com/shuchao to http://www.cnblogs.com/shuchao/) so that a program can determine that the two URLs are equivalent.

URL normalization is used by the search engine to reduce duplicate indexing of pages, while also reducing repeated crawling of crawlers. The browser also needs to use URL normalization to identify whether a user has accessed a URL.

    • 1. URL composition
    • 2. Nonstandard URLs
    • 3. URL normalization process
    • 4. SEO and URL normalization


URL composition:


protocol://hostname[:port]/path/[;parameters][?query][#fragment]
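Python's standard `urllib.parse.urlparse` splits a URL along these same boundaries: scheme, network location, path, `;`parameters, query and fragment. That makes it a handy way to inspect a URL before normalizing it. The sample URL below is made up for illustration.

```python
from urllib.parse import urlparse

# Split a URL into the components listed above:
# scheme://netloc/path;params?query#fragment
parts = urlparse("http://www.example.com:8080/a/b;type=x?id=123#top")

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /a/b
print(parts.params)    # type=x
print(parts.query)     # id=123
print(parts.fragment)  # top
```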



Nonstandard URLs:


1. Extra characters in the URL

1.1 A subdomain URL that also contains "www": "http://www.shuchao.cnblogs.com/"

1.2 The URL includes the default port: "http://www.cnblogs.com:80/shuchao/"

1.3 Loose URL, where the descriptive text in the path does not determine which page is served: "http://www.chapters.indigo.ca/books/Amazon-Sucks-donkey-bils/9780470170779-item.html"

1.4 A redundant default file name such as index.html or default.aspx: "http://www.cnblogs.com/shuchao/index.html"

1.5 File path

(1) Redundant "/": "http://www.cnblogs.com/shuchao//"

(2) Extra dot segments ("." and ".."). For example, "http://www.cnblogs.com/a/b/../page.html" is equivalent to "http://www.cnblogs.com/a/page.html"

1.6 Redundant query strings

(1) A "?" followed by an empty query string: "http://www.cnblogs.com/shuchao?"

(2) Redundant "&" separators

(3) Useless query variables: "http://www.example.com/display?id=123&fake=fake"

2. Missing characters in the URL

2.1 Missing trailing "/": "http://www.cnblogs.com/shuchao"

2.2 A query variable missing its name or value: "http://www.example.com/display?id=" or "http://www.example.com/display?=123"

3. Other nonstandard URLs

3.1 "http://shuchao.cnblogs.com/" and "http://www.cnblogs.com/shuchao/" actually serve the same content

3.2 Using an IP address instead of the domain name

3.3 Extended characters and letter case. The path is case sensitive, so "http://www.google.cn/Intl/zh-CN/about.html" and "http://www.google.cn/intl/zh-CN/about.html" count as two distinct URLs even when they serve the same page

3.4 Mixing "+" and "%20" to encode spaces

3.5 Query variables in no fixed order: "http://www.example.com/test.aspx?bar=1&a=test"

3.6 Temporary state variables: "http://www.example.com/test?back=/prevpage.aspx"
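Item 3.4 matters because "+" means a space only inside a form-encoded query string. The Python standard library can show the difference in a few lines. The sample strings are illustrative.

```python
from urllib.parse import parse_qs, unquote

# In a query string (form encoding), "+" and "%20" both mean a space,
# so these two query strings decode to the same variables.
q_plus = parse_qs("q=seo+basics")
q_pct = parse_qs("q=seo%20basics")
print(q_plus == q_pct)  # True

# In the path, "+" is a literal plus sign, so it must not be rewritten.
print(unquote("/a+b"))    # /a+b
print(unquote("/a%20b"))  # /a b
```

A normalizer built on this idea would therefore treat "+" and "%20" as equivalent only in the query. It would leave "+" in the path alone.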



URL normalization process:


1. Convert the scheme (protocol) and host name to lowercase

HTTP://www.Example.com/test → http://www.example.com/test
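Here is a minimal Python sketch of step 1, assuming the URL is absolute. The function name `lowercase_scheme_and_host` is hypothetical. Only the scheme and host are lowercased. The path keeps its case because, as item 3.3 notes, paths are case sensitive.

```python
from urllib.parse import urlsplit, urlunsplit

def lowercase_scheme_and_host(url: str) -> str:
    """Lowercase the scheme and host; leave path, query and fragment untouched."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    # netloc may contain user info ("user:pass@"), which is case sensitive,
    # so only the host[:port] part after the last "@" is lowercased.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((parts.scheme.lower(), netloc, parts.path,
                       parts.query, parts.fragment))

print(lowercase_scheme_and_host("HTTP://www.Example.com/Test"))
# http://www.example.com/Test
```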

2. Convert escape sequences to uppercase. The hex digits in a percent-encoded escape sequence are case-insensitive, so normalization picks one form (uppercase)

%3a → %3A
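Step 2 can be sketched with a regular expression that matches each `%` followed by two hex digits. The helper name is hypothetical.

```python
import re

# A percent-encoding triplet: "%" followed by two hex digits, in either case.
_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")

def uppercase_escapes(url: str) -> str:
    """Rewrite every %xx escape with uppercase hex digits, e.g. %3a -> %3A."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), url)

print(uppercase_escapes("http://www.example.com/a%3ab%2fc"))
# http://www.example.com/a%3Ab%2Fc
```

Only the escape triplets change. Ordinary letters elsewhere in the URL keep their case.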

3. Remove the fragment (#)

http://www.example.com/test/index.html#seo → http://www.example.com/test/index.html
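For step 3, Python's `urllib.parse.urldefrag` already separates the fragment. Browsers do not send the fragment to the server in the HTTP request, which is the usual reason normalization drops it.

```python
from urllib.parse import urldefrag

# urldefrag returns the URL without "#..." plus the fragment text.
url, fragment = urldefrag("http://www.example.com/test/index.html#seo")
print(url)       # http://www.example.com/test/index.html
print(fragment)  # seo
```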

4. Remove an empty "?"

http://www.example.com/test? → http://www.example.com/test

5. Remove the default file name

http://www.example.com/test/index.html → http://www.example.com/test/
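This sketch of step 5 rests on one assumption: the list of default file names below is hypothetical. It must match the server's actual directory-index configuration. Otherwise two URLs that really differ could be merged.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of directory-index names; adjust to the real server setup.
DEFAULT_PAGES = {"index.html", "index.htm", "default.aspx"}

def drop_default_page(url: str) -> str:
    """Strip a trailing default document name but keep the directory slash."""
    parts = urlsplit(url)
    head, slash, last = parts.path.rpartition("/")
    # Exact, case-sensitive match, since paths are case sensitive.
    if slash and last in DEFAULT_PAGES:
        parts = parts._replace(path=head + "/")
    return urlunsplit(parts)

print(drop_default_page("http://www.example.com/test/index.html"))
# http://www.example.com/test/
```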

6. Remove unnecessary dot segments ("." and "..")

http://www.example.com/../a/b/../c/./d.html → http://www.example.com/a/c/d.html
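Step 6 corresponds to the remove_dot_segments algorithm in RFC 3986, section 5.2.4. The Python sketch below follows that algorithm's rules. The function names are hypothetical and not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit

def _drop_last_segment(out: str) -> str:
    # Remove the last segment and its preceding "/" from the output buffer.
    return out[: out.rfind("/")] if "/" in out else ""

def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments as in RFC 3986, section 5.2.4."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]                 # "/./x" -> "/x"
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]                 # "/../x" -> "/x"
            out = _drop_last_segment(out)
        elif inp == "/..":
            inp = "/"
            out = _drop_last_segment(out)
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]
    return out

def normalize_path_dots(url: str) -> str:
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))

print(normalize_path_dots("http://www.example.com/../a/b/../c/./d.html"))
# http://www.example.com/a/c/d.html
```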

7. Remove an unnecessary "www"

http://www.test.example.com/ → http://test.example.com/
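Step 7 is only safe when the site serves identical content with and without "www", and that has to be verified per site. Under that assumption, here is a sketch with a hypothetical helper that ignores user info in the authority:

```python
from urllib.parse import urlsplit, urlunsplit

def drop_www(url: str) -> str:
    """Remove a leading "www." label from the host (hypothetical helper)."""
    parts = urlsplit(url)
    netloc = parts.netloc
    # Heuristic guard: only strip when at least two labels remain,
    # so "www.com" is not reduced to "com".
    if netloc.lower().startswith("www.") and netloc.count(".") >= 2:
        netloc = netloc[4:]
    return urlunsplit(parts._replace(netloc=netloc))

print(drop_www("http://www.test.example.com/"))
# http://test.example.com/
```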

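8. Sort query variables and remove arbitrary (useless) ones

http://www.example.com/test.aspx?bar=1&a=test → http://www.example.com/test.aspx?a=test&bar=1

http://www.example.com/test?id=123&fakefoo=fakebar → http://www.example.com/test?id=123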
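This sketch of step 8 sorts the query variables and keeps only an allowlist of parameters. Which parameters actually change the page is site-specific, so `MEANINGFUL_PARAMS` is a hypothetical placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: the parameters assumed to change the page content.
MEANINGFUL_PARAMS = {"id", "a", "bar"}

def sort_and_filter_query(url: str) -> str:
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Keep only allowlisted variables, then sort by name (and value).
    kept = sorted((k, v) for k, v in pairs if k in MEANINGFUL_PARAMS)
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(sort_and_filter_query("http://www.example.com/test.aspx?bar=1&a=test"))
# http://www.example.com/test.aspx?a=test&bar=1
```

Note that `urlencode` re-encodes the values, so escapes such as %2F or spaces may come out differently from the input. That is acceptable as long as every URL goes through the same encoder.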

9. Remove variables that have default values

http://www.example.com/test?id=&sort=ascending → http://www.example.com/test
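Step 9 needs to know each variable's default, and only the site can supply that. The sketch below uses a hypothetical `DEFAULTS` table. It also assumes that an empty value means the same as an absent variable, as in the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical defaults: what the server assumes when a variable is absent.
DEFAULTS = {"sort": "ascending"}

def drop_default_params(url: str) -> str:
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [
        (k, v) for k, v in pairs
        # Assumed: an empty value, or one equal to the default, changes nothing.
        if v != "" and DEFAULTS.get(k) != v
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(drop_default_params("http://www.example.com/test?id=&sort=ascending"))
# http://www.example.com/test
```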

10. Remove unnecessary query-string delimiters such as "?" and "&"

http://www.example.com/test? → http://www.example.com/test
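Here is a string-level sketch of step 10 (hypothetical helper name). It drops empty `&`-separated pieces and removes a `?` that has nothing after it, so it also covers step 4.

```python
def drop_stray_delimiters(url: str) -> str:
    """Remove empty "&" pieces and a "?" with nothing after it."""
    # Set the fragment aside; the first "#" ends the query.
    base, hash_mark, fragment = url.partition("#")
    # The first "?" starts the query string.
    path, _, query = base.partition("?")
    # Drop leading, trailing, or doubled "&".
    query = "&".join(piece for piece in query.split("&") if piece)
    # Keep "?" only if a non-empty query remains.
    base = path + ("?" + query if query else "")
    return base + hash_mark + fragment

print(drop_stray_delimiters("http://www.example.com/test?"))
# http://www.example.com/test
```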

11. Apply DUST rules (DUST: Different URLs with Similar Text, a heuristic proposed by Schonfeld et al.)

http://www.example.com/test?id=123 → http://www.example.com/test_123
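In the DUST work, rules are substring substitutions mined from crawl logs or web-server logs. The sketch below does not mine anything. It simply applies one hand-written, hypothetical rule that reproduces the example above.

```python
import re

# Hypothetical DUST-style rule: (pattern, replacement) mapping one URL
# variant onto another that is assumed to serve the same text.
DUST_RULES = [
    (re.compile(r"\?id=(\d+)$"), r"_\1"),
]

def apply_dust_rules(url: str) -> str:
    for pattern, replacement in DUST_RULES:
        url = pattern.sub(replacement, url)
    return url

print(apply_dust_rules("http://www.example.com/test?id=123"))
# http://www.example.com/test_123
```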



SEO and URL normalization:

Nonstandard URLs can create many duplicate URLs on a website. Crawlers then fetch the same content repeatedly. That wastes crawl effort that could go to the site's unique content, and it hurts indexing.
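To show how normalization reduces repeated crawling, the sketch below combines a few of the normalization steps described earlier. It lowercases the scheme and host, drops the default port, the fragment and an empty query, and sorts query variables. It then deduplicates a small, made-up list of crawled URLs. This is a simplified illustration that ignores user info and IPv6 literals, not a production normalizer.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize(url: str) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lowercased
    # Drop the port when it is absent or the scheme's default.
    if parts.port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, ""))  # "" drops the fragment

crawled = [
    "http://www.cnblogs.com:80/shuchao/",
    "HTTP://WWW.CNBLOGS.COM/shuchao/#top",
    "http://www.cnblogs.com/shuchao/?",
]
unique = {normalize(u) for u in crawled}
print(unique)  # {'http://www.cnblogs.com/shuchao/'}
```

Three crawled URLs collapse to one key, so a crawler keyed on the normalized form would fetch the page only once.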

Multiple nonstandard URLs also dilute PageRank (PR). Links that should all point to one page are split across several URL variants, so the PR that would have gone to a single page is spread among them.

There is also a user-experience cost: long, messy, or nonstandard URLs can give users a poor impression of the website.


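Google Webmaster Tools (now Google Search Console) added a URL Parameters tool that let site owners mark useless parameters in their URLs so that Google could ignore them. Google retired this tool in 2022.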

