Reprint: Why URIs need to be encoded

Source: Internet
Author: User
Tags: control characters, printable characters

Why is URL encoding needed? In general, when something has to be encoded, it is because it is not suitable for transmission in its raw form. There can be many reasons, such as the data being too large or containing private information. For URLs, the reason is that some characters make the URL ambiguous. For example, the URL query string passes parameters as key=value pairs, and the pairs are separated by the & symbol, as in /s?q=abc&ie=utf-8. If a value contains = or &, the server that receives the URL will parse it incorrectly, so the ambiguous & and = symbols must be escaped, that is, encoded. As another example, URLs are encoded in ASCII rather than Unicode, which means a URL cannot directly contain non-ASCII characters such as Chinese. Otherwise, the Chinese characters may cause problems if the client browser and the server support different character sets.
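The ambiguity described above can be reproduced with Python's standard urllib.parse module. This is a minimal sketch: the value "a=b&c" is a made-up example, not something from the original article.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical parameter value that itself contains the delimiters = and &.
value = "a=b&c"

# Naive concatenation: the parser cannot tell data from delimiters,
# so the value is cut at the & and the trailing "c" is lost.
naive = "q=" + value + "&ie=utf-8"
naive_parsed = parse_qs(naive)

# urlencode percent-encodes = and & inside the value before joining the pairs.
encoded = urlencode({"q": value, "ie": "utf-8"})
encoded_parsed = parse_qs(encoded)

print(naive_parsed)    # {'q': ['a=b'], 'ie': ['utf-8']}
print(encoded)         # q=a%3Db%26c&ie=utf-8
print(encoded_parsed)  # {'q': ['a=b&c'], 'ie': ['utf-8']}
```

Only the encoded form survives the round trip with the original value intact.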

The principle of URL encoding is to use safe characters (printable characters with no special purpose or special meaning) to represent unsafe characters.

Which characters need to be encoded? RFC 3986 stipulates that only English letters (a-z, A-Z), digits (0-9), the four special characters - _ . ~, and the reserved characters are allowed in a URL. RFC 3986 gives detailed recommendations on encoding and decoding URLs: it indicates which characters must be encoded so that the semantics of the URL do not change, and explains why those characters need to be encoded.

Characters with no corresponding printable character in US-ASCII: a URL may contain only printable characters. The bytes 00-1F and 7F in US-ASCII all represent control characters, which cannot appear directly in a URL. Bytes 80-FF (as used by ISO-8859-1) fall outside the byte range defined by US-ASCII and therefore cannot be placed in a URL either.
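To make the principle concrete, here is a hand-rolled sketch of percent-encoding in Python. It assumes the text is first converted to UTF-8 bytes (a common convention, but an assumption here) and that the data is destined for a single component, so reserved characters are encoded as well. The function name percent_encode is hypothetical; in practice urllib.parse.quote does this job.

```python
from urllib.parse import quote

# Letters, digits, and the four special characters - _ . ~ stay as they are.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text):
    """Replace every byte outside UNRESERVED with % plus two hex digits."""
    out = []
    for byte in text.encode("utf-8"):  # assumption: UTF-8 as the byte encoding
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else "%{:02X}".format(byte))
    return "".join(out)

sample = "中文 a&b"
print(percent_encode(sample))        # %E4%B8%AD%E6%96%87%20a%26b
print(percent_encode("\x7f\n"))      # %7F%0A (control characters)
print(quote(sample, safe=""))        # same result from the standard library
```

Each unsafe byte, whether a control character, a delimiter, or part of a multi-byte Chinese character, ends up as three safe printable characters.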

Reserved characters: a URL can be divided into several components, such as the protocol, host, and path. Some characters (: / ? # [ ] @) are used to separate the components. For example, the colon separates the protocol from the host, / separates the host from the path, and ? separates the path from the query parameters. Other characters (! $ & ' ( ) * + , ; =) are used as delimiters within a component: = expresses a key-value pair in the query parameters, and & separates multiple key-value pairs. When ordinary data inside a component contains these special characters, it must be encoded.
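The role of the component delimiters can be seen by splitting a URL with urllib.parse.urlsplit. In this sketch the host example.com, the fragment "top", and the path data "a/b?c" are placeholders chosen for illustration.

```python
from urllib.parse import quote, urlsplit

# The delimiters : / ? # decide where each component starts and ends.
parts = urlsplit("http://example.com/s?q=abc&ie=utf-8#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
# http example.com /s q=abc&ie=utf-8 top

# Hypothetical data for ONE path segment that happens to contain / and ?.
segment = "a/b?c"

# Unencoded, the ? is taken as the start of the query.
raw = urlsplit("http://example.com/" + segment)
print(raw.path, raw.query)  # /a/b c

# Encoded, the whole string stays inside the path component.
enc = urlsplit("http://example.com/" + quote(segment, safe=""))
print(enc.path, enc.query == "")  # /a%2Fb%3Fc True
```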

The following characters are reserved characters in RFC 3986: ! * ' ( ) ; : @ & = + $ , / ? # [ ]

Unsafe characters: some characters can cause ambiguity in the parser when they are placed directly in a URL. They are considered unsafe for a number of reasons:

Space: while a URL is being transmitted, typeset by a user, or processed by a text-handling program, insignificant spaces may be introduced or meaningful spaces may be removed.
Quotation marks and <>: quotation marks and angle brackets are commonly used to delimit URLs in ordinary text.
#: typically used to indicate a bookmark or an anchor.
%: the percent sign is itself the special character used to encode unsafe characters, so it must be encoded too.
{}|\^[]`~: some gateways or transport agents modify these characters.
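A few of the unsafe characters listed above can be tried out directly. The following sketch uses urllib.parse with invented sample strings ("C#", "100%", and a short quoted phrase) to show what goes wrong without encoding and what the encoded form looks like.

```python
from urllib.parse import quote, unquote, urlsplit

# "#" starts the fragment (anchor), so an unencoded "C#" loses its last character.
raw = urlsplit("http://example.com/s?q=C#")
print(raw.query, repr(raw.fragment))  # q=C ''

enc = urlsplit("http://example.com/s?q=" + quote("C#", safe=""))
print(enc.query)  # q=C%23

# "%" introduces an escape sequence, so a literal percent sign becomes %25.
percent = quote("100%", safe="")
print(percent, unquote(percent))  # 100%25 100%

# Spaces, quotation marks, and angle brackets are all replaced by %XX.
text = quote('say "hi" <now>', safe="")
print(text)  # say%20%22hi%22%20%3Cnow%3E
```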

Note that for characters that are legal in a URL, the encoded and unencoded forms are equivalent, but for the characters mentioned above, leaving them unencoded may change the meaning of the URL. Therefore, only ordinary English letters and digits, the special characters $-_.+!*'(), and the reserved characters can appear in a URL without encoding. (This list of special characters follows the older RFC 1738; RFC 3986 keeps only - _ . ~ as unreserved marks.) All other characters must be encoded before they appear in a URL.
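The difference between the two cases can be illustrated with a small sketch. The helper path_segments is hypothetical: it splits a path on the literal / delimiter first and only then decodes each segment, which is one reasonable way (not the only one) for a server to interpret a path.

```python
from urllib.parse import unquote

def path_segments(path):
    """Split on the literal / delimiter, then percent-decode each segment."""
    return [unquote(seg) for seg in path.split("/") if seg]

# An ordinary character: %7E and ~ lead to the same result.
print(path_segments("/%7Euser"))  # ['~user']
print(path_segments("/~user"))    # ['~user']

# A reserved character: %2F is data, while a literal / is a delimiter.
print(path_segments("/AC%2FDC"))  # ['AC/DC']
print(path_segments("/AC/DC"))    # ['AC', 'DC']
```

Decoding before splitting would erase this distinction, which is why the order of the two steps matters.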
