Almost every website now applies UrlEncode to Chinese characters and special characters in its URLs, for example:
http://hi.baidu.com/%BE%B2%D0%C4%C0%CF%C8%CB/creat/blog/
The encoded segment in the middle of this URL is my login user name.
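To check this, the segment can be decoded with Python's standard `urllib.parse` module. This is a minimal sketch that assumes the site percent-encoded the GBK bytes of the user name. The `%BE%B2...` pairs are valid two-byte GBK sequences, but the charset is an inference from the bytes, not something the URL states.

```python
from urllib.parse import quote, unquote

# Path segment copied from the example URL
segment = "%BE%B2%D0%C4%C0%CF%C8%CB"

# Assumption: the bytes are GBK, so decode them as GBK
name = unquote(segment, encoding="gbk")
print(name)  # the user name, four Chinese characters

# Re-encoding with the same charset reproduces the segment exactly
assert quote(name, encoding="gbk") == segment
```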
Why are these characters encoded like this? Is it about the character encoding (GBK, UTF-8), or is it to keep special characters out of the URL? Everyone knows the conversion happens, but few know what it actually achieves. I looked through a lot of material online and did not find a clear explanation.
URL escaping exists only to make a URL conform to the URL specification, because the standard URL specification does not allow Chinese characters, or many other characters, to appear in a URL.
Take a look at the PHP documentation for urlencode:
urlencode: URL-encodes a string
string urlencode(string $str)
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). It is encoded the same way as data posted from a WWW form, that is, the same way as the application/x-www-form-urlencoded media type. For historical reasons, this differs from RFC 1738 encoding (see rawurlencode()) in that spaces are encoded as plus signs (+). This function is convenient for encoding a string to be used in the query part of a URL, and for passing variables to the next page.
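Python's standard library has close analogs of the two PHP functions, which makes the space-handling difference easy to see. This is a sketch of the analogy only: `quote_plus` behaves like `urlencode()` and `quote(..., safe="")` like `rawurlencode()`, but they are not byte-for-byte identical (for example, Python leaves `~` unencoded, while PHP's `urlencode()` turns it into `%7E`).

```python
from urllib.parse import quote, quote_plus

text = "a b&c"

# Form style (application/x-www-form-urlencoded): space becomes "+",
# as with PHP urlencode()
form_style = quote_plus(text)     # 'a+b%26c'

# Percent style: space becomes "%20", as with PHP rawurlencode()
rfc_style = quote(text, safe="")  # 'a%20b%26c'

print(form_style, rfc_style)
```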
The relevant passage of the standard (RFC 1738, section 2.2) reads:
"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
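The quoted rule can be turned into a simple check. This sketch lists the characters of a string that fall outside the allowed set. It deliberately ignores the reserved characters (`;/?:@=&`), since whether those may stay unencoded depends on whether they are used for their reserved purpose, which a character-by-character check cannot know. `chars_needing_escape` is a hypothetical helper name.

```python
import string
from typing import List

# Characters the quoted RFC 1738 rule allows unencoded
# (reserved characters are intentionally left out, see above)
UNENCODED_OK = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

def chars_needing_escape(text: str) -> List[str]:
    """Return the characters of `text` that the rule says must be escaped."""
    return [ch for ch in text if ch not in UNENCODED_OK]

print(chars_needing_escape("blog-2010 中文"))  # [' ', '中', '文']
```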
Which characters need to be converted?
1. ASCII control characters
These characters are not printable, so they naturally need to be converted.
2. Non-ASCII characters
These characters are outside the legal character range altogether, so converting them goes without saying.
3. Some reserved characters
The most obvious one is "&": if it appears in a URL, is it part of the data, or is it the separator between parameters?
4. Some unsafe characters
For example, the space. To avoid ambiguity it has to be converted, to "+" in form encoding (or to "%20" in RFC 1738 encoding).
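The "&" and space cases can be demonstrated with Python's `urllib.parse.urlencode` and `parse_qs`. This sketch uses made-up parameter names and values (`q`, `page`) purely for illustration. It shows that once the value is escaped, only the real separators remain as a literal "&", and that leaving the value unescaped makes the parser split it.

```python
from urllib.parse import urlencode, parse_qs

params = {"q": "salt & pepper", "page": "2"}

# urlencode() escapes "&" inside the value as %26 and spaces as "+",
# so the only literal "&" left is the separator between parameters
query = urlencode(params)
print(query)  # q=salt+%26+pepper&page=2

# Parsing splits on the literal "&" only and recovers the original value
assert parse_qs(query) == {"q": ["salt & pepper"], "page": ["2"]}

# Without escaping, the "&" in the value is read as a separator
# and the value is cut short
broken = "q=salt & pepper&page=2"
print(parse_qs(broken))
```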
Once you know this, it is clear why the conversion is needed, and the conversion rule itself is simple:
take the bytes of each character in its character encoding, and replace every byte outside the allowed range with a percent sign followed by the byte's value as two hexadecimal digits (%XX).
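The rule can be written out by hand. This is a minimal sketch, assuming the allowed set is ASCII letters, digits and `-_.` (an assumption for illustration; PHP and Python each use a slightly different set). `percent_encode` is a hypothetical helper: it encodes the text to bytes in the chosen charset and escapes every byte outside the allowed set as `%XX`.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)

def percent_encode(text: str, encoding: str = "utf-8", safe: str = "-_.") -> str:
    """Escape every byte of `text` (in `encoding`) outside the allowed set.

    Assumes `safe` contains only ASCII characters.
    """
    allowed = ALNUM | set(safe)
    out = []
    for byte in text.encode(encoding):
        ch = chr(byte)
        if ch in allowed:
            out.append(ch)                     # allowed ASCII byte, keep as is
        else:
            out.append("%{:02X}".format(byte))  # two uppercase hex digits
    return "".join(out)

print(percent_encode("a b/中"))  # a%20b%2F%E4%B8%AD
```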
Relationship to character encoding
From UrlEncode's conversion rule and purpose, it is easy to see that UrlEncode operates on the character encoding. The same Chinese character produces different UrlEncode strings under different encodings: GBK-encoded text gives the GBK result, UTF-8-encoded text gives the UTF-8 result.
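A one-character example makes the point concrete. This sketch uses Python's `urllib.parse.quote`, whose `encoding` parameter selects the charset used before percent-encoding; the same character yields two different strings.

```python
from urllib.parse import quote

ch = "中"
print(quote(ch, encoding="gbk"))    # %D6%D0    (2 bytes in GBK)
print(quote(ch, encoding="utf-8"))  # %E4%B8%AD (3 bytes in UTF-8)
```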
Apache and other servers can decode the string once they receive it, but decoding still does not settle which character encoding the resulting bytes are in. That question has to be answered by convention or by detecting the character encoding.
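One way a server might handle this is sketched below, assuming a convention that clients send either UTF-8 or GBK. `decode_param` is a hypothetical helper: it percent-decodes to raw bytes, tries strict UTF-8 first, and falls back to GBK. This is a heuristic, not a guarantee: GBK byte sequences frequently fail strict UTF-8 decoding, but some happen to be valid UTF-8 and would be misread. Note that `unquote_to_bytes` handles `%XX` only and does not turn "+" into a space.

```python
from urllib.parse import unquote_to_bytes

def decode_param(value: str) -> str:
    """Percent-decode `value`, then guess between UTF-8 and GBK (heuristic)."""
    raw = unquote_to_bytes(value)      # the bytes, charset still unknown
    try:
        return raw.decode("utf-8")     # strict: raises on invalid UTF-8
    except UnicodeDecodeError:
        return raw.decode("gbk")       # fallback by convention; may also raise

print(decode_param("%E4%B8%AD"))  # UTF-8 bytes
print(decode_param("%D6%D0"))     # GBK bytes for the same character
```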
Therefore, UrlEncode only ensures that non-ASCII and other special characters in a URL can be transmitted correctly. Which character encoding to use is not something UrlEncode cares about or solves.
Character encoding problems are not for UrlEncode to solve.
Reposted from: http://apps.hi.baidu.com/share/detail/32230450
Resources:
http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
http://cn.php.net/manual/zh/function.urlencode.php
Why do I need to encode the URL