Some time ago someone reported a "plus sign in the URL throws an error" bug. The cause was that they used the urlencode function on the URL. That function converts a space into a plus sign (+), which made URL parsing fail; the URL only parsed correctly when the space was converted to %20, and for that you need the rawurlencode function. Here is a look at the difference between the urlencode function and the rawurlencode function:
urlencode function:
Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding used for data posted from a WWW form, that is, the application/x-www-form-urlencoded media type. For historical reasons, it differs from the RFC 1738 encoding (see rawurlencode()) in that spaces are encoded as plus signs (+).
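To make the link to form encoding concrete, here is a minimal sketch. The form data is hypothetical and used only for illustration. It relies on http_build_query(), whose default mode (PHP_QUERY_RFC1738) encodes values the same way urlencode() does, and on parse_str(), which reverses that encoding.

```php
<?php
// Hypothetical form data, used only for illustration.
$data = ['q' => 'Hello World', 'tag' => 'a+b'];

// Default mode (PHP_QUERY_RFC1738) encodes values like urlencode().
$form = http_build_query($data);
echo $form, "\n"; // q=Hello+World&tag=a%2Bb

// The same string built by hand with urlencode().
$manual = 'q=' . urlencode($data['q']) . '&tag=' . urlencode($data['tag']);
var_dump($form === $manual); // bool(true)

// parse_str() reverses the form encoding: "+" becomes a space again,
// while %2B becomes a literal plus sign.
parse_str($form, $decoded);
var_dump($decoded === $data); // bool(true)
```

Note that the literal plus sign in 'a+b' is sent as %2B, so it is not confused with an encoded space.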
rawurlencode function:
Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits. This is the encoding described in RFC 3986. It protects literal characters from being interpreted as special URL delimiters, and protects URLs from being mangled by transmission media that perform character conversions (such as some e-mail systems). Let's take a look at the example below:
<?php
$string = "Hello World";
echo urlencode($string) . '<br/>';    // Output: Hello+World
echo rawurlencode($string) . '<br/>'; // Output: Hello%20World
?>
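The same difference shows up when decoding. The following is a small sketch with made-up input strings: urldecode() treats a plus sign as an encoded space, while rawurldecode() leaves it alone, so a decoder that does not match the encoder can change the data.

```php
<?php
// Made-up input, used only for illustration.
$encoded = 'a+b%20c';

// urldecode() treats "+" as an encoded space.
echo urldecode($encoded), "\n";    // a b c

// rawurldecode() leaves "+" untouched and only decodes %XX sequences.
echo rawurldecode($encoded), "\n"; // a+b c

$original = '1 + 1 = 2';

// Matching pairs round-trip cleanly.
var_dump(urldecode(urlencode($original)) === $original);       // bool(true)
var_dump(rawurldecode(rawurlencode($original)) === $original); // bool(true)

// Mismatched pair: the "+" produced for each space is never turned back.
var_dump(rawurldecode(urlencode($original)) === $original);    // bool(false)
```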
A more detailed comparison over all printable ASCII characters:
<?php
// Build a hex string covering every printable ASCII code (0x20 to 0x7e).
$str = '';
for ($i = 0x20; $i < 0x7f; $i++) {
    $str .= dechex($i);
}
// "H*" reads the hex string high nibble first and yields the characters.
$ascii = pack("H*", $str);
echo "All printable ASCII characters (from space to ~):\n" . $ascii . "\n";
echo "urlencode result:\n" . urlencode($ascii);
echo "\n";
// Strip every %XX sequence to see which characters were left as-is.
echo "Characters urlencode does not encode:\n" . preg_replace("/%.{2}/", "", urlencode($ascii));
echo "\n";
echo "rawurlencode result:\n" . rawurlencode($ascii);
echo "\n";
echo "Characters rawurlencode does not encode:\n" . preg_replace("/%.{2}/", "", rawurlencode($ascii));
echo "\n";
exit;
?>

Output:
---------------------------------------------------------------------------------
All printable ASCII characters (from space to ~):
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
urlencode result:
+%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
Characters urlencode does not encode:
+-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
rawurlencode result:
%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
Characters rawurlencode does not encode:
-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

Note: this output comes from a PHP version before 5.3.0. Since PHP 5.3.0, rawurlencode also leaves the tilde (~) unencoded, so its result ends in ~ instead of %7E.
---------------------------------------------------------------------------------
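Comparing the two results:
1. Digits and uppercase and lowercase letters are not encoded by either function.
2. The minus sign (-), the dot (.), and the underscore (_) are not encoded by either function.
3. The only difference is the space: urlencode turns it into a plus sign (+), while rawurlencode turns it into %20. That is why a plus sign shows up in the "not encoded" list for urlencode. A literal plus sign is encoded as %2B by both functions.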
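Applied to the bug described at the start, the comparison suggests a simple rule of thumb: use rawurlencode for path segments and urlencode for query-string values. The sketch below assumes a hypothetical base URL and file name; neither comes from the original bug report.

```php
<?php
// Hypothetical base URL and file name, used only for illustration.
$base = 'http://example.com/files/';
$name = 'my report+v2.txt';

// Path segment: rawurlencode() keeps the space (%20) and the
// literal plus sign (%2B) clearly distinct.
$pathUrl = $base . rawurlencode($name);
echo $pathUrl, "\n"; // http://example.com/files/my%20report%2Bv2.txt

// Query value: urlencode() produces the form-style encoding,
// where a space becomes "+".
$queryUrl = 'http://example.com/search?q=' . urlencode($name);
echo $queryUrl, "\n"; // http://example.com/search?q=my+report%2Bv2.txt
```

A plus sign inside a path is not generally treated as a space by servers, which is why the first form is the safer choice there.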
About the difference between escape and encodeURIComponent in JavaScript:
The test string is "统一注册1" ("Unified Registration 1").

>>> console.log(encodeURIComponent("统一注册1"));
%E7%BB%9F%E4%B8%80%E6%B3%A8%E5%86%8C1
>>> console.log(escape("统一注册1"));
%u7EDF%u4E00%u6CE8%u518C1

<?php
// The encodeURIComponent output is plain UTF-8, so urldecode handles it.
echo iconv("utf-8", "gbk", urldecode("%E7%BB%9F%E4%B8%80%E6%B3%A8%E5%86%8C1"));
echo "\n";
// urldecode does not understand the %uXXXX form produced by escape.
echo urldecode("%u7EDF%u4E00%u6CE8%u518C1");
// Use the unescape function below instead:
// echo iconv("utf-8", "gbk", unescape("%u7EDF%u4E00%u6CE8%u518C1"));
exit;
?>

Output:
======================================
统一注册1
%u7EDF%u4E00%u6CE8%u518C1
======================================
Explanation of the results:
1. encodeURIComponent always converts the input to UTF-8 and encodes it byte by byte.
2. escape encodes non-ASCII characters by their Unicode code unit, in the form %uXXXX. Because it also encodes characters that are unsafe in a URL, its output can be placed in a URL, but the server does not decode it automatically. A PHP version of the decoding function, found in the manual, is given below:
<?php
function unescape($str) {
    $str = rawurldecode($str);
    // Split into %uXXXX, &#xXXXX;, &#NNNN; tokens and single characters.
    // The U (ungreedy) modifier makes .+ match one character at a time.
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        // pack() emits big-endian bytes, so name the byte order explicitly.
        if (substr($v, 0, 2) == "%u")
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
        elseif (substr($v, 0, 3) == "&#x")
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
        elseif (substr($v, 0, 2) == "&#") {
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}
?>

>>> console.log(escape(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"));
%20%21%22%23%24%25%26%27%28%29*+%2C-./0123456789%3A%3B%3C%3D%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
>>> console.log(encodeURIComponent(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"));
%20!%22%23%24%25%26'()*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
>>> console.log(escape(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~").replace(/%.{2}/g, ""));
*+-./0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
>>> console.log(encodeURIComponent(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~").replace(/%.{2}/g, ""));
!'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
Comparison of the results (besides letters and digits):
escape leaves 7 characters unencoded: * + - . / @ _
encodeURIComponent leaves 9 characters unencoded: ! ' ( ) * - . _ ~
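Given these character sets, a PHP counterpart to encodeURIComponent can be sketched on top of rawurlencode by restoring the characters that encodeURIComponent leaves alone. The helper name encode_uri_component is hypothetical and not part of PHP, and the sketch assumes the input string is UTF-8.

```php
<?php
// Hypothetical helper name; not a built-in PHP function.
function encode_uri_component($str)
{
    // rawurlencode() encodes ! ' ( ) * which encodeURIComponent leaves
    // alone, so map those escapes back. %7E is mapped back as well for
    // PHP versions before 5.3.0, where rawurlencode() still encoded "~".
    $restore = [
        '%21' => '!', '%27' => "'", '%28' => '(',
        '%29' => ')', '%2A' => '*', '%7E' => '~',
    ];
    return strtr(rawurlencode($str), $restore);
}

echo encode_uri_component("a b!'()*~+"), "\n"; // a%20b!'()*~%2B

// Assumes this source file is saved as UTF-8.
echo encode_uri_component("统一注册1"), "\n";   // %E7%BB%9F%E4%B8%80%E6%B3%A8%E5%86%8C1
```

The mapping is safe because a literal percent sign in the input is itself encoded as %25, so the keys can only match escapes that rawurlencode produced.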
From: http://www.111cn.net/phper/php-cy/58640.htm
The difference between urlencode and rawurlencode in PHP