Differences between urlencode and rawurlencode in PHP
Some time ago, I ran into a bug caused by a plus sign in a URL. I had used the urlencode function on part of the URL, and urlencode converts spaces to plus signs (+). The URL then failed to parse correctly; the space was only handled properly once it was encoded as %20. For that, the rawurlencode function is the one to use.
The following describes the differences between urlencode and rawurlencode:
urlencode function:
Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding used for data posted from a WWW form, that is, the application/x-www-form-urlencoded media type. For historical reasons, encoding spaces as plus signs (+) makes it differ from the RFC 1738 encoding (see rawurlencode()).
rawurlencode function:
Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits. (Since PHP 5.3.0 the tilde ~ is also left unencoded.) This is the encoding described in RFC 3986. It protects literal characters from being interpreted as special URL delimiters, and protects URLs from being mangled by transmission media that perform character conversions (such as some mail systems). The following is an example:
<?php
$string = "hello world";
echo urlencode($string) . '<br/>';    // output: hello+world
echo rawurlencode($string) . '<br/>'; // output: hello%20world
?>
Comparison of specific examples:
<?php
// Build a string of every printable ASCII character (0x20 through 0x7e).
$str = '';
for ($i = 0x20; $i < 0x7f; $i++) {
    $str .= dechex($i);
}
$ascii = pack("H*", $str);

echo "All printable ASCII characters (from space to ~):\n" . $ascii . "\n";
echo "urlencode result:\n" . urlencode($ascii) . "\n";
// Strip every %XX escape to see which characters were left as they are.
echo "Characters urlencode does not encode:\n" . preg_replace("/%.{2}/", "", urlencode($ascii)) . "\n";
echo "rawurlencode result:\n" . rawurlencode($ascii) . "\n";
echo "Characters rawurlencode does not encode:\n" . preg_replace("/%.{2}/", "", rawurlencode($ascii)) . "\n";
exit;
?>

Output result:
---------------------------
All printable ASCII characters (from space to ~):
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
urlencode result:
+%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
Characters urlencode does not encode:
+-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
rawurlencode result:
%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
Characters rawurlencode does not encode:
-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

Note: the output above shows rawurlencode encoding ~ as %7E, which matches PHP versions before 5.3.0. From PHP 5.3.0 on, rawurlencode leaves the tilde unencoded, so its result ends in ~ and the tilde also appears in its list of unencoded characters.
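Comparing the two results:
1. Digits and uppercase/lowercase letters are not encoded by either function.
2. The minus sign, the dot, and the underscore are not encoded by either function.
3. The only difference in this output is the space: urlencode turns it into a plus sign (+), while rawurlencode encodes it as %20. That is why the plus sign shows up in urlencode's list of unencoded characters. A literal plus sign is encoded as %2B by both.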
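To see why the space handling in point 3 matters in practice, here is a small sketch of the decoding side. It is written in JavaScript (the language used in the second half of this article) purely as an illustration: it is not the parser that produced the original bug, and the variable names and the q parameter are made up for the example. A strict percent-decoder leaves + alone, while a form-style decoder turns it back into a space:

```javascript
// The two encodings of "hello world" discussed above.
const formEncoded = "hello+world";   // what PHP's urlencode produces
const rawEncoded = "hello%20world";  // what PHP's rawurlencode produces

// decodeURIComponent only undoes %XX escapes, so the plus sign survives.
console.log(decodeURIComponent(formEncoded)); // hello+world
console.log(decodeURIComponent(rawEncoded));  // hello world

// URLSearchParams applies form decoding, where + means space.
const params = new URLSearchParams("q=" + formEncoded);
console.log(params.get("q")); // hello world
```

So a value produced by urlencode is only safe where the receiver applies form decoding (typically the query string); in other parts of a URL, %20 is the form that decodes to a space.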
Differences between escape and encodeURIComponent in JavaScript:
>>> console.log(encodeURIComponent("统一注册1"));
%E7%BB%9F%E4%B8%80%E6%B3%A8%E5%86%8C1
>>> console.log(escape("统一注册1"));
%u7EDF%u4E00%u6CE8%u518C1

<?php
echo iconv("UTF-8", "gbk", urldecode("%E7%BB%9F%E4%B8%80%E6%B3%A8%E5%86%8C1"));
echo "\n";
// urldecode does not understand the %uXXXX form, so this is printed unchanged.
echo urldecode("%u7EDF%u4E00%u6CE8%u518C1");
// This works, using the unescape() function given further down:
// echo iconv("UTF-8", "gbk", unescape("%u7EDF%u4E00%u6CE8%u518C1"));
exit;
?>

Output result:
==============================================
统一注册1
%u7EDF%u4E00%u6CE8%u518C1
==============================================
Result description:
1. encodeURIComponent always converts the input to UTF-8 and percent-encodes it byte by byte.
2. escape encodes by Unicode code unit, producing %uXXXX for non-Latin characters. Because it also encodes characters that are unsafe in a URL, it can be used for encoding in a URL. However, the server will not decode it automatically. The following PHP decoding function, found in the manual, handles it:
<?php
// Decode JavaScript escape() output (%uXXXX) and numeric HTML entities to UTF-8.
function unescape($str) {
    $str = rawurldecode($str);
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        // pack("H4") and pack("n") both yield big-endian bytes,
        // so the byte order is stated explicitly as UCS-2BE.
        if (substr($v, 0, 2) == "%u") {
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
        } elseif (substr($v, 0, 3) == "&#x") {
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
        } elseif (substr($v, 0, 2) == "&#") {
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}
?>

>>> console.log(escape(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"));
%20%21%22%23%24%25%26%27%28%29*+%2C-./0123456789%3A%3B%3C%3D%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
>>> console.log(encodeURIComponent(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"));
%20!%22%23%24%25%26'()*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
>>> console.log(escape("!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz{|}~").replace(/%.{2}/g,""));
*+-./0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
>>> console.log(encodeURIComponent("!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~").replace(/%.{2}/g,""));
!'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
Result comparison:
Characters escape leaves unencoded: * + - . / @ _ (7 in total)
Characters encodeURIComponent leaves unencoded: ! ' ( ) * - . _ ~ (9 in total)
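The two lists above can be re-derived programmatically instead of being read off the console output. The following is a minimal sketch; the helper names printableAscii and unencodedPunctuation are made up for this example and are not part of any library. It tests every printable ASCII character individually and keeps the punctuation that comes back unchanged:

```javascript
// Build the printable ASCII range (0x20 through 0x7e) as one string.
function printableAscii() {
  let out = "";
  for (let code = 0x20; code <= 0x7e; code++) {
    out += String.fromCharCode(code);
  }
  return out;
}

// Return the non-alphanumeric characters that `encode` leaves untouched.
function unencodedPunctuation(encode) {
  return [...printableAscii()]
    .filter((ch) => encode(ch) === ch)        // unchanged by the encoder
    .filter((ch) => !/[0-9A-Za-z]/.test(ch))  // drop digits and letters
    .join("");
}

console.log(unencodedPunctuation(escape));             // *+-./@_
console.log(unencodedPunctuation(encodeURIComponent)); // !'()*-._~
```

Checking one character at a time avoids the regular-expression stripping used in the console session, which would misfire on the %uXXXX sequences that escape produces for non-Latin input.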