Differences between urlencode and rawurlencode in PHP

Source: Internet
Author: User

Some time ago I ran into a bug caused by a plus sign in a URL. I had used the urlencode function to build the URL, and urlencode converts spaces into plus signs (+). The receiving side did not turn the plus sign back into a space, so the URL was parsed incorrectly; the space was handled correctly only when it was encoded as %20. In this situation you need the rawurlencode function instead. The differences between urlencode and rawurlencode are described below.
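The failure is easy to reproduce. The following Node.js snippet is only an illustration and not part of the original setup. A decoder that follows RFC 3986, such as JavaScript's decodeURIComponent, treats + as a literal plus sign, while a form decoder such as URLSearchParams turns it back into a space. A + produced by urlencode therefore survives only when the receiving side applies form decoding, whereas %20 decodes to a space under both rules:

```javascript
// What PHP's urlencode("hello world") produces: the space becomes "+".
const formEncoded = "hello+world";
// What PHP's rawurlencode("hello world") produces: the space becomes "%20".
const rfcEncoded = "hello%20world";

// RFC 3986 decoding: "+" is an ordinary character and is kept as-is.
const rfcDecodedPlus = decodeURIComponent(formEncoded);
// application/x-www-form-urlencoded decoding: "+" means a space.
const formDecodedPlus = new URLSearchParams("q=" + formEncoded).get("q");
// "%20" decodes to a space under both rules.
const rfcDecodedPct = decodeURIComponent(rfcEncoded);
const formDecodedPct = new URLSearchParams("q=" + rfcEncoded).get("q");

console.log(rfcDecodedPlus);  // hello+world
console.log(formDecodedPlus); // hello world
console.log(rfcDecodedPct);   // hello world
console.log(formDecodedPct);  // hello world
```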

urlencode function:

Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding used for data posted from a WWW form, that is, the application/x-www-form-urlencoded media type. For historical reasons, it differs from the RFC 3986 encoding (see rawurlencode()) in that spaces are encoded as plus signs (+).
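To make these rules concrete, here is a JavaScript sketch that mimics them. `phpUrlencode` is a hypothetical helper, not a real API, and it assumes the input should be treated as UTF-8 bytes, which is what PHP sees when the script and its data are UTF-8:

```javascript
// Hypothetical helper that mimics PHP's urlencode() on UTF-8 input:
// keep A-Z, a-z, 0-9, "-", "_" and "."; turn a space into "+";
// percent-encode every other byte as %XX with uppercase hex digits.
function phpUrlencode(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes, like a PHP string
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (/^[A-Za-z0-9_.-]$/.test(ch)) {
      out += ch; // unreserved by urlencode's rules
    } else if (b === 0x20) {
      out += "+"; // the historical form-encoding rule for spaces
    } else {
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(phpUrlencode("hello world")); // hello+world
console.log(phpUrlencode("a~b&c=d"));     // a%7Eb%26c%3Dd
```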

rawurlencode function:

Returns a string in which all non-alphanumeric characters except -_.~ are replaced with a percent sign (%) followed by two hexadecimal digits (PHP versions before 5.3.0 also encoded the tilde ~). This is the encoding described in RFC 3986. It protects literal characters from being interpreted as special URL delimiters, and it protects URLs from being mangled by transmission media that convert characters (such as some email systems). The following is an example:

The code is as follows:

<?php

$string = "hello world";

echo urlencode($string) . '<br/>';    // output: hello+world
echo rawurlencode($string) . '<br/>'; // output: hello%20world

?>
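In JavaScript, encodeURIComponent comes close to rawurlencode, but it also leaves ! ' ( ) * unencoded. The hypothetical helper below encodes those five characters as well, which approximates rawurlencode as it behaves in PHP 5.3.0 and later (where ~ is left as-is). It is a sketch that assumes well-formed input: encodeURIComponent throws a URIError on a lone surrogate.

```javascript
// Hypothetical helper mimicking PHP (>= 5.3.0) rawurlencode() on UTF-8 input.
// encodeURIComponent already percent-encodes UTF-8 bytes and keeps
// A-Z a-z 0-9 - _ . ~ ; it also keeps ! ' ( ) *, which RFC 3986 lists as
// reserved sub-delimiters, so those are percent-encoded here as well.
function phpRawurlencode(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(phpRawurlencode("hello world")); // hello%20world
console.log(phpRawurlencode("(a*b)~c"));     // %28a%2Ab%29~c
```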

A detailed comparison over all printable ASCII characters:

The code is as follows:

<?php
$str = '';
for ($i = 0x20; $i < 0x7f; $i++) {
    $str .= dechex($i); // two hex digits per printable ASCII code
}

$ascii = pack("H*", $str); // turn the hex string back into raw characters
echo "All printable ASCII characters (from space to ~):\n" . $ascii . "\n";
echo "urlencode result:\n" . urlencode($ascii);
echo "\n";
// Removing every %XX sequence leaves only the characters that were not encoded
echo "Characters not encoded by urlencode:\n" . preg_replace("/%.{2}/", "", urlencode($ascii));
echo "\n";
echo "rawurlencode result:\n" . rawurlencode($ascii);
echo "\n";
echo "Characters not encoded by rawurlencode:\n" . preg_replace("/%.{2}/", "", rawurlencode($ascii));
echo "\n";

exit;
?>

Output:
---------------------------
All printable ASCII characters (from space to ~):
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
urlencode result:
+%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
Characters not encoded by urlencode:
+-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
rawurlencode result:
%20%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
Characters not encoded by rawurlencode:
-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

---------------------------------------------------------------------------------
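Comparing the two results:
1. Digits and upper- and lowercase letters are not encoded by either function.
2. The hyphen (-), period (.), and underscore (_) are not encoded by either function.
3. The only difference is the space: urlencode turns it into a plus sign (+), while rawurlencode encodes it as %20. (This output came from a PHP version older than 5.3.0; since 5.3.0, rawurlencode also leaves the tilde (~) unencoded, which makes the tilde a second difference.)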
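The same comparison can be checked character by character. The sketch below encodes each printable ASCII character under both rule sets as they stand in PHP 5.3.0 and later (an assumption; the output above came from an older version), using hypothetical per-character helpers, and lists the characters where the two results differ:

```javascript
// Per-character versions of the two rule sets for printable ASCII (0x20-0x7E).
// urlencode keeps A-Z a-z 0-9 - _ . and writes a space as "+".
// rawurlencode (PHP >= 5.3.0) keeps A-Z a-z 0-9 - _ . ~ and writes a space as "%20".
const hex = (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");
const urlencodeChar = (c) =>
  /^[A-Za-z0-9_.-]$/.test(c) ? c : c === " " ? "+" : hex(c);
const rawurlencodeChar = (c) => (/^[A-Za-z0-9_.~-]$/.test(c) ? c : hex(c));

const printable = Array.from({ length: 0x7f - 0x20 }, (_, i) =>
  String.fromCharCode(0x20 + i)
);
const differing = printable.filter((c) => urlencodeChar(c) !== rawurlencodeChar(c));

for (const c of differing) {
  console.log(JSON.stringify(c), urlencodeChar(c), rawurlencodeChar(c));
}
// " " + %20
// "~" %7E ~
```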

Differences between escape and encodeURIComponent in JavaScript:

The code is as follows:

(The test string 统一注册1 is Chinese for "unified registration 1".)

>>> console.log(encodeURIComponent("统一注册1"));
%E7%BB%9F%E4%B8%80%E6%B3%A8%E5%86%8C1
>>> console.log(escape("统一注册1"));
%u7EDF%u4E00%u6CE8%u518C1

<?php
echo iconv("UTF-8", "GBK", urldecode("%E7%BB%9F%E4%B8%80%E6%B3%A8%E5%86%8C1"));
echo "\n";
echo urldecode("%u7EDF%u4E00%u6CE8%u518C1"); // urldecode() does not understand %uXXXX
// To decode the %uXXXX form, use the unescape() function defined below:
// echo iconv("UTF-8", "GBK", unescape("%u7EDF%u4E00%u6CE8%u518C1"));
exit;
?>

Output:
==============================================
统一注册1
%u7EDF%u4E00%u6CE8%u518C1
==============================================
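The two outputs differ because the functions encode different units. This Node.js sketch, included only as an illustration, prints the UTF-8 bytes of each non-ASCII character in the test string (what encodeURIComponent percent-encodes) next to its UTF-16 code unit (what escape writes as %uXXXX):

```javascript
// Compare the UTF-8 bytes (encodeURIComponent's view) with the
// UTF-16 code unit (escape's %uXXXX view) for each character.
const text = "统一注册"; // the non-ASCII part of the example string
const toHex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

for (const ch of text) {
  const utf8 = Array.from(new TextEncoder().encode(ch), (b) => "%" + toHex(b, 2)).join("");
  // %uXXXX form; escape() uses it for BMP characters above U+00FF
  const unit = "%u" + toHex(ch.charCodeAt(0), 4);
  console.log(ch, utf8, unit);
}
// 统 %E7%BB%9F %u7EDF
// 一 %E4%B8%80 %u4E00
// 注 %E6%B3%A8 %u6CE8
// 册 %E5%86%8C %u518C
```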

Result description:
1. encodeURIComponent always converts its input to UTF-8 and percent-encodes it byte by byte.
2. escape encodes by Unicode code unit, writing characters above U+00FF as %uXXXX. Because it also encodes characters that are unsafe in URLs, it can be used for URL encoding, but the server does not decode the %uXXXX form automatically (as the second output line shows, urldecode leaves it untouched). The following PHP decoding function, found in the manual, handles it:

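The code is as follows:

<?php
// Decode a string produced by JavaScript's escape() (including %uXXXX sequences),
// as well as &#xXXXX; and &#NNNN; character references, into UTF-8.
function unescape($str) {
    $str = rawurldecode($str); // decodes ordinary %XX sequences; %uXXXX is left intact
    // The /U modifier makes quantifiers ungreedy, so ".+" matches a single byte.
    preg_match_all('/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U', $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        // pack() produces big-endian bytes, so the source charset is UCS-2BE.
        if (substr($v, 0, 2) === "%u") {
            // %uXXXX: four hex digits of a UCS-2 code unit
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
        } elseif (substr($v, 0, 3) === "&#x") {
            // &#xXXXX;: hexadecimal character reference
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
        } elseif (substr($v, 0, 2) === "&#") {
            // &#NNNN;: decimal character reference
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", substr($v, 2, -1)));
        }
    }
    return implode("", $ar);
}
?>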
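For comparison, the same decoding can be done on the JavaScript side, where the built-in unescape() already reverses escape(). The hypothetical decodeEscaped helper below is a sketch that mirrors the three cases of the PHP function (%uXXXX, &#xXXXX;, &#NNNN;) plus plain %XX. Unlike the PHP version, it returns a JavaScript string rather than UTF-8 bytes, works in a single pass, and reads %XX as the code point U+00XX, which is how escape() emits those characters.

```javascript
// Hypothetical helper: decode %uXXXX, %XX, &#xXXXX; and &#NNNN; in one pass.
// %XX is read as the code point U+00XX, matching what escape() emits.
function decodeEscaped(str) {
  const pattern = /%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})|&#x([0-9A-Fa-f]{4});|&#(\d+);/g;
  return str.replace(pattern, (match, u, pct, hexRef, decRef) => {
    const code =
      decRef !== undefined ? parseInt(decRef, 10) : parseInt(u ?? pct ?? hexRef, 16);
    // Leave out-of-range decimal references untouched instead of throwing.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

console.log(decodeEscaped("%u7EDF%u4E00%u6CE8%u518C1")); // 统一注册1
console.log(decodeEscaped("&#x4F60;&#22909;"));          // 你好
```

Because each surrogate half is decoded separately and the halves end up adjacent, characters outside the Basic Multilingual Plane (which escape() writes as two %uXXXX units) are reassembled correctly.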

 

>>> console.log(escape(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"));
%20%21%22%23%24%25%26%27%28%29*+%2C-./0123456789%3A%3B%3C%3D%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
>>> console.log(encodeURIComponent(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"));
%20!%22%23%24%25%26'()*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
>>> console.log(escape(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~").replace(/%.{2}/g, ""));
*+-./0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
>>> console.log(encodeURIComponent(" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~").replace(/%.{2}/g, ""));
!'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~

Result comparison:
Apart from letters and digits, escape leaves 7 characters unencoded: * + - . / @ _
Apart from letters and digits, encodeURIComponent leaves 9 characters unencoded: ! ' ( ) * - . _ ~
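The same comparison can be computed directly instead of being read off the console output. This sketch assumes a JavaScript runtime where the legacy escape() function is still available (browsers and Node.js both provide it) and lists the printable ASCII characters, other than letters and digits, that each function returns unchanged:

```javascript
// Printable ASCII: space (0x20) through tilde (0x7E).
const printable = Array.from({ length: 0x7f - 0x20 }, (_, i) =>
  String.fromCharCode(0x20 + i)
);

// Characters that `encode` returns unchanged, excluding letters and digits.
const keptBy = (encode) =>
  printable.filter((c) => encode(c) === c && !/^[A-Za-z0-9]$/.test(c)).join("");

console.log(keptBy(escape));             // *+-./@_
console.log(keptBy(encodeURIComponent)); // !'()*-._~
```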
