PHP Chinese URL transcoding

Source: Internet
Author: User

   
For URL encoding in PHP, use urlencode() or rawurlencode(). The difference between the two is that the former encodes spaces as '+', while the latter encodes them as '%20'. Note that only the variable parts of a URL (such as parameter values) should be encoded; if the whole URL is encoded, the colon and slashes in it are escaped as well. The two functions are described in detail below:

string urlencode ( string str )

Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus signs (+). This is the same encoding used for WWW form POST data, that is, the application/x-www-form-urlencoded media type. For historical reasons it differs from the RFC 1738 encoding (see rawurlencode()) in that spaces are encoded as plus signs (+). This function is convenient for encoding a string to be used in the query part of a URL, and so for passing variables to the next page:

Example 1. urlencode() example

<?php
echo '<a href="mycgi?foo=', urlencode($userinput), '">';
?>

Note: Be careful with variables that may match HTML entities. Sequences such as &amp, &copy and &pound are parsed by the browser, and the actual entity is used instead of the intended variable name. This is an obvious hassle that the W3C has been warning people about for years. Reference: http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2 PHP supports changing the argument separator to the semicolon recommended by the W3C through the arg_separator .ini directive. Unfortunately, most user agents do not send form data in this semicolon-separated format. A more portable approach is to use &amp; instead of & as the separator. You do not need to change PHP's arg_separator for this: leave it as &, and simply encode your URLs with htmlentities(urlencode($data)).

Example 2. urlencode() and htmlentities() example

<?php
echo '<a href="mycgi?foo=', htmlentities(urlencode($userinput)), '">';
?>

string rawurlencode ( string str )

Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits. This is the encoding described in RFC 1738. It protects literal characters from being interpreted as special URL delimiters, and protects URLs from being mangled by transmission media that perform character conversions (such as some email systems). For example, if you want to include a password in an FTP URL:

Example 1. rawurlencode() example 1

<?php
echo '<a href="ftp://user:', rawurlencode('foo @+%/'),
     '@ftp.my.com/x.txt">';
?>

Or, if you want to pass the information through the PATH_INFO component of the URL:

Example 2. rawurlencode() example 2

<?php
echo '<a href="http://x.com/department_list_script/',
     rawurlencode('sales and marketing/Miami'), '">';
?>
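
To compare the two encoders side by side, here is a minimal sketch. The input string and the example.com address are made up for illustration; only urlencode() and rawurlencode() come from the text above.

```php
<?php
// Hypothetical input value, used only for illustration.
$value = 'a b&c/d+e';

$form = urlencode($value);     // spaces become '+'
$raw  = rawurlencode($value);  // spaces become '%20'

echo $form, "\n";  // a+b%26c%2Fd%2Be
echo $raw, "\n";   // a%20b%26c%2Fd%2Be

// Encode only the variable part of the URL, never the whole URL.
$base = 'http://example.com/search.php';  // hypothetical address
$good = $base . '?q=' . urlencode($value);

// Encoding the whole URL escapes the colon and slashes too.
$broken = urlencode($base);

echo $good, "\n";    // http://example.com/search.php?q=a+b%26c%2Fd%2Be
echo $broken, "\n";  // http%3A%2F%2Fexample.com%2Fsearch.php
```

The last line shows why encoding a complete URL is usually a mistake: the result no longer has a recognizable scheme or path.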

For decoding, use the corresponding urldecode() and rawurldecode(). As you would expect, rawurldecode() does not decode the plus sign ('+') to a space, while urldecode() does. The two functions are described in detail below:

string urldecode ( string str )

Decodes any %## encoding in the given string. Returns the decoded string.

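Example 1. urldecode() example

<?php
// $_SERVER['QUERY_STRING'] replaces the old register_globals variable $QUERY_STRING.
$a = explode('&', $_SERVER['QUERY_STRING']);
$i = 0;
while ($i < count($a)) {
    // explode() replaces split(), which was removed in PHP 7.
    // array_pad() guards against parameters that have no '=' part.
    $b = array_pad(explode('=', $a[$i], 2), 2, '');
    echo 'Value for parameter ', htmlspecialchars(urldecode($b[0])),
         ' is ', htmlspecialchars(urldecode($b[1])), "<br />\n";
    $i++;
}
?>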

string rawurldecode ( string str )

Returns a string in which every sequence of a percent sign (%) followed by two hexadecimal digits is replaced with the literal character it represents.

Example 1. rawurldecode() example

<?php
echo rawurldecode('foo%20bar%40baz'); // foo bar@baz
?>
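
The different handling of the plus sign can be seen in a short sketch. The input string is made up for illustration.

```php
<?php
// Hypothetical encoded input: a literal '+', a '%20' and an escaped plus ('%2B').
$encoded = 'a+b%20c%2B';

$viaUrldecode    = urldecode($encoded);     // '+' is turned into a space
$viaRawurldecode = rawurldecode($encoded);  // '+' is left untouched

echo $viaUrldecode, "\n";     // a b c+
echo $viaRawurldecode, "\n";  // a+b c+
```

As a rule of thumb, decode with the counterpart of whichever function did the encoding, so that a literal plus sign in the data is not silently turned into a space.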

One thing to note, however: urldecode() and rawurldecode() only turn the percent escapes back into raw bytes, and those bytes are usually UTF-8. If the URL contains Chinese characters and the page is not served as UTF-8, the decoded string must be converted to the page's encoding before it displays correctly.

Another problem is that the URL you receive may not be in the %nn format (n = 0..f) but in the %unnnn format (n = 0..f). In that case urldecode() and rawurldecode() cannot decode it correctly, but the following function can:
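
As a rough sketch of such a conversion, the snippet below decodes a URL-encoded Chinese string and converts it for a page that uses a GB-family encoding. It assumes the mbstring extension is loaded and that the sender encoded the text as UTF-8; the target encoding GB18030 is only an example, so substitute whatever your page actually declares.

```php
<?php
// Assumption: mbstring is available and the escapes below represent UTF-8 bytes.
$encoded = '%E4%B8%AD%E6%96%87';  // the two characters U+4E2D U+6587 in UTF-8

$utf8 = rawurldecode($encoded);   // raw UTF-8 bytes

// Convert for a page that is not served as UTF-8 (GB18030 chosen as an example).
$gb = mb_convert_encoding($utf8, 'GB18030', 'UTF-8');

echo bin2hex($utf8), "\n";  // e4b8ade69687
echo bin2hex($gb), "\n";    // d6d0cec4
```

Printing the bytes as hexadecimal makes the difference visible regardless of the terminal's own encoding.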

function utf8RawUrlDecode($source)
{
    $decodedStr = "";
    $pos = 0;
    $len = strlen($source);
    while ($pos < $len) {
        $charAt = substr($source, $pos, 1);
        if ($charAt == '%') {
            $pos++;
            $charAt = substr($source, $pos, 1);
            if ($charAt == 'u') {
                // We got a unicode character: emit it as an HTML numeric
                // character reference such as &#20013;
                $pos++;
                $unicodeHexVal = substr($source, $pos, 4);
                $unicode = hexdec($unicodeHexVal);
                $entity = "&#" . $unicode . ';';
                // The original wrapped $entity in utf8_encode(); the entity is
                // plain ASCII, so that call changed nothing, and utf8_encode()
                // is deprecated as of PHP 8.2.
                $decodedStr .= $entity;
                $pos += 4;
            } else {
                // We have an escaped ascii character
                $hexVal = substr($source, $pos, 2);
                $decodedStr .= chr(hexdec($hexVal));
                $pos += 2;
            }
        } else {
            $decodedStr .= $charAt;
            $pos++;
        }
    }
    return $decodedStr;
}
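
Note that the function above turns each %unnnn escape into an HTML numeric character reference rather than into raw bytes. If real UTF-8 output is wanted instead, one possible variant is sketched below. The name decodeUnicodeEscapes is hypothetical, not a PHP built-in; the sketch assumes the mbstring extension (mb_chr() requires PHP 7.2 or later) and handles only four-digit escapes, so characters outside the Basic Multilingual Plane are left as they are.

```php
<?php
// Hypothetical helper (not part of PHP): decodes %uXXXX and %XX in one pass.
// Assumes mbstring is loaded; %XX escapes are emitted as raw bytes.
function decodeUnicodeEscapes(string $source): string
{
    return preg_replace_callback(
        '/%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/',
        function (array $m): string {
            if ($m[1] !== '') {
                // %uXXXX: convert the code point to UTF-8.
                $char = mb_chr(hexdec($m[1]), 'UTF-8');
                // Leave the escape untouched if it is not a valid code point
                // (for example, half of a surrogate pair).
                return $char === false ? $m[0] : $char;
            }
            // %XX: a single escaped byte.
            return chr(hexdec($m[2]));
        },
        $source
    );
}

$decoded = decodeUnicodeEscapes('%u4E2D%u6587%20ok');
echo bin2hex($decoded), "\n";  // e4b8ade696872 06f6b without the space: e4b8ade69687206f6b
```

Handling both escape forms in a single pass avoids decoding the same text twice, which could otherwise turn a decoded percent sign into the start of a new escape.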
  
