Parsing the escape function in PHP

Source: Internet
Author: User

Using JavaScript's escape() to encode the Chinese characters in a URL:
<a href="#" onclick="window.open('product_list.php?p_sort=' + escape('Script House'));">link</a>
After clicking the link, the resulting URL is:
http://127.0.0.1/shop/product_list.php?p_sort=PHP%u5F00%u53D1%u8D44%u6E90%u7F51
Clearly, neither urldecode() nor base64_decode() in PHP can decode this: the %uXXXX sequences that escape() produces are not standard URL encoding.
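A quick check illustrates why. urldecode() only decodes "%" followed by two hex digits, so a %uXXXX sequence passes through untouched. This is a minimal sketch assuming PHP 7 or later; the sample string is the query value from the URL above.

```php
<?php
// urldecode() decodes "%" + two hex digits; "%u" is not such a pair,
// so the escape() output comes back unchanged.
$encoded = 'PHP%u5F00%u53D1%u8D44%u6E90%u7F51';
$decoded = urldecode($encoded);
var_dump($decoded === $encoded); // bool(true)

// A normal %XX pair, by contrast, is decoded.
var_dump(urldecode('a%20b')); // string(3) "a b"
```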
The workaround is to write the inverse function in PHP:

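function js_unescape($str) {
    $ret = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] == '%' && $i + 5 < $len && $str[$i + 1] == 'u') {
            // %uXXXX: one UTF-16 code unit, re-encoded here as UTF-8
            // (surrogate pairs for characters above U+FFFF are not handled)
            $val = hexdec(substr($str, $i + 2, 4));
            if ($val < 0x80) $ret .= chr($val);
            else if ($val < 0x800) $ret .= chr(0xC0 | ($val >> 6)) . chr(0x80 | ($val & 0x3F));
            else $ret .= chr(0xE0 | ($val >> 12)) . chr(0x80 | (($val >> 6) & 0x3F)) . chr(0x80 | ($val & 0x3F));
            $i += 5; // skip the rest of "%uXXXX"
        } else if ($str[$i] == '%' && $i + 2 < $len) {
            // %XX: escape() uses this form for code points below 0x100 (Latin-1)
            $val = hexdec(substr($str, $i + 1, 2));
            $ret .= ($val < 0x80) ? chr($val) : (chr(0xC0 | ($val >> 6)) . chr(0x80 | ($val & 0x3F)));
            $i += 2; // skip the two hex digits
        } else {
            $ret .= $str[$i];
        }
    }
    return $ret;
}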


Note that the decoded result is UTF-8, so if your page uses GB2312 it must be converted, otherwise the Chinese text will be garbled. If the page already uses UTF-8, no conversion is needed.
The conversion is: print iconv('utf-8', 'gb2312', js_unescape($_REQUEST['p_sort']));
With that, the JS escape encoding has been successfully reversed.
In addition, I found a PHP function that implements JS escape encoding for GB2312 strings:

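function phpescape($str)
{
    $sublen = strlen($str);
    $returnString = "";
    for ($i = 0; $i < $sublen; $i++) {
        if (ord($str[$i]) >= 0x80) {
            // Lead byte of a GB2312 double-byte character: convert both bytes
            // to one UCS-2 code unit. UCS-2BE fixes the byte order; with plain
            // "UCS-2" the two bytes may need swapping on some platforms
            // (the original noted this for Windows).
            $tmpString = bin2hex(iconv("gb2312", "UCS-2BE", substr($str, $i, 2)));
            $returnString .= "%u" . $tmpString;
            $i++; // skip the second byte of the character
        } else {
            // ASCII byte: two-digit hex escape (dechex() alone drops a leading zero)
            $returnString .= sprintf("%%%02X", ord($str[$i]));
        }
    }
    return $returnString;
}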
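The byte-order caveat for UCS-2 in this function comes from iconv: the byte order of plain "UCS-2" can differ between iconv implementations and platforms, while "UCS-2BE" and "UCS-2LE" name it explicitly. The %uXXXX form expects the big-endian digits. A minimal sketch, assuming PHP 7+ with iconv and a UTF-8 source file (UTF-8 input is used here only so the example does not depend on GB2312):

```php
<?php
// U+4E2D (中) in both explicit UCS-2 byte orders.
$char = '中'; // assumes this source file is saved as UTF-8
$be = bin2hex(iconv('UTF-8', 'UCS-2BE', $char));
$le = bin2hex(iconv('UTF-8', 'UCS-2LE', $char));
echo "%u$be\n"; // %u4e2d  (what escape() would produce)
echo "%u$le\n"; // %u2d4e  (bytes swapped)
```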


When the PHP side works in GBK, passing Chinese text through JSON can lose data or garble it, because PHP's json_encode() only accepts UTF-8 input. The string therefore has to be encoded before it is sent. Since the data is parsed with JavaScript on the client, and JavaScript has an unescape() function, an escape() equivalent in PHP is convenient: encode the data on the server, then decode it with unescape() on the client.
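A sketch of what likely goes wrong, assuming PHP 7+ with iconv and a UTF-8 source file: json_encode() handles Chinese fine when the input is UTF-8, but the same text as GBK bytes is not valid UTF-8, so encoding fails outright.

```php
<?php
// UTF-8 input: json_encode() escapes non-ASCII as \uXXXX by default,
// and JSON.parse() in JavaScript decodes it again.
$utf8 = '中文'; // assumes this source file is saved as UTF-8
echo json_encode($utf8), "\n";                         // "\u4e2d\u6587"
echo json_encode($utf8, JSON_UNESCAPED_UNICODE), "\n"; // "中文"

// GBK input: the bytes are not valid UTF-8, so json_encode() returns false.
$gbk = iconv('UTF-8', 'GBK', $utf8);
var_dump(json_encode($gbk)); // bool(false)
echo json_last_error_msg(), "\n";
```

Converting GBK data to UTF-8 with iconv() before json_encode() is the other way around the problem.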
Searching online turns up many PHP implementations of escape(), most of them like the following:

function phpescape($str) {
    // Split into GBK double-byte characters and runs of ASCII characters
    preg_match_all('/[\x80-\xff].|[\x01-\x7f]+/', $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        if (ord($v[0]) < 128)
            $ar[$k] = rawurlencode($v);
        else
            // UCS-2BE: big-endian, as the %uXXXX form expects
            $ar[$k] = "%u" . bin2hex(iconv("GB2312", "UCS-2BE", $v));
    }
    return join("", $ar);
}


This function works very well, but novices (like me) may not understand how it works and feel uneasy using it, so I will explain it here. Reusing other people's code is like standing on the shoulders of giants, but if you don't understand that code, sooner or later you will fall.
The first statement, preg_match_all('/[\x80-\xff].|[\x01-\x7f]+/', $str, $r);, uses a regular expression to split the string into tokens. [\x80-\xff]. matches one Chinese character: \x introduces a character given by its hexadecimal code, [] is a character class, and . matches any single byte, so [\x80-\xff]. matches two bytes, the first of which lies between 0x80 and 0xFF. That is exactly the range of the lead byte of a GB2312/GBK character, so the pair is one complete Chinese character (the details of GB2312/GBK encoding are easy to look up online). Similarly, [\x01-\x7f]+ matches English text: ASCII codes are below 128, that is, 0x01 to 0x7F in hexadecimal, and + means one or more, so [\x01-\x7f]+ matches a run of consecutive ASCII characters.

The rest of the function, annotated:
$ar = $r[0]; // $r[0] holds the array of matched tokens
foreach ($ar as $k => $v) {
    // A first byte below 128 means the token is ASCII (English) text,
    // so encode it directly with rawurlencode()
    if (ord($v[0]) < 128)
        $ar[$k] = rawurlencode($v);
    // Otherwise iconv() converts the Chinese character to UCS-2 (Unicode)
    // and bin2hex() turns the two bytes into the XXXX digits
    else
        $ar[$k] = "%u" . bin2hex(iconv("GB2312", "UCS-2BE", $v));
}


In JavaScript, you can use unescape to decode it.
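Both \u0391-\uffe5 and \u4e00-\u9fa5 are commonly used to match Chinese.
The former, however, also covers non-Chinese characters such as Greek letters and full-width symbols (U+0391 is the Greek capital alpha Α, U+FFE5 is the full-width yen sign ￥), while the latter covers only Chinese characters.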
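PHP's PCRE has no \uXXXX syntax; with the /u modifier, code points are written as \x{...}. A minimal sketch comparing the two ranges, assuming PHP 7 or later:

```php
<?php
// The wide range also accepts Greek letters and full-width punctuation.
$wide = '/^[\x{0391}-\x{ffe5}]+$/u';
$cjk  = '/^[\x{4e00}-\x{9fa5}]+$/u';
// '中文', Greek capital alpha (U+0391), full-width comma (U+FF0C)
foreach (['中文', "\u{391}", "\u{FF0C}"] as $s) {
    printf("%s wide=%d cjk=%d\n", $s, preg_match($wide, $s), preg_match($cjk, $s));
}
// 中文 wide=1 cjk=1
// Α wide=1 cjk=0
// ， wide=1 cjk=0
```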
The corresponding decoding function is:

function unescape($str) {
    $str = rawurldecode($str);
    // Tokens: %uXXXX, &#xXXXX;, &#NNNNN;, or any other single character
    // (/U makes ".+" ungreedy, /s lets it match newlines)
    preg_match_all('/%u.{4}|&#x.{4};|&#\d+;|.+/Us', $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        if (substr($v, 0, 2) == "%u")
            $ar[$k] = iconv("UCS-2BE", "GBK", pack("H4", substr($v, -4)));
        elseif (substr($v, 0, 3) == "&#x")
            $ar[$k] = iconv("UCS-2BE", "GBK", pack("H4", substr($v, 3, -1)));
        elseif (substr($v, 0, 2) == "&#") {
            $ar[$k] = iconv("UCS-2BE", "GBK", pack("n", substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}
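The three token forms the decoder recognises all end up as the same two bytes. pack('H4', ...) turns four hex digits into two bytes, and pack('n', ...) writes a 16-bit big-endian integer, so %u4e2d, &#x4e2d; and &#20013; all yield the UCS-2BE bytes of U+4E2D (中). This sketch assumes PHP 7+ with iconv and converts to UTF-8 for display rather than GBK.

```php
<?php
$fromPercentU = pack('H4', substr('%u4e2d', -4));           // hex digits "4e2d"
$fromHexRef   = pack('H4', substr('&#x4e2d;', 3, -1));      // hex digits "4e2d"
$fromDecRef   = pack('n', (int) substr('&#20013;', 2, -1)); // 20013 == 0x4E2D
var_dump($fromPercentU === $fromHexRef, $fromHexRef === $fromDecRef); // prints bool(true) twice
echo iconv('UCS-2BE', 'UTF-8', $fromPercentU), "\n"; // 中
```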


I. Encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff: GBK double-byte encoding range (lead bytes are \x81-\xfe)
\x20-\x7f: ASCII
\xa1-\xff: Chinese (GB2312)
\x80-\xff: Chinese (GBK)
2. UTF-8 (Unicode code points)
\u4e00-\u9fa5: Chinese
\u3130-\u318f: Korean
\uac00-\ud7a3: Korean
\u0800-\u4e00: Japanese
PS: Korean syllables (\uac00-\ud7a3) lie above \u9fa5.
Regex examples (PHP):
preg_replace('/[\x80-\xff]/', '', $str);          // strip non-ASCII bytes (GBK text)
preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $str); // strip Chinese characters (UTF-8 text)
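The Unicode ranges listed above can be turned into a small classifier. This is a hypothetical helper (script_of is a made-up name), assuming PHP 7+ and UTF-8 strings; the Japanese range here is the loose one from the list, so the Korean check runs first because \u3130-\u318f falls inside it.

```php
<?php
// Hypothetical helper: label one UTF-8 character using the ranges above.
function script_of(string $char): string
{
    if (preg_match('/^[\x{4e00}-\x{9fa5}]$/u', $char)) return 'Chinese';
    if (preg_match('/^[\x{3130}-\x{318f}\x{ac00}-\x{d7a3}]$/u', $char)) return 'Korean';
    if (preg_match('/^[\x{0800}-\x{4e00}]$/u', $char)) return 'Japanese';
    return 'other';
}

// 中 (U+4E2D), 한 (U+D55C), あ (U+3042), a
foreach (['中', "\u{D55C}", "\u{3042}", 'a'] as $c) {
    echo $c, ' => ', script_of($c), "\n";
}
```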
