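function unescape($str) {
    $str = rawurldecode($str);
    // Split into %uXXXX escapes, &#xXXXX; and &#NNNN; entities, and everything else.
    // The U (ungreedy) flag keeps the trailing .+ from swallowing the rest of the string.
    preg_match_all('/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U', $str, $r);
    $ar = $r[0];
    // print_r($ar);
    foreach ($ar as $k => $v) {
        if (substr($v, 0, 2) == "%u") {
            // %uXXXX: the last four characters are the hex code point
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
        } elseif (substr($v, 0, 3) == "&#x") {
            // &#xXXXX;: hex entity
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
        } elseif (substr($v, 0, 2) == "&#") {
            // &#NNNN;: decimal entity, packed as a big-endian 16-bit integer
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}
echo unescape("Purple star Blue");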
Today a user reported that Chinese text submitted through the form system came out garbled. Testing traced the problem to this iconv conversion:
iconv('UCS-2', 'GBK', 'Chinese')
A Google search showed that plain UCS-2 is interpreted differently on a Linux server than on Windows.
So I changed the call to iconv('UCS-2BE', 'GBK', 'Chinese'), and the Chinese text came out correctly.
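The difference is easy to see by dumping the bytes iconv produces for a single character. This is a minimal sketch, assuming the iconv extension is available; the helper name ucs2_hex is made up for illustration:

```php
<?php
// Hypothetical helper: hex dump of a UTF-8 string converted to the given encoding.
function ucs2_hex(string $utf8, string $encoding): string
{
    return bin2hex(iconv('UTF-8', $encoding, $utf8));
}

$ni = "\xE4\xBD\xA0"; // U+4F60, written as raw UTF-8 bytes

echo ucs2_hex($ni, 'UCS-2BE'), "\n"; // 4f60
echo ucs2_hex($ni, 'UCS-2LE'), "\n"; // 604f
```

Plain 'UCS-2' is deliberately left out of the sketch, since its byte order is exactly the part that varies between platforms.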
The following are the unwritten conventions for UCS-2 encoding on the two platforms:
1. UCS-2 is not the same as UTF-16. UCS-2 is a fixed-width encoding that uses exactly two bytes per character and can only represent code points up to U+FFFF. UTF-16 encodes those same code points identically, but represents anything above U+FFFF as a surrogate pair, so a UTF-16 character takes either two or four bytes.
2. On Windows, UCS-2 defaults to UCS-2LE. The Unicode text produced by MultiByteToWideChar (or A2W) is UCS-2LE. Windows Notepad can also save text as UCS-2BE, which amounts to one extra conversion step.
3. On Linux, UCS-2 defaults to UCS-2BE. Calling iconv with plain UCS-2 produces UCS-2BE. To convert UCS-2 data that came from Windows, specify UCS-2LE explicitly.
4. Platforms such as Windows and Linux interpret plain UCS-2 differently (UCS-2LE versus UCS-2BE). Microsoft recommends starting Unicode text with a byte order mark (bytes FF FE for UCS-2LE, FE FF for UCS-2BE) that flags the data that follows as Unicode and identifies it as little-endian or big-endian. So if data coming from Windows turns out to carry this prefix, don't panic.
5. On Linux, encoded output, whether it is written to a file or printed with printf, has to match the encoding the console expects (a mismatch usually traces back to the encoding the program was compiled with), and converting console input requires checking the current system encoding. For example, if the console's current encoding is UTF-8, UTF-8 output displays correctly and GBK output does not; likewise, a GBK console displays GBK correctly. Future systems may handle more of these conversions automatically, but with PuTTY and similar terminals you still have to set the terminal's encoding yourself to avoid garbled text.
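Points 1 and 4 can be made concrete in a few lines. This is only a sketch, assuming the iconv extension is available; decode_with_bom is a hypothetical helper, and falling back to big-endian when no byte order mark is present is an assumption of this sketch, not a rule:

```php
<?php
// Hypothetical helper: pick the byte order from a leading BOM, then convert to UTF-8.
function decode_with_bom(string $bytes, string $fallback = 'UCS-2BE'): string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        // FF FE: little-endian, strip the mark before converting
        return iconv('UCS-2LE', 'UTF-8', substr($bytes, 2));
    }
    if ($bom === "\xFE\xFF") {
        // FE FF: big-endian, strip the mark before converting
        return iconv('UCS-2BE', 'UTF-8', substr($bytes, 2));
    }
    // No mark: use the assumed fallback byte order
    return iconv($fallback, 'UTF-8', $bytes);
}

$ni = "\xE4\xBD\xA0"; // U+4F60 as UTF-8 bytes

var_dump(decode_with_bom("\xFF\xFE\x60\x4F") === $ni); // little-endian with BOM
var_dump(decode_with_bom("\xFE\xFF\x4F\x60") === $ni); // big-endian with BOM
var_dump(decode_with_bom("\x4F\x60") === $ni);         // no BOM, fallback applies

// A code point above U+FFFF (here U+1F600) needs a surrogate pair in UTF-16,
// which is four bytes rather than two.
echo bin2hex(iconv('UTF-8', 'UTF-16BE', "\xF0\x9F\x98\x80")), "\n"; // d83dde00
```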
Implementation of Unicode encoding and decoding for Chinese characters in PHP
// Unicode-encode the content
function unicode_encode($name) {
    // Name the byte order explicitly: plain UCS-2 differs between platforms
    $name = iconv('UTF-8', 'UCS-2BE', $name);
    $len = strlen($name);
    $str = '';
    for ($i = 0; $i < $len - 1; $i = $i + 2) {
        $c = $name[$i];
        $c2 = $name[$i + 1];
        if (ord($c) > 0) {
            // Two-byte character: zero-pad each byte so the escape always has four hex digits
            $str .= sprintf('\u%02x%02x', ord($c), ord($c2));
        } else {
            $str .= $c2;
        }
    }
    return $str;
}
$name = 'MY,你大爷的';
$unicode_name = unicode_encode($name);
echo $unicode_name . "\n";

// Decode Unicode-encoded content
function unicode_decode($name) {
    // Convert \uXXXX escapes back into readable UTF-8.
    // Match an escape, a run of text without backslashes, or a lone backslash,
    // so punctuation such as the comma is preserved.
    $pattern = '/\\\\u[0-9a-f]{4}|[^\\\\]+|\\\\/i';
    preg_match_all($pattern, $name, $matches);
    if (!empty($matches)) {
        $name = '';
        for ($j = 0; $j < count($matches[0]); $j++) {
            $str = $matches[0][$j];
            if (strpos($str, '\\u') === 0) {
                $code = hexdec(substr($str, 2, 2));
                $code2 = hexdec(substr($str, 4));
                $c = chr($code) . chr($code2);
                $c = iconv('UCS-2BE', 'UTF-8', $c);
                $name .= $c;
            } else {
                $name .= $str;
            }
        }
    }
    return $name;
}
echo 'MY,\u4f60\u5927\u7237\u7684 ' . unicode_decode($unicode_name);
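As a cross-check on the hand-written functions, PHP's built-in json_encode and json_decode use the same \uXXXX notation. This is a different technique from the one above, shown only as a sketch: it assumes the json extension is available, and json_encode also escapes quotes, backslashes and forward slashes, so its output is not always identical to unicode_encode.

```php
<?php
$text = "MY,\xE4\xBD\xA0"; // "MY," followed by U+4F60, as UTF-8 bytes

// json_encode escapes non-ASCII characters as \uXXXX by default;
// substr() drops the surrounding JSON quotes.
$escaped = substr(json_encode($text), 1, -1);
echo $escaped, "\n"; // MY,\u4f60

// Putting the quotes back lets json_decode reverse the escaping.
$decoded = json_decode('"' . $escaped . '"');
var_dump($decoded === $text);
```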
That covers converting Unicode escapes into UTF-8 with PHP. I hope it is helpful to anyone interested in PHP.