Convert Unicode to UTF-8 with PHP

Source: Internet
Author: User
: This article mainly introduces how to use PHP to convert Unicode to a UTF-8, for PHP tutorials interested in students can refer. Function unescape ($ str) {$ str = rawurldecode ($ str );

Preg_match_all ("/(? : % U. {4}) | & # x. {4}; | & # \ d +; |. +/U ", $ str, $ r); $ ar = $ r [0]; // print_r ($ ar ); foreach ($ ar as $ k = >$ v) {if (substr ($ v,) = "% u ") {$ ar [$ k] = iconv ("UCS-2BE", "UTF-8", pack ("H4", substr ($ v,-4 )));} elseif (substr ($ v,) = "& # x") {$ ar [$ k] = iconv ("UCS-2BE", "UTF-8 ", pack ("H4", substr ($ v, 3,-1);} elseif (substr ($ v,) = "&#") {$ ar [$ k] = iconv ("UCS-2BE", "UTF-8", pack ("n", substr ($ v, 2,-1 )));}} returnjoin ("", $ ar);} echo unescape ("Purple Star Blue ");

Some users reported that the data submitted by users in the form system is garbled in Chinese. The test found that the problem lies in the iconv conversion.
Iconv ('ucs-2', 'gbk', 'Chinese ')
Google search found that the reason is that the UCS-2 encoding method on the Linux server is inconsistent with Winodws.
So I changed it to iconv ('ucs-2be ', 'gbk', and 'Chinese'). the Chinese language is normal.

The following are the unspoken rules for UCS-2 coding on both platforms:

1, the UCS-2 is not equal to the UTF-16. Each byte in the UTF-16 uses ASCII character range encoding, while the UCS-2 can encode each byte beyond the ASCII character range. UCS-2 and UTF-16 take up to two bytes for each character, but their encoding is different.

2, for UCS-2, windows is the default UCS-2LE. The unicode of the UCS-2LE is generated with MultibyteToWidechar (or A2W. Windows Notepad can save text as a UCS-2BE, which is equivalent to a layer conversion.

3, for UCS-2, the default is UCS-2BE in linux. Iconv (specifies the UCS-2) is used to convert the unicode of the UCS-2BE. If you convert a UCS-2 from a windows platform, you need to specify a UCS-2LE.

4, in view of windows and linux and other platforms on the UCS-2 of different understanding (UCS-2LE, UCS-2BE ). MS advocates unicode has a bootstrap sign (UCS-2LE FFFE, UCS-2BE FEFF) to indicate that the following characters are unicode and identify big-endian or little-endian. Therefore, data from the windows platform has this prefix, so you don't need to worry.

5. linux Encoding output, such as output from a file and output from printf, requires proper encoding matching on the console (if the encoding does not match, generally, it is related to the encoding during compilation of the program), and the conversion input in the console needs to view the current system encoding. For example, the current console encoding is UTF-8, then the UTF-8 encoding can be correctly displayed, GBK can not; similarly, the current encoding is GBK, you can display GBK encoding, later, the system should be more intelligent to handle more transformations. However, through putty and other terminals, you still need to set the terminal encoding conversion to relieve garbled characters.

Implementation of UNICODE encoding and decoding for Chinese characters in PHP

// Encode the content in UNICODE: functionunicode_encode ($ name) {$ name = iconv ('utf-8', 'ucs-2', $ name ); $ len = strlen ($ name); $ str = ''; for ($ I = 0; $ I <$ len-1; $ I = $ I + 2) {$ c = $ name [$ I]; $ c2 = $ name [$ I + 1]; if (ord ($ c)> 0) {// two-byte text $ str. = '\ U '. base_convert (ord ($ c), 10, 16 ). base_convert (ord ($ c2), 10, 16);} else {$ str. = $ c2 ;}} return $ str ;}$ name = 'My, your uncle's '; $ unicode_name = unicode_encode ($ name); ec Ho ''. $ unicode_name. ''; // decode the UNICODE encoded content functionunicode_decode ($ name) {// Convert the encoding, convert Unicode encoding to UTF-8 encoding that can be viewed $ pattern = '/([\ w] +) | (\ u ([\ w] {4 })) /I '; preg_match_all ($ pattern, $ name, $ matches); if (! Empty ($ matches) {$ name = ''; for ($ j = 0; $ j <count ($ matches [0]); $ j ++) {$ str = $ matches [0] [$ j]; if (strpos ($ str, '\ u') = 0) {$ code = base_convert (substr ($ str, 2, 2), 16, 10); $ code2 = base_convert (substr ($ str, 4), 16, 10 ); $ c = chr ($ code ). chr ($ code2); $ c = iconv ('ucs-2', 'utf-8', $ c); $ name. = $ c;} else {$ name. = $ str ;}}return $ name;} echo 'My, \ u4f60 \ u5927 \ u7237 \ u7684-> '. unicode_decode ($ unicode_name );

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.