Analysis of Functions for Outputting Chinese Characters with GD

Early on, I found a lookup table (gb2312.txt) for converting GB to UTF-8 and used it to output Chinese characters with GD. When the content to be output contained Western (single-byte ASCII) characters, the output could come out garbled. Later I found a modified version of the code. The two functions are compared below.

First, here is the Unicode-to-UTF-8 conversion function. It is identical in both versions:
function u2utf8($c)
{
    // Each UTF-8 byte is appended as a decimal number, not as a raw byte;
    // gb2utf8 turns these numbers back into bytes with chr().
    $str = "";
    if ($c < 0x80) {
        $str .= $c;
    }
    else if ($c < 0x800) {
        $str .= (0xC0 | ($c >> 6));
        $str .= (0x80 | ($c & 0x3F));
    }
    else if ($c < 0x10000) {
        $str .= (0xE0 | ($c >> 12));
        $str .= (0x80 | (($c >> 6) & 0x3F));
        $str .= (0x80 | ($c & 0x3F));
    }
    else if ($c < 0x200000) {
        $str .= (0xF0 | ($c >> 18));
        $str .= (0x80 | (($c >> 12) & 0x3F));
        $str .= (0x80 | (($c >> 6) & 0x3F));
        $str .= (0x80 | ($c & 0x3F));
    }
    return $str;
}

The function follows the UTF-8 encoding rules exactly: it determines which Unicode range the code point falls in, then applies the corresponding shift and bitwise AND/OR operations to produce the UTF-8 encoding. For details of these rules, see http://www.utf8.org.
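
The range-based rules described above can also be sketched as a function that returns the encoded bytes directly instead of decimal numbers. This is only an illustration: the name codepoint_to_utf8 is hypothetical and is not part of the original code.

```php
<?php
// Minimal sketch: encode one Unicode code point as raw UTF-8 bytes.
// codepoint_to_utf8 is a hypothetical name used for illustration only.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// U+4E2D (中) falls in the three-byte range.
echo bin2hex(codepoint_to_utf8(0x4E2D)), "\n";
```

Returning raw bytes would remove the need for a later step that turns decimal numbers back into bytes, but the functions in this article rely on the decimal form, so this sketch is not a drop-in replacement.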

This is the earlier version of the GB-to-UTF-8 conversion function, which calls the u2utf8 function above.
function gb2utf8($gb) /* program written by sadly www.phpx.com */
{
    if (!trim($gb))
        return $gb;
    $filename = "gb2312.txt";
    $tmp = file($filename);
    $codetable = array();
    // Assumed table layout, one entry per line: "0x2121 0x3000 ..."
    // (GB code at offset 0, Unicode value at offset 7).
    foreach ($tmp as $value)
        $codetable[hexdec(substr($value, 0, 6))] = substr($value, 7, 6);
    $utf8 = "";
    while ($gb)
    {
        if (ord(substr($gb, 0, 1)) > 127)
        {
            // Two-byte Chinese character: look up its Unicode value in the table.
            $pair = substr($gb, 0, 2);
            $gb = substr($gb, 2, strlen($gb));
            $utf8 .= u2utf8(hexdec($codetable[hexdec(bin2hex($pair)) - 0x8080]));
        }
        else
        {
            // Western character. The first byte is dropped before the next
            // one is read, so the byte passed to u2utf8 is the following one.
            $gb = substr($gb, 1, strlen($gb));
            $utf8 .= u2utf8(substr($gb, 0, 1));
        }
    }

    $ret = "";
    // The whole intermediate string is regrouped three characters at a time.
    for ($i = 0; $i < strlen($utf8); $i += 3)
        $ret .= chr(substr($utf8, $i, 3));

    return $ret;
}
In the while loop, the function converts the Chinese characters to Unicode one by one using the lookup table, and then to UTF-8 through the u2utf8 function. After the while loop ends, however, a separate for loop rebuilds the output in fixed groups of three. This fits Chinese characters: according to the rules at http://www.utf8.org/, a Chinese character takes three bytes in UTF-8, and u2utf8 writes each of those bytes as a three-digit decimal number. Western characters are not taken into account (a Western character is encoded in UTF-8 as a single byte). Therefore, whether the Western characters appear at the beginning of the content or are interspersed among the Chinese characters, the converted string is still cut "every three", which leads to garbled output.
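
The effect of fixed groups of three can be reproduced with a small, simplified model. The helper names to_intermediate and regroup_by_three are hypothetical, and the model assumes that a single-byte (ASCII) character lands in the intermediate string as one raw character, which is only one of the ways it can fall out of step.

```php
<?php
// Simplified model of the intermediate string: every byte >= 128 is written
// as its three-digit decimal value; a single-byte (ASCII) character is
// appended as-is (an assumption made for this illustration).
function to_intermediate(string $utf8): string
{
    $out = "";
    foreach (str_split($utf8) as $byte) {
        $out .= ord($byte) > 127 ? (string) ord($byte) : $byte;
    }
    return $out;
}

// The regrouping step: every three characters are turned into one byte.
function regroup_by_three(string $digits): string
{
    $ret = "";
    for ($i = 0; $i < strlen($digits); $i += 3) {
        $ret .= chr((int) substr($digits, $i, 3));
    }
    return $ret;
}

$chinese = "\xE4\xB8\xAD";   // UTF-8 bytes of 中
$mixed   = "A" . $chinese;   // one ASCII character in front

$okResult  = regroup_by_three(to_intermediate($chinese));
$badResult = regroup_by_three(to_intermediate($mixed));

var_dump($okResult === $chinese);  // the Chinese-only input survives
var_dump($badResult === $mixed);   // the mixed input does not
```

With Chinese-only input every group of three lines up with one byte. A single leading ASCII character shifts every following group, so the rest of the string is decoded from the wrong digits.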

The modified function is as follows:
function gb2utf8($gb) /* program written by sadly, modified by agun */
{
    if (!trim($gb))
        return $gb;
    $filename = "gb2312.txt";
    $tmp = file($filename);
    $codetable = array();
    // Assumed table layout, one entry per line: "0x2121 0x3000 ..."
    // (GB code at offset 0, Unicode value at offset 7).
    foreach ($tmp as $value)
        $codetable[hexdec(substr($value, 0, 6))] = substr($value, 7, 6);

    $ret = "";
    $utf8 = "";
    while ($gb)
    {
        if (ord(substr($gb, 0, 1)) > 127)
        {
            // Two-byte Chinese character: convert it, then regroup only
            // this character's output three digits at a time.
            $pair = substr($gb, 0, 2);
            $gb = substr($gb, 2, strlen($gb));
            $utf8 = u2utf8(hexdec($codetable[hexdec(bin2hex($pair)) - 0x8080]));
            for ($i = 0; $i < strlen($utf8); $i += 3)
                $ret .= chr(substr($utf8, $i, 3));
        }
        else
        {
            // Western character: copy the single byte unchanged.
            $ret .= substr($gb, 0, 1);
            $gb = substr($gb, 1, strlen($gb));
        }
    }
    return $ret;
}

The modified function performs all three steps (GB to Unicode, Unicode to UTF-8, and assembling the bytes of each UTF-8 character) inside a single loop. In particular, the assembly step now sits inside the conditional branch that decides whether a character is Western or Chinese, and that branch determines whether one byte or three bytes are taken. The result is correct!
