Using the iconv function to implement the PHP function of gb2312 for UTF-8 encoding conversion

Source: Internet
Author: User

Using the iconv function to implement the PHP function of gb2312 for UTF-8 encoding conversion

 

If I use the iconv () function to convert the code, it is relatively simple, but many virtual hosts do not support this component, I am online

It takes half a day to find a method for converting gb2312 to UTF-8, but cannot reverse conversion.

This function is as follows:

/*******************************
// GB to UTF-8 Encoding
*******************************/
Function gb2utf8 ($ gbstr ){
Global $ codetable;
If (TRIM ($ gbstr) = "") return $ gbstr;
If (empty ($ codetable )){
$ Filename = dirname (_ file _). "/gb2312-utf8.table ";
$ Fp = fopen ($ filename, "R ");
While ($ L = fgets ($ FP, 15 ))
{$ Codetable [hexdec (substr ($ L, 0, 6)] = substr ($ L, 7, 6 );}
Fclose ($ FP );
}
$ Ret = "";
$ Utf8 = "";
While ($ gbstr ){
If (ord (substr ($ gbstr, 0, 1) & gt; 127 ){
$ Thisw = substr ($ gbstr, 0, 2 );
$ Gbstr = substr ($ gbstr, 2, strlen ($ gbstr ));
$ Utf8 = "";
@ $ Utf8 = u2utf8 (hexdec ($ codetable [hexdec (bin2hex ($ thisw)-0x8080]);
If ($ utf8! = ""){
For ($ I = 0; $ I <strlen ($ utf8); $ I + = 3)
$ Ret. = CHR (substr ($ utf8, $ I, 3 ));
}
}
Else
{
$ Ret. = substr ($ gbstr, 0, 1 );
$ Gbstr = substr ($ gbstr, 1, strlen ($ gbstr ));
}
}
Return $ ret;
}
// Unicode to utf8
Function u2utf8 ($ c ){
For ($ I = 0; $ I <count ($ C); $ I ++)
$ STR = "";
If ($ C <0x80 ){
$ Str. = $ C;
} Else if ($ C <0x800 ){
$ Str. = (0xc0 | $ C> 6 );
$ Str. = (0x80 | $ C & 0x3f );
} Else if ($ C <0x10000 ){
$ Str. = (0xe0 | $ C> 12 );
$ Str. = (0x80 | $ C> 6 & 0x3f );
$ Str. = (0x80 | $ C & 0x3f );
} Else if ($ C <0x200000 ){
$ Str. = (0xf0 | $ C> 18 );
$ Str. = (0x80 | $ C> 12 & 0x3f );
$ Str. = (0x80 | $ C> 6 & 0x3f );
$ Str. = (0x80 | $ C & 0x3f );
}
Return $ STR;
}

Because gb2312 is dual-byte, it is relatively simple to convert to UTF-8, but it is very troublesome to convert it to UTF-8. I tried it:

This way

Function utf82gb ($ utfstr)
{
Global $ uc2gbtable;
$ Okstr = "";
If (TRIM ($ utfstr) = "") return $ utfstr;
If (empty ($ uc2gbtable )){
$ Filename = dirname (_ file _). "/gb2312-utf8.table ";
$ Fp = fopen ($ filename, "R ");
While ($ L = fgets ($ FP, 15 ))
{$ Uc2gbtable [hexdec (substr ($ L, 7, 6)] = hexdec (substr ($ L, 0, 6 ));}
Fclose ($ FP );
}
$ Ulen = strlen ($ utfstr );
For ($ I = 0; $ I <$ Ulen; $ I ++)
{
If (ord ($ utfstr [$ I]) <0x81) $ okstr. = $ utfstr [$ I];
Else
{
If ($ Ulen> $ I + 2)
{
$ Utfc = substr ($ utfstr, $ I, 3 );
$ C = "";
@ $ C = dechex ($ uc2gbtable [utf82u_3 ($ utfc)] + 0x8080 );
If ($ C! = ""){
$ Okstr. = CHR (hexdec ($ C [0]. $ C [1]). CHR (hexdec ($ C [2]. $ C [3]);
}
}
Else
{$ Okstr. = $ utfstr [$ I];}
}
}
$ Okstr = trim ($ okstr );
Return $ okstr;
}

Function utf82u_3 ($ C)
{
$ N = (ord ($ C [0]) & 0x1f) <12;
$ N + = (ord ($ C [1]) & 0x3f) <6;
$ N + = ord ($ C [2]) & 0x3f;
Return $ N;
}

In this way, most characters can be converted successfully, but it is always a bit inappropriate. I changed the program to this:

Function utf82gb ($ utfstr)
{
Global $ uc2gbtable;
$ Okstr = "";
If (TRIM ($ utfstr) = "") return $ utfstr;
If (empty ($ uc2gbtable )){
$ Filename = dirname (_ file _). "/gb2312-utf8.table ";
$ Fp = fopen ($ filename, "R ");
While ($ L = fgets ($ FP, 15 ))
{$ Uc2gbtable [hexdec (substr ($ L, 7, 6)] = hexdec (substr ($ L, 0, 6 ));}
Fclose ($ FP );
}
$ Okstr = "";
$ Utfstr = urlencode ($ utfstr );
$ Ulen = strlen ($ utfstr );
For ($ I = 0; $ I <$ Ulen; $ I ++)
{
If ($ utfstr [$ I] = "% ")
{
If ($ Ulen> $ I + 2 ){
$ Hexnext = hexdec ("0x". substr ($ utfstr, $ I + 1, 2 ));
If ($ hexnext <127 ){
$ Okstr. = CHR ($ hexnext );
$ I = $ I + 2;
}
Else {
If ($ Ulen >=$ I + 9 ){
$ Hexnext = substr ($ utfstr, $ I + 1, 8 );
$ C = "";
@ $ C = dechex ($ uc2gbtable [url_utf2u ($ hexnext)] + 0x8080 );
If ($ C! = ""){
$ Okstr. = CHR (hexdec ($ C [0]. $ C [1]). CHR (hexdec ($ C [2]. $ C [3]);
}
$ I = $ I + 8;
}
}
}
Else
{$ Okstr. = $ utfstr [$ I];}
}
Else if ($ utfstr [$ I] = "+ ")
$ Okstr. = "";
Else
$ Okstr. = $ utfstr [$ I];
}
$ Okstr = trim ($ okstr );
Return $ okstr;
}
// Convert the UTF-8 encoded three-byte URL to Unicode.
Function url_utf2u ($ C)
{
$ Utfc = "";
$ Cs = Split ("%", $ C );
For ($ I = 0; $ I <count ($ CS); $ I ++ ){
$ Utfc. = CHR (hexdec ("0x". $ CS [$ I]);
}
$ N = (ord ($ utfc [0]) & 0x1f) <12;
$ N + = (ord ($ utfc [1]) & 0x3f) <6;
$ N + = ord ($ utfc [2]) & 0x3f;
Return $ N;
}

I found that the test was completely OK, and the speed was faster than the previous method. I really don't understand why.

Who want to gb2312-utf8.table this file please add my QQ 2500875 it Plato or contact with 1877000 bubble

 

 

Reprint address: http://prato.bokele.com /? ArticleID = 19533

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.