PHP functions for GB2312 and UTF-8 encoding conversion without the iconv function
Converting between encodings with the iconv() function is relatively simple, but many virtual hosts do not support this extension.
I spent half a day searching online and found a method for converting GB2312 to UTF-8, but it cannot do the reverse conversion.
The function is as follows:
/*******************************
// GB to UTF-8 Encoding
*******************************/
function gb2utf8($gbstr) {
    global $codetable;
    if (trim($gbstr) == "") return $gbstr;
    // Load the lookup table once. Key: GB2312 code with 0x8080 subtracted;
    // value: the Unicode code point as hex text.
    if (empty($codetable)) {
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while ($l = fgets($fp, 15))
        { $codetable[hexdec(substr($l, 0, 6))] = substr($l, 7, 6); }
        fclose($fp);
    }
    $ret = "";
    $utf8 = "";
    // Test the length, not the string itself, so a remaining "0" is not treated as false.
    while (strlen($gbstr) > 0) {
        if (ord(substr($gbstr, 0, 1)) > 127) {
            // A byte above 127 starts a two-byte GB2312 character.
            $thisw = substr($gbstr, 0, 2);
            $gbstr = (string) substr($gbstr, 2);
            $utf8 = "";
            @$utf8 = u2utf8(hexdec($codetable[hexdec(bin2hex($thisw)) - 0x8080]));
            if ($utf8 != "") {
                // u2utf8 returns three decimal digits per byte; turn each group into a character.
                for ($i = 0; $i < strlen($utf8); $i += 3)
                    $ret .= chr(substr($utf8, $i, 3));
            }
        }
        else
        {
            // Single-byte (ASCII) character: copy it unchanged.
            $ret .= substr($gbstr, 0, 1);
            $gbstr = (string) substr($gbstr, 1);
        }
    }
    return $ret;
}
// Unicode to UTF-8.
// Returns the UTF-8 bytes as a string of decimal values (three digits per byte
// for code points of 0x80 and above); gb2utf8 turns each group back into a character.
function u2utf8($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= $c;
    } else if ($c < 0x800) {
        $str .= (0xC0 | ($c >> 6));
        $str .= (0x80 | ($c & 0x3F));
    } else if ($c < 0x10000) {
        $str .= (0xE0 | ($c >> 12));
        $str .= (0x80 | (($c >> 6) & 0x3F));
        $str .= (0x80 | ($c & 0x3F));
    } else if ($c < 0x200000) {
        $str .= (0xF0 | ($c >> 18));
        $str .= (0x80 | (($c >> 12) & 0x3F));
        $str .= (0x80 | (($c >> 6) & 0x3F));
        $str .= (0x80 | ($c & 0x3F));
    }
    return $str;
}
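To make the arithmetic concrete, here is a minimal sketch that follows one character through the same steps: GB2312 bytes, minus 0x8080, table lookup, then UTF-8 bytes. The helper name codepoint_to_utf8 and the one-entry $table are assumptions for illustration only; the table line format "0x5650 0x4E2D" is inferred from the substr() offsets used above, not taken from the real file. Unlike u2utf8, this helper returns the bytes directly instead of a string of decimal digits.

```php
<?php
// Hypothetical helper (not part of the original code): encode one
// Unicode code point as a UTF-8 byte string.
function codepoint_to_utf8($cp) {
    if ($cp < 0x80) {
        return chr($cp);
    } elseif ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    } elseif ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// One-entry stand-in for the table file (assumed line: "0x5650 0x4E2D").
$table = array(0x5650 => 0x4E2D);

$gb   = "\xD6\xD0";                        // the character "中" in GB2312
$key  = hexdec(bin2hex($gb)) - 0x8080;     // 0xD6D0 - 0x8080 = 0x5650
$utf8 = codepoint_to_utf8($table[$key]);   // UTF-8 bytes E4 B8 AD
echo bin2hex($utf8), "\n";                 // prints "e4b8ad"
```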
Because GB2312 is a double-byte encoding, converting it to UTF-8 is relatively simple, but converting UTF-8 back to GB2312 is much more troublesome. I tried it this way:
function utf82gb($utfstr)
{
    global $uc2gbtable;
    $okstr = "";
    if (trim($utfstr) == "") return $utfstr;
    // Load the reverse table once: Unicode code point => GB2312 code (minus 0x8080).
    if (empty($uc2gbtable)) {
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while ($l = fgets($fp, 15))
        { $uc2gbtable[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6)); }
        fclose($fp);
    }
    $ulen = strlen($utfstr);
    for ($i = 0; $i < $ulen; $i++)
    {
        if (ord($utfstr[$i]) < 0x81) $okstr .= $utfstr[$i];
        else
        {
            if ($ulen > $i + 2)
            {
                // Treat this byte and the next two as one three-byte UTF-8 character.
                // Note: $i is not advanced past the two continuation bytes here,
                // so they are examined again on the following iterations.
                $utfc = substr($utfstr, $i, 3);
                $c = "";
                @$c = dechex($uc2gbtable[utf82u_3($utfc)] + 0x8080);
                if ($c != "") {
                    $okstr .= chr(hexdec($c[0] . $c[1])) . chr(hexdec($c[2] . $c[3]));
                }
            }
            else
            { $okstr .= $utfstr[$i]; }
        }
    }
    $okstr = trim($okstr);
    return $okstr;
}
// Decode one three-byte UTF-8 sequence to its Unicode code point.
function utf82u_3($c)
{
    $n = (ord($c[0]) & 0x1f) << 12;
    $n += (ord($c[1]) & 0x3f) << 6;
    $n += ord($c[2]) & 0x3f;
    return $n;
}
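The bit arithmetic in utf82u_3 can be checked against a general decoder. The following is only a sketch: utf8_to_codepoints is a hypothetical helper, it assumes well-formed input, and it does not validate continuation bytes. It reads the sequence length from the lead byte and advances the index by that length, whereas the loop in utf82gb above moves one byte at a time.

```php
<?php
// Hypothetical helper (not part of the original code): list the code
// points in a UTF-8 string. Assumes well-formed input.
function utf8_to_codepoints($s) {
    $out = array();
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                      // 1 byte:  0xxxxxxx
            $cp = $b;        $n = 1;
        } elseif (($b & 0xE0) === 0xC0) {     // 2 bytes: 110xxxxx
            $cp = $b & 0x1F; $n = 2;
        } elseif (($b & 0xF0) === 0xE0) {     // 3 bytes: 1110xxxx
            $cp = $b & 0x0F; $n = 3;
        } else {                              // otherwise treated as 4 bytes: 11110xxx
            $cp = $b & 0x07; $n = 4;
        }
        if ($i + $n > $len) break;            // truncated sequence: stop
        for ($k = 1; $k < $n; $k++) {
            // Each continuation byte contributes its low six bits.
            $cp = ($cp << 6) | (ord($s[$i + $k]) & 0x3F);
        }
        $out[] = $cp;
        $i += $n;                             // skip the whole sequence
    }
    return $out;
}

// "a", then "中" (E4 B8 AD), then "b"
echo implode(",", array_map('dechex', utf8_to_codepoints("a\xE4\xB8\xADb"))), "\n";
// prints "61,4e2d,62"
```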
This way, most characters convert successfully, but the result is never quite right. I changed the program to this:
function utf82gb($utfstr)
{
    global $uc2gbtable;
    $okstr = "";
    if (trim($utfstr) == "") return $utfstr;
    // Load the reverse table once: Unicode code point => GB2312 code (minus 0x8080).
    if (empty($uc2gbtable)) {
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while ($l = fgets($fp, 15))
        { $uc2gbtable[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6)); }
        fclose($fp);
    }
    $okstr = "";
    // Work on the URL-encoded form, where every non-ASCII byte appears as %XX.
    $utfstr = urlencode($utfstr);
    $ulen = strlen($utfstr);
    for ($i = 0; $i < $ulen; $i++)
    {
        if ($utfstr[$i] == "%")
        {
            if ($ulen > $i + 2) {
                $hexnext = hexdec(substr($utfstr, $i + 1, 2));
                if ($hexnext < 127) {
                    // An escaped ASCII character: decode it and skip its two hex digits.
                    $okstr .= chr($hexnext);
                    $i = $i + 2;
                }
                else {
                    if ($ulen >= $i + 9) {
                        // Three escaped bytes ("XX%XX%XX") form one UTF-8 character.
                        $hexnext = substr($utfstr, $i + 1, 8);
                        $c = "";
                        @$c = dechex($uc2gbtable[url_utf2u($hexnext)] + 0x8080);
                        if ($c != "") {
                            $okstr .= chr(hexdec($c[0] . $c[1])) . chr(hexdec($c[2] . $c[3]));
                        }
                        $i = $i + 8;
                    }
                }
            }
            else
            { $okstr .= $utfstr[$i]; }
        }
        else if ($utfstr[$i] == "+")
            $okstr .= " ";   // urlencode() turns a space into "+"
        else
            $okstr .= $utfstr[$i];
    }
    $okstr = trim($okstr);
    return $okstr;
}
// Convert a URL-encoded three-byte UTF-8 sequence (e.g. "E4%B8%AD") to Unicode.
function url_utf2u($c)
{
    $utfc = "";
    // split() was removed in PHP 7; explode() does the same job for a literal delimiter.
    $cs = explode("%", $c);
    for ($i = 0; $i < count($cs); $i++) {
        $utfc .= chr(hexdec($cs[$i]));
    }
    $n = (ord($utfc[0]) & 0x1f) << 12;
    $n += (ord($utfc[1]) & 0x3f) << 6;
    $n += ord($utfc[2]) & 0x3f;
    return $n;
}
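As a small worked example of what the URL-encoded form looks like, the sketch below encodes a short string and decodes the eight-character slice the same way url_utf2u does. The sample string and variable names are assumptions chosen for illustration.

```php
<?php
$utf8 = "\xE4\xB8\xAD a";          // "中", a space, and "a" as UTF-8 bytes
$enc  = urlencode($utf8);
echo $enc, "\n";                   // prints "%E4%B8%AD+a"

// The 8 characters after the first "%" hold the three hex byte values.
$slice = substr($enc, 1, 8);       // "E4%B8%AD"
$bytes = array_map('hexdec', explode("%", $slice));

// Same bit layout as a three-byte UTF-8 sequence: 4 + 6 + 6 bits.
$cp = (($bytes[0] & 0x0F) << 12)
    | (($bytes[1] & 0x3F) << 6)
    |  ($bytes[2] & 0x3F);
echo dechex($cp), "\n";            // prints "4e2d"
```

Note that the fixed eight-character slice assumes every non-ASCII character occupies three UTF-8 bytes. Characters that encode to two bytes, such as the Greek and Cyrillic letters that GB2312 also contains, would not fit this pattern.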
In my tests this version worked completely, and it was also faster than the previous method. I really don't understand why.
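Since iconv() is the simpler route wherever the host provides it, one possible arrangement is to try it first and fall back to the table-based function only when it is missing or fails. This is a sketch under that assumption; gb2312_to_utf8_any is a hypothetical name, and the fallback is passed in as a callable.

```php
<?php
// Hypothetical wrapper (not part of the original code): use iconv() when
// the extension is loaded, otherwise call the supplied fallback converter.
function gb2312_to_utf8_any($s, $fallback) {
    if (function_exists('iconv')) {
        // iconv() returns false on failure; @ hides the accompanying notice.
        $out = @iconv('GB2312', 'UTF-8', $s);
        if ($out !== false) {
            return $out;
        }
    }
    return call_user_func($fallback, $s);
}

// ASCII text is identical in both encodings, so it passes through unchanged.
echo gb2312_to_utf8_any("abc", function ($s) { return $s; }), "\n";   // prints "abc"
```

With the functions from this article loaded, the call would look like gb2312_to_utf8_any($text, 'gb2utf8').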
Anyone who wants the gb2312-utf8.table file can add me on QQ: 2500875 (it Plato), or contact 1877000 (bubble).
Reprint address: http://prato.bokele.com/?ArticleID=19533