GB2312 and UTF-8 conversion functions that do not require the iconv library. The usual gb2312.txt table (184799 bytes) is indeed too large, and it also requires going through Unicode as an intermediate step.
The mapping table used here is 51965 bytes, which is much smaller.
It is very practical for scenarios where the iconv function library cannot be used.
<?php
// Usage of the mapping table
$filename = "gb2utf8.txt";
$charset = array();
$fp = fopen($filename, "r");
while (($row = fgetcsv($fp, 10)) !== false) {
    // Skip blank or malformed lines
    if (count($row) < 2) continue;
    list($gb, $utf8) = $row;
    $charset[$gb] = $utf8;
}
fclose($fp);
// The table is now loaded into the $charset array for later use
/** GB2312 to UTF-8 **/
function gb2utf8($text, &$charset) {
    // Split the text into elements: each Chinese character (two bytes) is one element,
    // and each run of consecutive non-Chinese characters is one element.
    preg_match_all("/(?:[\x80-\xff].)|[\x01-\x7f]+/", $text, $tmp);
    $tmp = $tmp[0];
    // Pick out the Chinese characters that appear in the table
    $ar = array_intersect($tmp, array_keys($charset));
    // Replace each of them with its UTF-8 encoding
    foreach ($ar as $k => $v)
        $tmp[$k] = $charset[$v];
    // Return the converted string
    return join('', $tmp);
}
/** UTF-8 to GB2312 **/
function utf82gb($text, &$charset) {
    // Match 4-, 3- and 2-byte UTF-8 sequences, or a run of ASCII characters
    $p = "/[\xf0-\xf7][\x80-\xbf]{3}|[\xe0-\xef][\x80-\xbf]{2}|[\xc2-\xdf][\x80-\xbf]|[\x01-\x7f]+/";
    preg_match_all($p, $text, $r);
    // Invert the table so that UTF-8 sequences map back to GB2312
    $utf8 = array_flip($charset);
    foreach ($r[0] as $k => $v)
        if (isset($utf8[$v]))
            $r[0][$k] = $utf8[$v];
    return join('', $r[0]);
}
// Test: convert to UTF-8 and back again
$s = gb2utf8('this is a test of the comparison table', $charset);
echo utf82gb($s, $charset);
?>
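To try the table-lookup idea without the gb2utf8.txt file, here is a minimal sketch that uses a hypothetical two-entry in-memory table. The names `$demoTable`, `demo_gb_to_utf8` and `demo_utf8_to_gb` are made up for illustration and are not part of the original code. The splitting patterns are the same as in the functions above.

```php
<?php
// Hypothetical two-entry table: GB2312 bytes => UTF-8 bytes.
$demoTable = array(
    "\xD6\xD0" => "\xE4\xB8\xAD", // U+4E2D
    "\xCE\xC4" => "\xE6\x96\x87", // U+6587
);

// Split into two-byte GB2312 units or ASCII runs, then look each piece up.
function demo_gb_to_utf8($text, array $table) {
    preg_match_all("/(?:[\x80-\xff].)|[\x01-\x7f]+/", $text, $m);
    $out = '';
    foreach ($m[0] as $piece) {
        // Pieces that are not in the table (ASCII runs) pass through unchanged.
        $out .= isset($table[$piece]) ? $table[$piece] : $piece;
    }
    return $out;
}

// Split into UTF-8 sequences or ASCII runs, then look up in the inverted table.
function demo_utf8_to_gb($text, array $table) {
    $reverse = array_flip($table);
    $p = "/[\xf0-\xf7][\x80-\xbf]{3}|[\xe0-\xef][\x80-\xbf]{2}|[\xc2-\xdf][\x80-\xbf]|[\x01-\x7f]+/";
    preg_match_all($p, $text, $m);
    $out = '';
    foreach ($m[0] as $piece) {
        $out .= isset($reverse[$piece]) ? $reverse[$piece] : $piece;
    }
    return $out;
}

// Round trip: GB2312 input -> UTF-8 -> GB2312 again.
$gb = "abc \xD6\xD0\xCE\xC4 123";
$utf8 = demo_gb_to_utf8($gb, $demoTable);
var_dump(demo_utf8_to_gb($utf8, $demoTable) === $gb);
```

Characters missing from the table are passed through unchanged in both directions, so a complete table is needed for real text.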