A gb2312.txt (184799 bytes) is indeed too large and must be converted to unicode.
This table is 51965 bytes, which is much smaller.
It is very practical for scenarios where the iconv function library cannot be used.
<? Php
// Usage of the comparison table
$ Filename = "gb2utf8.txt ";
$ Fp = fopen ($ filename, "r ");
While (! Feof ($ fp )){
List ($ gb, $ utf8) = fgetcsv ($ fp, 10 );
$ Charset [$ gb] = $ utf8;
}
Fclose ($ fp );
// Read the table above to the array for backup
/** Gb2312 to UTF-8 **/
Function gb2utf8 ($ text, & $ charset ){
// Extract the components in the text. A Chinese character is an element, and a continuous non-Chinese character is an element.
Preg_match_all ("/(? : [X80-xff].) | [x01-x7f] +/", $ text, $ tmp );
$ Tmp = $ tmp [0];
// Separate Chinese characters
$ Ar = array_intersect ($ tmp, array_keys ($ charset ));
// Replace the Chinese character encoding
Foreach ($ ar as $ k => $ v)
$ Tmp [$ k] = $ charset [$ v];
// Return the encoded string
Return join (', $ tmp );
}
/** UTF-8 to gb2312 **/
Function utf82gb ($ text, & $ charset ){
$ P = "/[xf0-xf7] [x80-xbf] {3} | [xe0-xef] [x80-xbf] {2} | [xc2-xdf] [x80-xbf] | [x01-x7f] + /";
Preg_match_all ($ p, $ text, $ r );
$ Utf8 = array_flip ($ charset );
Foreach ($ r [0] as $ k => $ v)
If (isset ($ utf8 [$ v])
$ R [0] [$ k] = $ utf8 [$ v];
Return join (', $ r [0]);
}
// Test
$ S = gb2utf8 ('this is a test of the comparison table', $ charset );
Echo utf82gb ($ s, $ charset );
?>
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.