qiushuiwuhen's GBK / Unicode / UTF-8 Chinese character conversion
Converting Chinese characters in PHP has always been troublesome.
This class provides four filters: "&#[dec];", "&#x[hex];", "%u[hex]", and UTF-8 conversion.
It is easy to use, and you can also supply a custom filter to perform whatever conversion you need.
Download qswhu.php here:
http://www.blueidea.com/user/qswh/qswhU.zip
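To make the filters concrete, here is a minimal sketch of what each built-in pattern captures before any conversion happens. The patterns are the ones used in `decode()` below; the helper `capture_group1` is a hypothetical name introduced only for this illustration.

```php
<?php
// The three built-in patterns, indexed 0..2 as in decode().
$patterns = array(
    "/&#(\w+);/iU",     // 0: &#[dec]; and &#x[hex];
    "/((%\w\w)+)/i",    // 1: runs of %XX bytes (URL-encoded UTF-8)
    "/%u(\w{4,5})/iU",  // 2: %u[hex]
);

// Hypothetical helper: return capture group 1 of every match.
function capture_group1($pattern, $subject) {
    preg_match_all($pattern, $subject, $m);
    return $m[1];
}

print_r(capture_group1($patterns[0], "&#20013;&#x6587;abc")); // 20013, x6587
print_r(capture_group1($patterns[1], "%e4%b8%ad%20abc"));     // %e4%b8%ad%20
print_r(capture_group1($patterns[2], "%u4e2d%u6587abc"));     // 4e2d, 6587
```

The `U` (ungreedy) modifier matters for pattern 2: without it, `\w{4,5}` would take five characters and capture `6587a` from `%u6587abc`.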
class qswhu {
    var $qswhData; // lines of the Unicode-to-GBK table file

    // Written as __construct so the class also loads on PHP 5 and later.
    function __construct($filename = "qswhu.php") {
        $this->qswhData = file($filename);
    }

    // $pattern is either the index of a built-in filter (0, 1, 2) or a custom regex.
    function decode($str, $pattern = 0) {
        $arr = array("/&#(\w+);/iU", "/((%\w\w)+)/i", "/%u(\w{4,5})/iU");
        if (is_integer($pattern)) {
            if ($pattern >= count($arr)) die("Invalid Function");
            $pattern = $arr[$pattern];
        }
        return preg_replace_callback($pattern, array($this, "u2gb"), $str);
    }

    function u2gb($arr) {
        /****** (qiushuiwuhen 2002-8-15) ******/
        $ret = ""; $str = $arr[1];
        if (preg_match_all("/%\w{2}/", $str, $matches)) {
            // URL-encoded UTF-8: rebuild each code point from its bytes.
            for ($i = 0; $i < count($matches[0]); $i++) {
                $chr1 = hexdec(substr($matches[0][$i], 1));
                $leads = array("F0", "E0", "C0", "0");
                // >= so that lead bytes 0xF0, 0xE0 and 0xC0 themselves are classified correctly.
                for ($j = 0; $j < count($leads); $j++) if ($chr1 >= hexdec($leads[$j])) break;
                $chr = $chr1 - hexdec($leads[$j]);
                // Each continuation byte contributes its low 6 bits.
                while (++$j < count($leads)) $chr = $chr * 0x40 + (hexdec(substr($matches[0][++$i], 1)) - 0x80);
                $str = dechex($chr);
                if (strlen($str) == 4) {
                    $p = hexdec(substr($str, 0, 2)) - 0x4d;
                    $q = hexdec(substr($str, 2)) * 4;
                    $ret .= chr(hexdec(substr($this->qswhData[$p], $q, 2)));
                    $ret .= chr(hexdec(substr($this->qswhData[$p], $q + 2, 2)));
                } else
                    $ret .= chr(hexdec($str));
            }
        }
        else {
            // Numeric forms: a leading "x" marks hex; a 4-character value is also
            // taken as hex; anything else is treated as decimal.
            if (strtolower($str[0]) == "x")
                $str = substr($str, 1);
            else
                if (strlen($str) != 4) $str = dechex((int)$str);
            if (strlen($str) == 4) {
                $p = hexdec(substr($str, 0, 2)) - 0x4d;
                $q = hexdec(substr($str, 2)) * 4;
                $ret .= chr(hexdec(substr($this->qswhData[$p], $q, 2)));
                $ret .= chr(hexdec(substr($this->qswhData[$p], $q + 2, 2)));
            } else
                $ret .= chr(hexdec($str));
        }
        return $ret;
    }
}
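The UTF-8 branch of the class is the densest part, so here is a standalone sketch of the same idea: subtract the lead-byte marker, then fold in six bits from each continuation byte. The function name `utf8_percent_to_codepoints` is an assumption made for this example and is not part of the class; it returns code points as hex strings instead of looking them up in the table.

```php
<?php
// Hypothetical helper: decode a run of %XX bytes (URL-encoded UTF-8)
// into a list of Unicode code points written as lowercase hex strings.
function utf8_percent_to_codepoints($run) {
    preg_match_all("/%([0-9a-f]{2})/i", $run, $m);
    $bytes = array_map('hexdec', $m[1]);
    $points = array();
    for ($i = 0; $i < count($bytes); $i++) {
        $lead = $bytes[$i];
        // The lead byte decides how many continuation bytes follow.
        if ($lead >= 0xF0)     { $cp = $lead - 0xF0; $extra = 3; }
        elseif ($lead >= 0xE0) { $cp = $lead - 0xE0; $extra = 2; }
        elseif ($lead >= 0xC0) { $cp = $lead - 0xC0; $extra = 1; }
        else                   { $cp = $lead;        $extra = 0; }
        // Each continuation byte (10xxxxxx) contributes its low 6 bits.
        while ($extra-- > 0 && $i + 1 < count($bytes)) {
            $cp = $cp * 0x40 + ($bytes[++$i] - 0x80);
        }
        $points[] = dechex($cp);
    }
    return $points;
}

print_r(utf8_percent_to_codepoints("%e4%b8%ad%e6%96%87%20"));
// 4e2d, 6587, 20
```

Like the class, this sketch does not validate its input; it only stops early if the sequence is truncated.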
Usage examples
$qswh = new qswhu("qswhu.php"); // if the file is named qswhu.php, the argument can be omitted
echo "<xmp>No parameter (default filter: &#[num];):";
echo "\n" . $qswh->decode("&#20013;&#25991;abc");
echo "\n" . $qswh->decode("&#x4e2d;&#x6587;abc");
echo "\nCall built-in filter (UTF-8 transcoding): " . $qswh->decode("%e4%b8%ad%e6%96%87%20!%22%23%24%25%26'()*%2b%2c%2f%3a%3b%3c%3d%3e%3f%40%5b%5d%5e%60%7b%7c%7d~%25abc", 1);
echo "\nCall built-in filter unescape (%u[num]): " . $qswh->decode("%u4e2d%u6587abc", 2);
echo "\nCustom filter ([x+num]): " . $qswh->decode("[x4e2d][x6587][x41][x62][x63]", "/\[(\w+)\]/");
The effect is as follows:
With no parameters (default filter is: &#[num];):
Chinese ABC
Chinese ABC
Call built-in filter (UTF transcoding): Chinese! " #$%& ' () *+,/:;<=>?@[]^ ' {|} ~%abc
Call built-in filter unescape (%u[num]): Chinese ABC
Custom Filter ([X+num]): Chinese ABC
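The Chinese characters in these results are emitted as two GBK bytes read from the table file. The sketch below mirrors the index arithmetic used by the class, under the assumption (inferred only from the code, not from the real data file) that each line of the table covers one high byte and stores four hex digits per low byte. The `$table` array and the function name `lookup_gbk` are made up for the illustration.

```php
<?php
// Hypothetical stand-in for file("qswhu.php"): only the row for high byte
// 0x4e is populated, and only the entry for U+4E2D.
$row = str_repeat("0000", 256);
$row = substr_replace($row, "d6d0", 0x2d * 4, 4); // GBK bytes of U+4E2D
$table = array(1 => $row);                         // 0x4e - 0x4d = 1

// Mirror of the index arithmetic in the class.
function lookup_gbk($table, $hex4) {
    $p = hexdec(substr($hex4, 0, 2)) - 0x4d; // row: high byte of the code point
    $q = hexdec(substr($hex4, 2)) * 4;       // column: 4 hex digits per entry
    return chr(hexdec(substr($table[$p], $q, 2)))
         . chr(hexdec(substr($table[$p], $q + 2, 2)));
}

echo bin2hex(lookup_gbk($table, "4e2d")), "\n"; // d6d0
```

Because the result is raw GBK, the page that displays it has to be served with a GBK-compatible charset for the characters to render correctly.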