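The following example converts GB2312 text to UTF-8 and then prints each character as a numeric character reference of the form &#NNNNN;.
From PHP 4.3.1 onward the iconv function makes this easy: you only need to write a UTF-8 to Unicode conversion function,
and no lookup table (gb2312.txt) is required.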
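<?php
// $text is expected to hold GB2312-encoded bytes.
$text = "electronic stacks";
// Split into characters: an optional GB2312 lead byte followed by any byte.
preg_match_all("/[\x80-\xff]?./", $text, $ar);
foreach ($ar[0] as $v)
    echo "&#" . utf8_unicode(iconv("GB2312", "UTF-8", $v)) . ";";
?>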
<?php
// UTF-8 -> Unicode: decode one UTF-8 character to its code point.
function utf8_unicode($c) {
    switch (strlen($c)) {
        case 1:
            return ord($c);
        case 2:
            $n = (ord($c[0]) & 0x1f) << 6;   // 110xxxxx: 5 payload bits
            $n += ord($c[1]) & 0x3f;
            return $n;
        case 3:
            $n = (ord($c[0]) & 0x0f) << 12;  // 1110xxxx: 4 payload bits
            $n += (ord($c[1]) & 0x3f) << 6;
            $n += ord($c[2]) & 0x3f;
            return $n;
        case 4:
            $n = (ord($c[0]) & 0x07) << 18;  // 11110xxx: 3 payload bits
            $n += (ord($c[1]) & 0x3f) << 12;
            $n += (ord($c[2]) & 0x3f) << 6;
            $n += ord($c[3]) & 0x3f;
            return $n;
    }
    return 0; // not a single UTF-8 character
}
?>
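The same bit arithmetic can be sketched in JavaScript, the other language used in this post. This is only an illustration: `utf8ToCodePoint` is a hypothetical name, it takes the bytes of one character as an array of numbers, and it assumes the input is a well-formed UTF-8 sequence (continuation bytes are not validated).

```javascript
// Hypothetical helper: decode ONE UTF-8 sequence (array of byte values)
// into its Unicode code point. Assumes the bytes are well-formed.
function utf8ToCodePoint(bytes) {
  if (bytes.length === 1) return bytes[0];            // 0xxxxxxx
  // Payload mask of the lead byte, chosen by sequence length.
  const leadMask = { 2: 0x1f, 3: 0x0f, 4: 0x07 }[bytes.length];
  if (leadMask === undefined) throw new RangeError("expected 1 to 4 bytes");
  let n = bytes[0] & leadMask;
  for (let i = 1; i < bytes.length; i++) {
    n = (n << 6) | (bytes[i] & 0x3f);                 // 6 bits per continuation byte
  }
  return n;
}

console.log(utf8ToCodePoint([0xe4, 0xb8, 0xad]));     // 20013 (U+4E2D)
```

Folding the continuation bytes in a loop gives the same result as the per-length cases above, because every continuation byte contributes exactly six bits.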
The following example uses PHP to convert numeric character references (&#NNNNN;) to GB2312, going through UTF-8 on the way.
<?php
// The input is expected to contain decimal references of the form &#NNNNN;.
$str = "TTL all-weather automatic Focus";
// Replace each reference with the corresponding GB2312 character.
$str = preg_replace_callback("|&#([0-9]{1,5});|", "ref2gb", $str);
echo $str;
// Callback: $m[1] holds the decimal digits of one reference.
function ref2gb($m) {
    return u2utf82gb((int) $m[1]);
}
// Encode code point $c as UTF-8, then convert the result to GB2312.
function u2utf82gb($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | $c >> 6);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | $c >> 12);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x200000) {
        $str .= chr(0xF0 | $c >> 18);
        $str .= chr(0x80 | $c >> 12 & 0x3F);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    }
    return iconv('UTF-8', 'GB2312', $str);
}
?>
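For comparison, here is a minimal JavaScript sketch of the encoding direction: code point in, UTF-8 bytes out. The name `codePointToUtf8` is hypothetical, the GB2312 step is left out (it needs iconv or a lookup table), and surrogate code points are not rejected.

```javascript
// Hypothetical helper: encode one Unicode code point as an array of UTF-8 bytes.
// The parentheses spell out the shift-then-mask order that operator
// precedence gives implicitly in the PHP version.
function codePointToUtf8(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  if (cp < 0x110000) {
    return [
      0xf0 | (cp >> 18),
      0x80 | ((cp >> 12) & 0x3f),
      0x80 | ((cp >> 6) & 0x3f),
      0x80 | (cp & 0x3f),
    ];
  }
  throw new RangeError("code point out of range");
}

console.log(codePointToUtf8(20013)); // [ 228, 184, 173 ], i.e. E4 B8 AD
```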
Alternatively:
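<?php
// Decode %uXXXX, &#xXXXX; and &#NNNNN; sequences in $str to GB2312.
function unescape($str) {
    $str = rawurldecode($str);
    // The U (ungreedy) modifier makes ".+" take one character at a time,
    // so escape sequences in the middle of the text are still found.
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
    $ar = $r[0];
    // print_r($ar); // debug output
    foreach ($ar as $k => $v) {
        // pack() produces big-endian bytes, so the source charset is UCS-2BE.
        if (substr($v, 0, 2) == "%u")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, -4)));
        elseif (substr($v, 0, 3) == "&#x")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, 3, -1)));
        elseif (substr($v, 0, 2) == "&#") {
            // echo substr($v, 2, -1) . "<br>"; // debug output
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("n", (int) substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}
$str = "TTL all-weather automatic Focus";
echo unescape($str); // outputs the text with every escape sequence decoded to GB2312
?>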
Converting with JavaScript:
<style>
body {
font-size:9pt; padding-right:0px; padding-left:0px; padding-bottom:0px; padding-top:0px;
}
input {
font-size:9pt; height:13pt;
}
</style>
<script language="JavaScript1.2">
/*
This following code are designed and writen by Windy_sk <seasonx@163.net>
You can use it freely, but u must held all the copyright items!
*/
// Encode every UTF-16 code unit of str as a decimal reference (&#NNNNN;).
function Str2Unicode(str) {
  var arr = new Array();
  for (var i = 0; i < str.length; i++) {
    arr[i] = "&#" + str.charCodeAt(i) + ";";
  }
  return arr.join("");
}
// Decode the decimal references found in str; any other text is dropped.
function Unicode2oStr(str) {
  var re = /&#\d{1,5};/g;
  var arr = str.match(re);
  if (arr == null) return "";
  for (var i = 0; i < arr.length; i++) {
    arr[i] = String.fromCharCode(arr[i].replace(/[&#;]/g, ""));
  }
  return arr.join("");
}
// Convert in the direction chosen by the checkbox; if the source box
// for that direction is empty, convert the other way instead.
function modi_str() {
  var f = document.forms["text"].elements;
  if (f["method"].checked) {
    if (f["decode"].value != "") {
      f["encode"].value = Str2Unicode(f["decode"].value);
    } else {
      f["decode"].value = Unicode2oStr(f["encode"].value);
    }
  } else {
    if (f["encode"].value != "") {
      f["decode"].value = Unicode2oStr(f["encode"].value);
    } else {
      f["encode"].value = Str2Unicode(f["decode"].value);
    }
  }
}
</script>
<title>Unicode</title>
<form name="text">
Original text:<br>
<textarea name="decode"></textarea>
<br>
Converted code:<br>
<textarea name="encode"></textarea>
<br>
<input type="checkbox" name="method" checked> forward conversion
<input type="button" value="OK" onclick="modi_str()">
<input type="reset" value="Clear">
<input type="button" value="Select All">
</form>
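The functions above work on 16-bit code units and at most five decimal digits, so characters outside the Basic Multilingual Plane come out as two surrogate references. A possible variant that works on whole code points is sketched below. The names `encodeRefs` and `decodeRefs` are hypothetical, and unlike the decoder above this one leaves text outside the references untouched; it assumes an engine that provides `codePointAt` and `String.fromCodePoint`.

```javascript
// Hypothetical variant: encode every code point of str as &#NNNN;.
function encodeRefs(str) {
  let out = "";
  for (const ch of str) {            // for...of iterates by code point
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

// Decode decimal references; text outside references is kept as-is,
// and out-of-range numbers are left undecoded.
function decodeRefs(str) {
  return str.replace(/&#(\d+);/g, (match, digits) => {
    const cp = Number(digits);
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(encodeRefs("A\u4e2d"));        // &#65;&#20013;
console.log(decodeRefs("TTL&#20013;"));    // TTL followed by U+4E2D
```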
The following example displays all the characters in a given Unicode range, such as the half-width and full-width forms.
<style>
body {
font-size:9pt; padding-right:0px; padding-left:0px; padding-bottom:0px; padding-top:0px;
}
input {
font-size:9pt; height:13pt;
}
</style>
<script>
// Write the characters with code points min..max into the "show" iframe.
function showuni(min, max) {
  var doc = document.getElementById("show").contentWindow.document;
  doc.open();
  doc.writeln("<style>body{font-size:9pt;word-break:break-all;}</style>");
  doc.writeln(min + " - " + max + "<br><br>");
  for (var i = min; i <= max; i++) {
    doc.write("&#" + i + ";");
  }
  doc.close();
}
</script>
<input type="button" value="Half-width" onclick="showuni(32,126)">
<input type="button" value="Full-width" onclick="showuni(65281,65374)">
<input type="button" value="Chinese 1" onclick="showuni(19968,40869)">
<input type="button" value="Chinese 2" onclick="showuni(63744,64045)">
<input type="button" value="Hiragana" onclick="showuni(12353,12435)">
<input type="button" value="Katakana" onclick="showuni(12449,12534)">
<input type="button" value="Korean" onclick="showuni(44032,55203)">
<br> Custom range: <input name="min"> - <input name="max">
<input type="button" value="View" onclick="showuni(parseInt(document.getElementsByName('min')[0].value, 10), parseInt(document.getElementsByName('max')[0].value, 10))">
<br>
<iframe src="about:blank" id="show" width="100%" height="70%" scrolling="no"></iframe>
The following example converts GB2312 to UTF-8 using a lookup table (gb2312.txt). Now that the iconv function is available, it is of little practical use.
<?php
function gb2utf8($gb) {
    if (!trim($gb)) return $gb;
    // Load the lookup table. Each line starts with a GB2312 code and its
    // Unicode code point, both written as 6-character hex strings (0xXXXX).
    $filename = "gb2312.txt";
    $tmp = file($filename);
    $codetable = array();
    foreach ($tmp as $value)
        $codetable[hexdec(substr($value, 0, 6))] = substr($value, 7, 6);
    $utf8 = "";
    while ($gb !== "") {
        if (ord($gb[0]) > 127) {
            // Two-byte GB2312 character: the table is keyed by its code minus 0x8080.
            $ch = substr($gb, 0, 2);
            $gb = (string) substr($gb, 2);
            $utf8 .= u2utf8(hexdec($codetable[hexdec(bin2hex($ch)) - 0x8080]));
        } else {
            // Single-byte ASCII character.
            $ch = $gb[0];
            $gb = (string) substr($gb, 1);
            $utf8 .= u2utf8(ord($ch));
        }
    }
    return $utf8;
}
// Encode code point $c as a UTF-8 byte string.
function u2utf8($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | $c >> 6);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | $c >> 12);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x200000) {
        $str .= chr(0xF0 | $c >> 18);
        $str .= chr(0x80 | $c >> 12 & 0x3F);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    }
    return $str;
}
?>
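The lookup step can be illustrated on its own with a short JavaScript sketch. It assumes, as the substr offsets above imply, that each table line starts with two hex values of the form 0xXXXX; the two sample lines are illustrative, and `parseTable` and `gbPairToCodePoint` are hypothetical names.

```javascript
// Hypothetical helper: build a Map from table text with lines like "0x5650<TAB>0x4E2D".
function parseTable(text) {
  const table = new Map();
  for (const line of text.split(/\r?\n/)) {
    const m = /^0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4})/.exec(line);
    if (m) table.set(parseInt(m[1], 16), parseInt(m[2], 16));
  }
  return table;
}

// Look up a two-byte GB2312 character: the key is the 16-bit code minus 0x8080.
// Returns undefined when the pair is not in the table.
function gbPairToCodePoint(table, hi, lo) {
  return table.get(((hi << 8) | lo) - 0x8080);
}

// Two sample lines in the assumed format.
const sample = parseTable("0x2121\t0x3000\n0x5650\t0x4E2D\n");
console.log(gbPairToCodePoint(sample, 0xd6, 0xd0).toString(16)); // 4e2d
```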