Encoding conversion between GB2312 and Unicode
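The following example converts GB2312 text to HTML numeric character references of the form "&#NNNNN;".
With the iconv function available in PHP 4.3.1 and later, this is straightforward: you only need to write one function that converts a UTF-8 character to its Unicode code point.
No lookup table (gb2312.txt) is needed.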
<?php
$text = ""; // GB2312-encoded input text
// Split into characters: an optional high byte followed by one more byte.
preg_match_all("/[\x80-\xff]?./", $text, $ar);
foreach ($ar[0] as $v) {
    echo "&#" . utf8_unicode(iconv("GB2312", "UTF-8", $v)) . ";";
}
?>
<?php
// UTF-8 -> Unicode code point
function utf8_unicode($c) {
    switch (strlen($c)) {
        case 1:
            return ord($c);
        case 2:
            // 110xxxxx 10xxxxxx
            $n = (ord($c[0]) & 0x1f) << 6;
            $n += ord($c[1]) & 0x3f;
            return $n;
        case 3:
            // 1110xxxx 10xxxxxx 10xxxxxx
            $n = (ord($c[0]) & 0x0f) << 12;
            $n += (ord($c[1]) & 0x3f) << 6;
            $n += ord($c[2]) & 0x3f;
            return $n;
        case 4:
            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            $n = (ord($c[0]) & 0x07) << 18;
            $n += (ord($c[1]) & 0x3f) << 12;
            $n += (ord($c[2]) & 0x3f) << 6;
            $n += ord($c[3]) & 0x3f;
            return $n;
    }
}
?>
The following example uses PHP to convert "&#NNNNN;" numeric character references back to GB2312.
<?php
// Sample input: "TTL" followed by decimal numeric character references.
$str = "TTL&#20840;&#22825;&#20505;&#33258;&#21160;&#32858;&#28966;";
// A callback converts each reference directly, without building code for eval().
$str = preg_replace_callback("|&#([0-9]{1,5});|", function ($m) {
    return u2utf82gb((int) $m[1]);
}, $str);
echo $str;

// Unicode code point -> UTF-8 -> GB2312
function u2utf82gb($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | ($c >> 6));
        $str .= chr(0x80 | ($c & 0x3F));
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | ($c >> 12));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    } else if ($c < 0x200000) {
        $str .= chr(0xF0 | ($c >> 18));
        $str .= chr(0x80 | (($c >> 12) & 0x3F));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    }
    return iconv('UTF-8', 'GB2312', $str);
}
?>
Or
<?php
// Decodes %uXXXX, &#xXXXX; and &#NNNNN; sequences to GB2312.
function unescape($str) {
    $str = rawurldecode($str);
    // Ungreedy match: each escape sequence, or one plain character at a time.
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        // pack("H4") and pack("n") both emit big-endian bytes, hence UCS-2BE.
        if (substr($v, 0, 2) == "%u")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, -4)));
        elseif (substr($v, 0, 3) == "&#x")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, 3, -1)));
        elseif (substr($v, 0, 2) == "&#")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("n", (int) substr($v, 2, -1)));
    }
    return join("", $ar);
}
// Sample input: "TTL" followed by decimal numeric character references.
$str = "TTL&#20840;&#22825;&#20505;&#33258;&#21160;&#32858;&#28966;";
echo unescape($str); // outputs "TTL全天候自动聚焦" encoded as GB2312
?>
Using JavaScript for the conversion
<style>
body {
    font-size: 9pt; padding: 0px;
}
input {
    font-size: 9pt; height: 13pt;
}
</style>
<script language="JavaScript1.2">
/*
This following code are designed and writen by Windy_sk <seasonx@163.net>
You can use it freely, but u must held all the copyright items!
*/
// Encode every UTF-16 code unit of str as a decimal reference "&#N;".
function Str2Unicode(str) {
    var arr = new Array();
    for (var i = 0; i < str.length; i++) {
        arr[i] = "&#" + str.charCodeAt(i) + ";";
    }
    return arr.join("");
}
// Decode the decimal references found in str; any other text is dropped.
function Unicode2oStr(str) {
    var re = /&#\d{1,5};/g;
    var arr = str.match(re);
    if (arr == null) return "";
    for (var i = 0; i < arr.length; i++) {
        arr[i] = String.fromCharCode(arr[i].replace(/[&#;]/g, ""));
    }
    return arr.join("");
}
// Convert in the checked direction; fall back to the other box if the source is empty.
function modi_str() {
    var f = document.forms["text"].elements;
    if (f["method"].checked) {
        if (f["decode"].value != "") {
            f["encode"].value = Str2Unicode(f["decode"].value);
        } else {
            f["decode"].value = Unicode2oStr(f["encode"].value);
        }
    } else {
        if (f["encode"].value != "") {
            f["decode"].value = Unicode2oStr(f["encode"].value);
        } else {
            f["encode"].value = Str2Unicode(f["decode"].value);
        }
    }
}
</script>
<title>Unicode</title>
<form name="text">
Original text:<br>
<textarea name="decode" cols="100" rows="10"></textarea>
<br>
Converted code:<br>
<textarea name="encode" cols="100" rows="10"></textarea>
<br>
<input type="checkbox" name="method" checked> Forward conversion
<input type="button" onclick="modi_str()" value="OK">
<input type="reset" value="Clear">
<input type="button" onclick="this.form.elements['method'].checked ? this.form.elements['encode'].select() : this.form.elements['decode'].select()" value="Select all">
</form>
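Str2Unicode above works on UTF-16 code units through charCodeAt, so a character outside the Basic Multilingual Plane comes out as two surrogate references. As a minimal sketch, the same round trip can be written per code point with the standard codePointAt and String.fromCodePoint. The names toNumericEntities and fromNumericEntities are hypothetical and are not part of the original script; unlike Unicode2oStr, the decoder here leaves text that is not a reference in place.

```javascript
// Hypothetical helpers (not from the original script).
// Encode each code point of str as a decimal reference "&#N;".
function toNumericEntities(str) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  }).join("");
}

// Decode decimal references; any other text is kept unchanged.
function fromNumericEntities(str) {
  return str.replace(/&#(\d{1,7});/g, function (match, digits) {
    var code = parseInt(digits, 10);
    // Values above U+10FFFF are not code points; keep the original text.
    return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
  });
}
```

For example, toNumericEntities("A全") returns "&#65;&#20840;", and fromNumericEntities("TTL&#20840;") returns "TTL全".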
The following example displays ranges of Unicode characters, including the half-width and full-width forms.
<style>
body {
    font-size: 9pt; padding: 0px;
}
input {
    font-size: 9pt; height: 13pt;
}
</style>
<script>
// Write the characters with code points min..max into the iframe with id "show".
function showUni(min, max) {
    var doc = document.getElementById("show").contentDocument;
    doc.open();
    doc.writeln("<style>body {font-size: 9pt; word-break: break-all;}</style>");
    doc.writeln(min + "-" + max + "<br>");
    for (var i = min; i <= max; i++) {
        doc.write("&#" + i + ";");
    }
    doc.close();
}
</script>
<input type="button" value="halfwidth" onclick="showUni(32, 126)">
<input type="button" value="fullwidth" onclick="showUni(65281, 65374)">
<input type="button" value="Chinese 1" onclick="showUni(19968, 40869)">
<input type="button" value="Chinese 2" onclick="showUni(63744, 64045)">
<input type="button" value="Japanese (hiragana)" onclick="showUni(12353, 12435)">
<input type="button" value="Japanese (katakana)" onclick="showUni(12449, 12534)">
<input type="button" value="Korean" onclick="showUni(44032, 55203)">
<br>Custom: <input name="min">-<input name="max">
<input type="button" value="Show" onclick="showUni(parseInt(document.getElementsByName('min')[0].value, 10), parseInt(document.getElementsByName('max')[0].value, 10))">
<br>
<iframe src="about:blank" id="show" width="100%" height="70%" scroll="no"></iframe>
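The custom range passes parseInt results straight into showUni, so an empty or non-numeric box yields NaN. One way to guard against that is to build the string of references in a separate function before anything is written to the frame. The sketch below assumes a hypothetical helper named entityRange, which is not part of the original page.

```javascript
// Hypothetical helper (not from the original page): returns the run of
// "&#N;" references for min..max, or "" when the range is not usable.
function entityRange(min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    return ""; // e.g. parseInt returned NaN for an empty input box
  }
  var parts = [];
  for (var i = min; i <= max; i++) {
    parts.push("&#" + i + ";");
  }
  return parts.join("");
}
```

For example, entityRange(65, 67) returns "&#65;&#66;&#67;", which a caller could pass to a single write call.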
The following example converts GB2312 to UTF-8 using a lookup table (gb2312.txt). Now that the iconv function is available, this approach is of little practical use.
<?php
function gb2utf8($gb) {
    if (!trim($gb)) return $gb;
    $filename = "gb2312.txt";
    // Build the table: GB2312 code (high bits cleared) => Unicode code point.
    // Assumption: each data line starts with two hex fields, e.g. "0x2121 0x3000".
    $codetable = array();
    foreach (file($filename) as $line) {
        if (preg_match('/^0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4})/', $line, $m)) {
            $codetable[hexdec($m[1])] = hexdec($m[2]);
        }
    }
    $utf8 = "";
    $len = strlen($gb);
    for ($i = 0; $i < $len; ) {
        if (ord($gb[$i]) > 127) {
            // Two-byte GB2312 character: subtract 0x8080 to get the table key.
            $key = hexdec(bin2hex(substr($gb, $i, 2))) - 0x8080;
            // Characters missing from the table become "?".
            $utf8 .= isset($codetable[$key]) ? u2utf8($codetable[$key]) : "?";
            $i += 2;
        } else {
            // Single-byte ASCII character.
            $utf8 .= u2utf8(ord($gb[$i]));
            $i += 1;
        }
    }
    return $utf8;
}

// Unicode code point -> UTF-8 byte sequence
function u2utf8($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | ($c >> 6));
        $str .= chr(0x80 | ($c & 0x3F));
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | ($c >> 12));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    } else if ($c < 0x200000) {
        $str .= chr(0xF0 | ($c >> 18));
        $str .= chr(0x80 | (($c >> 12) & 0x3F));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    }
    return $str;
}
?>