Implementing encoding conversion between GB2312 and Unicode (UTF-8) in PHP

Source: Internet
Author: User

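The following example converts GB2312 text to UTF-8 and prints each character as a numeric character reference of the form &#NNNNN;.
From PHP 4.3.1 onward the iconv() function is very handy here: all that is left to write is a function that turns a UTF-8 sequence into its Unicode code point.
A lookup table (gb2312.txt) also works.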

<?php
// $text is expected to hold GB2312-encoded bytes.
$text = "electronic stacks";
// Match one character at a time: an optional lead byte (0x80-0xFF) plus one more byte.
preg_match_all("/[\x80-\xff]?./", $text, $ar);
foreach ($ar[0] as $v)
    echo "&#" . utf8_unicode(iconv("GB2312", "UTF-8", $v)) . ";";
?>
<?php
// UTF-8 -> Unicode: decode one UTF-8 sequence into its code point.
function utf8_unicode($c) {
    switch (strlen($c)) {
        case 1:
            return ord($c);
        case 2:
            $n = (ord($c[0]) & 0x1f) << 6;   // lead byte 110xxxxx carries 5 bits
            $n += ord($c[1]) & 0x3f;
            return $n;
        case 3:
            $n = (ord($c[0]) & 0x0f) << 12;  // lead byte 1110xxxx carries 4 bits
            $n += (ord($c[1]) & 0x3f) << 6;
            $n += ord($c[2]) & 0x3f;
            return $n;
        case 4:
            $n = (ord($c[0]) & 0x07) << 18;  // lead byte 11110xxx carries 3 bits
            $n += (ord($c[1]) & 0x3f) << 12;
            $n += (ord($c[2]) & 0x3f) << 6;
            $n += ord($c[3]) & 0x3f;
            return $n;
    }
}
?>
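
To check the bit arithmetic in the decoding function above against a known character, the same steps can be sketched in JavaScript, the language used for the browser examples later in this article. The function name utf8ToCodePoint is an illustrative assumption and not part of the original script. A lead byte of the form 110xxxxx, 1110xxxx or 11110xxx contributes its low 5, 4 or 3 bits, and each continuation byte (10xxxxxx) contributes 6 bits.

```javascript
// Decode one UTF-8 byte sequence (an array of 1 to 4 byte values) into a
// Unicode code point. Hypothetical helper for illustration only: it does not
// validate continuation bytes or reject overlong forms.
function utf8ToCodePoint(bytes) {
  switch (bytes.length) {
    case 1:
      return bytes[0];
    case 2:
      return ((bytes[0] & 0x1f) << 6) | (bytes[1] & 0x3f);
    case 3:
      return ((bytes[0] & 0x0f) << 12) |
             ((bytes[1] & 0x3f) << 6) |
             (bytes[2] & 0x3f);
    case 4:
      return ((bytes[0] & 0x07) << 18) |
             ((bytes[1] & 0x3f) << 12) |
             ((bytes[2] & 0x3f) << 6) |
             (bytes[3] & 0x3f);
    default:
      throw new RangeError("expected 1 to 4 bytes");
  }
}

// U+4E2D is encoded in UTF-8 as the bytes E4 B8 AD.
console.log("&#" + utf8ToCodePoint([0xe4, 0xb8, 0xad]) + ";"); // &#20013;
```

Working through the three-byte case by hand: 0xE4 & 0x0F is 4, shifted left by 12 gives 0x4000; 0xB8 & 0x3F is 0x38, shifted left by 6 gives 0xE00; 0xAD & 0x3F is 0x2D. The sum is 0x4E2D, or 20013 in decimal.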

The following example goes the other way: it converts numeric character references (&#NNNNN;) to GB2312 by encoding each code point as UTF-8 and then calling iconv().

<?php
// Sample input: ASCII text followed by decimal numeric character references.
$str = "TTL&#20840;&#22825;&#20505;&#33258;&#21160;&#32858;&#28966;";
// Replace each &#NNNNN; with its GB2312 bytes. preg_replace_callback() does
// the substitution directly, so no eval() of generated code is needed.
$str = preg_replace_callback("|&#([0-9]{1,5});|", function ($m) {
    return u2utf82gb((int) $m[1]);
}, $str);
echo $str;

// Encode the code point $c as UTF-8, then convert the result to GB2312.
function u2utf82gb($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | $c >> 6);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | $c >> 12);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x200000) {
        $str .= chr(0xF0 | $c >> 18);
        $str .= chr(0x80 | $c >> 12 & 0x3F);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    }
    return iconv('UTF-8', 'GB2312', $str);
}
?>
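
The encoding half of the function above follows the standard UTF-8 layout: the number of bytes depends on the size of the code point, the lead byte carries the high bits, and each following byte carries 6 bits behind a 10 prefix. A minimal sketch of just that step in JavaScript, assuming a helper name (codePointToUtf8) that does not appear in the original; it leaves out the GB2312 conversion, which the PHP code delegates to iconv().

```javascript
// Encode a Unicode code point as an array of UTF-8 byte values.
// Hypothetical helper for illustration; surrogate code points are not rejected.
function codePointToUtf8(cp) {
  if (cp < 0x80) {
    return [cp];
  }
  if (cp < 0x800) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  if (cp < 0x110000) {
    return [
      0xf0 | (cp >> 18),
      0x80 | ((cp >> 12) & 0x3f),
      0x80 | ((cp >> 6) & 0x3f),
      0x80 | (cp & 0x3f),
    ];
  }
  throw new RangeError("code point out of range");
}

// 20840 is U+5168; print its UTF-8 bytes in hexadecimal.
const hex = codePointToUtf8(20840)
  .map((b) => b.toString(16).padStart(2, "0"))
  .join(" ");
console.log(hex); // e5 85 a8
```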

Alternatively, the following function also handles %uXXXX and &#xXXXX; escapes:

function unescape($str) {
    $str = rawurldecode($str);
    // The U (ungreedy) modifier makes the final .+ match one character at a time.
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        // pack() produces big-endian bytes, so name the byte order explicitly.
        if (substr($v, 0, 2) == "%u")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, -4)));
        elseif (substr($v, 0, 3) == "&#x")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, 3, -1)));
        elseif (substr($v, 0, 2) == "&#")
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("n", substr($v, 2, -1)));
    }
    return join("", $ar);
}
// Sample input: ASCII text followed by decimal numeric character references.
$str = "TTL&#20840;&#22825;&#20505;&#33258;&#21160;&#32858;&#28966;";
echo unescape($str); // prints "TTL" followed by the GB2312 bytes of the referenced characters
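
The function above recognizes three escape forms: %uXXXX, &#xXXXX; and &#NNNNN;. A rough JavaScript sketch of the same recognition step is given here, assuming an ES2015+ engine and a hypothetical function name (decodeEscapes). It returns an ordinary JavaScript string rather than GB2312 bytes, so it illustrates only the parsing, not the iconv() conversion.

```javascript
// Replace %uXXXX, &#xXXXX; and &#NNNNN; escapes with the characters they name.
// Hypothetical helper for illustration; anything else is left untouched.
function decodeEscapes(text) {
  const pattern = /%u([0-9a-fA-F]{4})|&#x([0-9a-fA-F]{1,6});|&#(\d{1,7});/g;
  return text.replace(pattern, (match, pctHex, refHex, refDec) => {
    let cp;
    if (refDec !== undefined) {
      cp = parseInt(refDec, 10);      // decimal reference
    } else if (pctHex !== undefined) {
      cp = parseInt(pctHex, 16);      // %uXXXX escape
    } else {
      cp = parseInt(refHex, 16);      // hexadecimal reference
    }
    // Leave the escape as-is if it is not a valid code point.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(decodeEscapes("%u4E2D &#x6587; &#20013;")); // 中 文 中
```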


Use JavaScript to convert

<style>
body {
font-size:9pt; padding-right:0px; padding-left:0px; padding-bottom:0px; padding-top:0px;
}
input {
font-size:9pt; height:13pt;
}
</style>
<script language="JavaScript1.2">
/*
This following code are designed and writen by Windy_sk <seasonx@163.net>
can use it freely, but u must held all the copyright items!
*/
// Encode every UTF-16 code unit of str as a decimal numeric character reference.
function Str2unicode(str) {
    var arr = new Array();
    for (var i = 0; i < str.length; i++) {
        arr[i] = "&#" + str.charCodeAt(i) + ";";
    }
    return arr.join("");
}
// Decode the decimal references found in str; any other text is dropped.
function Unicode2ostr(str) {
    var re = /&#\d{1,5};/g;
    var arr = str.match(re);
    if (arr == null) return "";
    for (var i = 0; i < arr.length; i++) {
        arr[i] = String.fromCharCode(arr[i].replace(/[&#;]/g, ""));
    }
    return arr.join("");
}
// Convert in the direction selected by the checkbox; if the preferred source
// field is empty, convert the other way instead.
function Modi_str() {
    if (document.all.text.method.checked) {
        if (document.all.text.decode.value != "") {
            document.all.text.encode.value = Str2unicode(document.all.text.decode.value);
        } else {
            document.all.text.decode.value = Unicode2ostr(document.all.text.encode.value);
        }
    } else {
        if (document.all.text.encode.value != "") {
            document.all.text.decode.value = Unicode2ostr(document.all.text.encode.value);
        } else {
            document.all.text.encode.value = Str2unicode(document.all.text.decode.value);
        }
    }
}
</script>
<title>Unicode</title>
<form name="text">
Original text:<br>
<textarea name="decode"></textarea>
<br>
Converted code:<br>
<textarea name="encode"></textarea>
<br>
<input type="checkbox" name="method" checked> Forward conversion
<input type="button" value="OK" onclick="Modi_str()">
<input type="reset" value="Clear">
<input type="button" value="Select All">
</form>
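
One limitation of the page above is that it works on UTF-16 code units, so a character outside the Basic Multilingual Plane is written as two surrogate references instead of one. A minimal sketch of a code-point-aware variant follows, assuming an ES2015+ engine. The names toNumericReferences and fromNumericReferences are illustrative and not part of the original page, and unlike the decoder above, this one keeps any text that is not a reference.

```javascript
// Encode each character (by code point, not UTF-16 unit) as &#NNNNN;.
function toNumericReferences(str) {
  let out = "";
  for (const ch of str) {            // for...of iterates by code point
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

// Decode decimal references; text that is not a reference is kept as-is.
function fromNumericReferences(str) {
  return str.replace(/&#(\d{1,7});/g, (match, dec) => {
    const cp = parseInt(dec, 10);
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(toNumericReferences("A中"));              // &#65;&#20013;
console.log(fromNumericReferences("&#65;&#20013;"));  // A中
```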

The following page displays every character in a chosen code-point range, such as the half-width and full-width forms, Chinese ideographs, Japanese kana, and Korean Hangul.

<style>
body {
font-size:9pt; padding-right:0px; padding-left:0px; padding-bottom:0px; padding-top:0px;
}
input {
font-size:9pt; height:13pt;
}
</style>
<script>
// Write the characters with code points min..max into the "show" iframe.
function showuni(min, max) {
    var doc = document.getElementById("show").contentWindow.document;
    doc.open();
    doc.writeln("<style>body{font-size:9pt;word-break:break-all;}</style>");
    doc.writeln(min + "-" + max + "<br><br>");
    for (var i = min; i <= max; i++) {
        doc.write("&#" + i + ";");
    }
    doc.close();
}
</script>
<input type="button" value="Half-width" onclick="showuni(32,126)">
<input type="button" value="Full-width" onclick="showuni(65281,65374)">
<input type="button" value="Chinese 1" onclick="showuni(19968,40869)">
<input type="button" value="Chinese 2" onclick="showuni(63744,64045)">
<input type="button" value="Hiragana" onclick="showuni(12353,12435)">
<input type="button" value="Katakana" onclick="showuni(12449,12534)">
<input type="button" value="Korean" onclick="showuni(44032,55203)">
<br>Custom range: <input name="min"> - <input name="max">
<input type="button" value="View" onclick="showuni(parseInt(document.all.min.value, 10), parseInt(document.all.max.value, 10))">
<br>
<iframe src="about:blank" id="show" width="100%" height="70%" scroll="no"></iframe>
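
The page above lets the browser expand &#NNNNN; references. Outside a document, the same range can be turned into a string directly. A small sketch, assuming an ES2015+ engine and a hypothetical helper name (charsInRange):

```javascript
// Build a string containing every character from code point min to max.
// Hypothetical helper for illustration; it does not skip unassigned code points.
function charsInRange(min, max) {
  let out = "";
  for (let cp = min; cp <= max; cp++) {
    out += String.fromCodePoint(cp);
  }
  return out;
}

// The first five code points of the hiragana range used by the page above.
console.log(charsInRange(12353, 12357)); // ぁあぃいぅ
```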

The following example converts GB2312 to UTF-8 with a lookup table (gb2312.txt). Now that the iconv() function is available, it is of little practical use.

<?php
// Convert a GB2312 string to UTF-8 using the gb2312.txt lookup table.
function gb2utf8($gb) {
    if (!trim($gb)) return $gb;
    $filename = "gb2312.txt";
    $tmp = file($filename);
    $codetable = array();
    // Each table line is assumed to read "0xGGGG<tab>0xUUUU"
    // (GB2312 code, then Unicode code point); other lines are skipped.
    foreach ($tmp as $value) {
        if (strncmp($value, "0x", 2) !== 0) continue;
        $codetable[hexdec(substr($value, 2, 4))] = substr($value, 9, 4);
    }
    $utf8 = "";
    while ($gb !== "") {
        if (ord($gb[0]) > 127) {
            // Two-byte character: subtract 0x8080 to get the table key.
            $ch = substr($gb, 0, 2);
            $gb = (string) substr($gb, 2);
            $utf8 .= u2utf8(hexdec($codetable[hexdec(bin2hex($ch)) - 0x8080]));
        } else {
            // ASCII byte: copy it through unchanged.
            $utf8 .= $gb[0];
            $gb = (string) substr($gb, 1);
        }
    }
    return $utf8;
}
// Encode the code point $c as UTF-8.
function u2utf8($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | $c >> 6);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | $c >> 12);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x200000) {
        $str .= chr(0xF0 | $c >> 18);
        $str .= chr(0x80 | $c >> 12 & 0x3F);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    }
    return $str;
}
?>
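
The lookup step can be tried without the full table. The sketch below, in JavaScript, parses two sample rows and maps a GB2312 byte pair to its code point. It assumes the table uses the "0xGGGG<TAB>0xUUUU" layout that the PHP code's substr() offsets imply; the helper names (parseTable, gbPairToCodePoint) and the inline sample data are illustrative, not part of the original.

```javascript
// Two sample rows in the assumed "0xGGGG<TAB>0xUUUU" layout.
const sampleTable = "0x2121\t0x3000\n0x3021\t0x554A\n";

// Parse table text into a Map from GB2312 code to Unicode code point.
function parseTable(text) {
  const table = new Map();
  for (const line of text.split("\n")) {
    const m = /^0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4})/.exec(line);
    if (m) {
      table.set(parseInt(m[1], 16), parseInt(m[2], 16));
    }
  }
  return table;
}

// Look up a two-byte GB2312 character. As in the PHP code, subtracting
// 0x8080 clears the high bit of both bytes to obtain the table key.
function gbPairToCodePoint(table, hi, lo) {
  const key = ((hi << 8) | lo) - 0x8080;
  return table.get(key); // undefined when the key is not in the table
}

const table = parseTable(sampleTable);
// The GB2312 bytes B0 A1 map to key 0x3021, which the sample row maps to U+554A.
console.log(gbPairToCodePoint(table, 0xb0, 0xa1).toString(16)); // 554a
```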


