Topic: Convert a UTF-8 encoded string to GB2312, writing any character that has no GB2312 encoding as a numeric character reference in &#DEC; format, for example 회 => &#54924;
Language: PHP, JavaScript
Content: The browser encodes a string (which may contain non-GB2312 characters) with the JavaScript encodeURI function and sends it to the server in a GET request. The page encoding is GB2312, so the server-side PHP script has to convert the request data into a GB2312 representation.
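As a small illustration of the browser side, the sketch below shows what encodeURI produces for a character outside GB2312. The parameter name `name` and the path `/search.php` are hypothetical, chosen only for this example.

```javascript
// encodeURI always percent-encodes the UTF-8 bytes of a non-ASCII character,
// whatever encoding the page itself is served in.
const query = encodeURI("name=회");
console.log(query); // name=%ED%9A%8C

// Hypothetical request URL. After URL-decoding, the server sees the raw
// UTF-8 bytes ED 9A 8C, which is why the PHP side must convert them.
const url = "/search.php?" + query;
console.log(url);
```

Because the bytes that arrive are UTF-8 even though the page is GB2312, the conversion has to happen on the server.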
Basis:
1. The iconv function alone converts only characters that exist in GB2312; characters outside it cannot be converted
2. No ready-made function does the whole job
3. bindec() function: converts a binary string of "0" and "1" digits to a decimal number
4. decbin() function: converts a decimal number to a binary string, for example decbin(224) = "11100000"
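For readers checking the arithmetic from the browser console, here is a rough JavaScript counterpart of the two PHP functions. The helper names `bindec` and `decbin` are defined here only to mirror the PHP names; they are not built into JavaScript.

```javascript
// Illustrative JavaScript equivalents of PHP's bindec() and decbin().
function bindec(bits) {
  return parseInt(bits, 2); // parse a string of 0s and 1s as base 2
}

function decbin(n) {
  return n.toString(2); // format a non-negative integer in base 2
}

console.log(decbin(224));        // 11100000
console.log(bindec("11000000")); // 192
console.log(bindec("11100000")); // 224
console.log(bindec("11110000")); // 240
```

The last three values are the first-byte thresholds that separate continuation bytes, two-byte, and three-byte sequences.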
Train of thought: UTF-8 encodes a character in 1, 2, 3 or more bytes, and Chinese, Japanese and Korean text takes 3 bytes per character, so during processing the value of a character's first byte tells how many bytes the character occupies.
1. First byte less than 128: ASCII
2. 128~191: not a valid first byte in UTF-8 (it is a continuation byte), output as &#ord();
3. 192~223: two-byte UTF-8 encoding
4. 224~239: three-byte encoding
5. 240~247: four-byte encoding
6...
7. For a three-byte encoding, try to convert it to GB2312 with iconv
8. For multibyte characters that are not in GB2312, decode the UTF-8 bytes to the Unicode code point and output its decimal value
9. Bit operations could be used for this, or the bindec() function
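The first-byte ranges listed above can be sketched as a small lookup. This is an illustrative JavaScript helper, not part of the original program; the name `utf8SequenceLength` is an assumption.

```javascript
// Number of bytes in a UTF-8 sequence, judged from its first byte.
// Returns 0 for a continuation byte (10xxxxxx), which cannot start a character.
function utf8SequenceLength(firstByte) {
  if (firstByte < 0x80) return 1; // 0xxxxxxx, ASCII
  if (firstByte < 0xC0) return 0; // 10xxxxxx, continuation byte
  if (firstByte < 0xE0) return 2; // 110xxxxx
  if (firstByte < 0xF0) return 3; // 1110xxxx
  if (firstByte < 0xF8) return 4; // 11110xxx
  if (firstByte < 0xFC) return 5; // 111110xx
  return 6;                       // 1111110x and above, treated as six bytes
}

console.log(utf8SequenceLength(0x41)); // 1
console.log(utf8SequenceLength(0xED)); // 3
```

0xED is the first byte of 회 in UTF-8, so it falls in the three-byte branch, the one that tries iconv first.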
Program:
function getgb2312string($name)
{
    $tostr = "";
    $len = strlen($name);
    for ($i = 0; $i < $len; $i++)
    {
        $curbin = ord(substr($name, $i, 1));
        if ($curbin < 0x80)
        {
            // 0xxxxxxx: ASCII, copied unchanged
            $tostr .= substr($name, $i, 1);
        } elseif ($curbin < bindec("11000000")) {
            // 10xxxxxx: stray continuation byte, output its byte value
            $str = substr($name, $i, 1);
            $tostr .= "&#" . ord($str) . ";";
        } elseif ($curbin < bindec("11100000")) {
            // 110xxxxx: two-byte sequence
            $str = substr($name, $i, 2);
            $tostr .= "&#" . Getunicodechar($str) . ";";
            $i += 1;
        } elseif ($curbin < bindec("11110000")) {
            // 1110xxxx: three-byte sequence, try GB2312 first
            $str = substr($name, $i, 3);
            // @ suppresses the notice iconv raises for characters outside GB2312
            $gstr = @iconv("UTF-8", "GB2312", $str);
            if ($gstr === false || $gstr === "")
            {
                $tostr .= "&#" . Getunicodechar($str) . ";";
            } else {
                $tostr .= $gstr;
            }
            $i += 2;
        } elseif ($curbin < bindec("11111000")) {
            // 11110xxx: four-byte sequence
            $str = substr($name, $i, 4);
            $tostr .= "&#" . Getunicodechar($str) . ";";
            $i += 3;
        } elseif ($curbin < bindec("11111100")) {
            // 111110xx: five-byte sequence
            $str = substr($name, $i, 5);
            $tostr .= "&#" . Getunicodechar($str) . ";";
            $i += 4;
        } else {
            // 1111110x: six-byte sequence
            $str = substr($name, $i, 6);
            $tostr .= "&#" . Getunicodechar($str) . ";";
            $i += 5;
        }
    }
    return $tostr;
}
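function Getunicodechar($str)
{
    $temp = "";
    for ($i = 0; $i < strlen($str); $i++)
    {
        // for well-formed input every byte here is >= 128, so decbin() yields 8 digits
        $x = decbin(ord(substr($str, $i, 1)));
        if ($i == 0)
        {
            // first byte: skip the length prefix (n ones followed by a zero)
            $s = strlen($str) + 1;
            $temp .= substr($x, $s, 8 - $s);
        } else {
            // continuation byte: skip the leading "10", keep 6 payload bits
            $temp .= substr($x, 2, 6);
        }
    }
    // the concatenated payload bits are the Unicode code point
    return bindec($temp);
}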
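The same decoding can also be done with bit operations instead of binary strings. The following is a minimal JavaScript sketch under the assumption that the input is one well-formed UTF-8 sequence given as an array of byte values; `codePointFromUtf8` and `toNumericReference` are hypothetical names, not functions from the program above.

```javascript
// Decode one UTF-8 sequence (array of byte values) into its code point
// using masks and shifts rather than string slicing.
function codePointFromUtf8(bytes) {
  const n = bytes.length;
  // An n-byte lead byte starts with n one-bits and a zero, so its payload
  // is the low 7 - n bits; a single byte is plain ASCII.
  let cp = n === 1 ? bytes[0] : bytes[0] & (0xFF >> (n + 1));
  for (let i = 1; i < n; i++) {
    // Each continuation byte contributes its low 6 bits.
    cp = (cp << 6) | (bytes[i] & 0x3F);
  }
  return cp;
}

// Format the code point as a decimal numeric character reference.
function toNumericReference(bytes) {
  return "&#" + codePointFromUtf8(bytes) + ";";
}

console.log(toNumericReference([0xED, 0x9A, 0x8C])); // &#54924;
```

The bytes ED 9A 8C are the UTF-8 encoding of 회 (U+D68C), whose decimal value is 54924.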
Appendix (UTF-8 bit patterns by code point range):
U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
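To see the table in action, the sketch below goes the other way and builds the UTF-8 bytes from a code point. It is an illustrative JavaScript helper (the name `utf8Bytes` is an assumption) and covers only the first four rows of the table.

```javascript
// Encode a code point into UTF-8 bytes following the table above.
// Only the first four rows (up to U+1FFFFF) are handled in this sketch.
function utf8Bytes(cp) {
  if (cp < 0 || cp > 0x1FFFFF) {
    throw new RangeError("code point outside the range handled by this sketch");
  }
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
  if (cp < 0x10000) {
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

// 회 is U+D68C (decimal 54924), which falls in the three-byte row.
const hex = utf8Bytes(0xD68C).map(b => b.toString(16).toUpperCase()).join(" ");
console.log(hex); // ED 9A 8C
```

The result matches the percent-escapes %ED%9A%8C that encodeURI produces for the same character.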