This article takes a detailed look at how to automatically detect a string's character set in PHP and transcode it.
The principle is simple: in GB2312/GBK a Chinese character occupies two bytes, each within a fixed value range, while in UTF-8 a Chinese character occupies three bytes, each also within a fixed value range. English characters occupy only one byte under either encoding (full-width forms excluded).
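To make the byte layouts concrete, here is a minimal sketch that dumps the bytes of the character 中 in both encodings and applies the bit masks (224 and 192) that the function in this article relies on. The sample character and the helper name `describeBytes` are illustrative choices, not part of the original article. The bytes are written as escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// Hypothetical helper: list each byte of a string as two hex digits.
function describeBytes(string $bytes): string
{
    return implode(' ', str_split(strtoupper(bin2hex($bytes)), 2));
}

$utf8  = "\xE4\xB8\xAD"; // "中" encoded as UTF-8: three bytes
$gbk   = "\xD6\xD0";     // "中" encoded as GB2312/GBK: two bytes
$ascii = "A";            // ASCII: one byte under either encoding

echo describeBytes($utf8), "\n";  // E4 B8 AD
echo describeBytes($gbk), "\n";   // D6 D0
echo describeBytes($ascii), "\n"; // 41

// The UTF-8 lead byte has its top three bits set (mask 224 = 0xE0).
var_dump((ord($utf8[0]) & 224) == 224); // bool(true)
// The GBK lead byte here has only its top two bits set (mask 192 = 0xC0).
var_dump((ord($gbk[0]) & 224) == 224);  // bool(false)
var_dump((ord($gbk[0]) & 192) == 192);  // bool(true)
```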
For file encoding checks, you can also look directly for the UTF-8 BOM. Without further ado, here is the function, which detects a string's encoding and transcodes it.
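Before the main function, the BOM check mentioned above can be sketched as follows. This is an illustration only: the helper names `hasUtf8Bom` and `stripUtf8Bom` are hypothetical and do not appear in the original article. The UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the data.

```php
<?php
// Hypothetical helper: true if the data starts with the UTF-8 byte order mark.
function hasUtf8Bom(string $data): bool
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: drop the BOM, if present, before further processing.
function stripUtf8Bom(string $data): string
{
    return hasUtf8Bom($data) ? substr($data, 3) : $data;
}

$withBom = "\xEF\xBB\xBF" . "hello";
var_dump(hasUtf8Bom($withBom));   // bool(true)
var_dump(hasUtf8Bom("hello"));    // bool(false)
var_dump(stripUtf8Bom($withBom)); // string(5) "hello"
```

Note that the check only works in one direction: a BOM indicates UTF-8, but many UTF-8 files are saved without one, so a missing BOM says nothing about the encoding.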
The code for safeEncoding is as follows:
<?php
function safeEncoding($string, $outEncoding = 'utf-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        // ASCII bytes are identical in both encodings, so skip them
        if (ord($string[$i]) < 128) {
            continue;
        }
        // The "$i + 2 < $length" guard keeps the look-ahead inside the string
        if ((ord($string[$i]) & 224) == 224 && $i + 2 < $length) {
            // The first byte passed the UTF-8 three-byte lead test
            $char = $string[++$i];
            if ((ord($char) & 128) == 128) {
                // The second byte passed
                $char = $string[++$i];
                if ((ord($char) & 128) == 128) {
                    // The third byte passed: treat the string as UTF-8
                    $encoding = "UTF-8";
                    break;
                }
            }
        }
        if ((ord($string[$i]) & 192) == 192 && $i + 1 < $length) {
            // The first byte passed the GB2312/GBK lead test
            $char = $string[++$i];
            if ((ord($char) & 128) == 128) {
                // The second byte passed: treat the string as GB2312
                $encoding = "GB2312";
                break;
            }
        }
    }
    // No conversion is needed when the detected and target encodings match
    if (strtoupper($encoding) == strtoupper($outEncoding)) {
        return $string;
    }
    return iconv($encoding, $outEncoding, $string);
}
?>
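The byte-mask test in the function above is a heuristic and can misjudge some inputs. For example, a GB2312 character whose lead byte is 0xE0 or higher, followed by another Chinese character, also passes the three-byte UTF-8 test. As an alternative sketch (not the article's method), the whole string can be validated as UTF-8 first, and GBK assumed only when that validation fails. The names `guessEncoding` and `toUtf8` are hypothetical, and the fallback to GBK is an assumption that only holds when the input is known to be one of these two encodings.

```php
<?php
// Sketch of an alternative detector: validate the whole string as UTF-8,
// and assume GBK only when that validation fails.
function guessEncoding(string $string): string
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8.
    return preg_match('//u', $string) === 1 ? 'UTF-8' : 'GBK';
}

// Hypothetical wrapper: convert to UTF-8 only when the input is not UTF-8.
function toUtf8(string $string): string
{
    $from = guessEncoding($string);
    if ($from === 'UTF-8') {
        return $string;
    }
    // iconv() returns false on failure; keeping the original bytes in that
    // case is a choice made for this sketch.
    $converted = iconv($from, 'UTF-8', $string);
    return $converted === false ? $string : $converted;
}

var_dump(guessEncoding("\xE4\xB8\xAD")); // string(5) "UTF-8"
var_dump(guessEncoding("\xD6\xD0"));     // string(3) "GBK"
var_dump(guessEncoding("plain ASCII"));  // string(5) "UTF-8"
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII bytes are the same in both encodings. A GBK byte sequence that happens to be valid UTF-8 would still be misclassified, so this is stricter than the mask test but not infallible.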