This article shows how PHP can automatically detect a string's character set and transcode it. The principle is simple: in GB2312/GBK a Chinese character occupies two bytes, each with its own value range, whereas in UTF-8 a Chinese character typically occupies three bytes, each also with its own value range. English (ASCII) characters occupy a single byte regardless of the encoding (full-width forms excluded).
To check a file's encoding, you can also look directly for the UTF-8 BOM. Without further ado, the safeEncoding function detects a string's encoding and transcodes it.
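To make the byte layout concrete, here is a minimal sketch that prints the bytes of the same character, "中" (U+4E2D), in both encodings. The helper name byteDump is hypothetical and only for illustration. For reference, a three-byte UTF-8 sequence has a lead byte in 0xE0-0xEF followed by continuation bytes in 0x80-0xBF, while GB2312 uses 0xA1-0xFE for both bytes.

```php
<?php
// "中" (U+4E2D) written as raw bytes so the example does not depend on
// the encoding this source file is saved in.
$utf8  = "\xE4\xB8\xAD"; // UTF-8: three bytes
$gbk   = "\xD6\xD0";     // GB2312/GBK: two bytes
$ascii = "A";            // ASCII: one byte in either encoding

// Hypothetical helper: render each byte of a string as two hex digits.
function byteDump($bytes)
{
    return strtoupper(implode(' ', str_split(bin2hex($bytes), 2)));
}

echo byteDump($utf8), "\n";  // E4 B8 AD
echo byteDump($gbk), "\n";   // D6 D0
echo byteDump($ascii), "\n"; // 41
```

Every byte of a Chinese character has its high bit set in both encodings, which is why the detection has to rely on the lead-byte pattern and the number of bytes that follow.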
The code is as follows:
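<?php
function safeEncoding($string, $outEncoding = 'UTF-8')
{
    // Default to UTF-8; pure ASCII input never changes this value.
    $encoding = 'UTF-8';
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        // Single-byte ASCII is identical in both encodings, so skip it.
        if (ord($string[$i]) < 128) {
            continue;
        }
        if ((ord($string[$i]) & 224) == 224) {
            // The first byte looks like a UTF-8 three-byte lead (111xxxxx).
            // A missing byte is read as "\0" to avoid running past the end.
            $char = $string[++$i] ?? "\0";
            if ((ord($char) & 128) == 128) {
                // The second byte has its high bit set.
                $char = $string[++$i] ?? "\0";
                if ((ord($char) & 128) == 128) {
                    // The third byte does too: treat the string as UTF-8.
                    $encoding = 'UTF-8';
                    break;
                }
            }
        }
        if ((ord($string[$i] ?? "\0") & 192) == 192) {
            // The first byte looks like a double-byte lead (11xxxxxx).
            $char = $string[++$i] ?? "\0";
            if ((ord($char) & 128) == 128) {
                // The second byte has its high bit set: treat as GB2312.
                $encoding = 'GB2312';
                break;
            }
        }
    }
    // Transcode only when the detected encoding differs from the target.
    if (strtoupper($encoding) == strtoupper($outEncoding)) {
        return $string;
    }
    return iconv($encoding, $outEncoding, $string);
}
?>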
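The BOM check mentioned earlier for files can be sketched as follows. This is a minimal illustration, not part of the original function: the helper names hasUtf8Bom and stripUtf8Bom are hypothetical. The UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the data.

```php
<?php
// Hypothetical helper: true when $bytes begins with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: drop a leading UTF-8 BOM if one is present.
function stripUtf8Bom($bytes)
{
    return hasUtf8Bom($bytes) ? (string) substr($bytes, 3) : $bytes;
}

$withBom = "\xEF\xBB\xBF" . "hello";
var_dump(hasUtf8Bom($withBom));    // bool(true)
var_dump(hasUtf8Bom("hello"));     // bool(false)
echo stripUtf8Bom($withBom), "\n"; // hello
```

For a file, the same check can be applied to the contents returned by file_get_contents. Note that many UTF-8 files are saved without a BOM, so a missing BOM does not prove the file is not UTF-8; in that case a byte-level check is still needed.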