The principle is simple. In GB2312/GBK a Chinese character takes two bytes, and each of those bytes falls within a known range of values. In UTF-8 a Chinese character takes three bytes, and each byte likewise falls within a known range. English (ASCII) characters have values below 128 and occupy a single byte in either encoding (full-width characters excepted).
If you are checking the encoding of a file, you can also inspect the UTF-8 BOM directly. Without further ado, here is the function, which checks a string's encoding and transcodes it.
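To make the byte counts concrete, here is a minimal sketch that lists the bytes of the same character in each encoding. The helper name `byteValues` is hypothetical and is not part of the function discussed in this article; the byte sequences are those of the character "中".

```php
<?php
// Hypothetical helper for illustration only: return each byte of a
// string as a hex literal such as "0xE4".
function byteValues(string $s): array
{
    // unpack('C*') yields unsigned byte values indexed from 1,
    // so array_values() re-indexes the result from 0.
    $bytes = array_values(unpack('C*', $s));
    return array_map(fn($b) => sprintf('0x%02X', $b), $bytes);
}

$utf8  = "\xE4\xB8\xAD"; // "中" in UTF-8: three bytes, all >= 128
$gbk   = "\xD6\xD0";     // "中" in GB2312/GBK: two bytes, both >= 128
$ascii = "A";            // one byte below 128 in either encoding

$utf8Bytes  = byteValues($utf8);
$gbkBytes   = byteValues($gbk);
$asciiBytes = byteValues($ascii);
```

The lead byte of the UTF-8 form (0xE4) has its top three bits set, while the GB2312 lead byte (0xD6) has only the top two set. That difference is what a bit-mask test on the first byte can pick up.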
<?php
// Guess whether $string is UTF-8 or GB2312 from its byte pattern, then
// convert it to $outEncoding. This is a heuristic, not a strict validator.
function safeencoding($string, $outEncoding = 'UTF-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++)
    {
        // Bytes below 128 are ASCII in both encodings; skip them.
        if (ord($string[$i]) < 128)
            continue;

        if ((ord($string[$i]) & 224) == 224)
        {
            // First byte passed the check (top three bits set)
            $char = $string[++$i] ?? '';
            if ((ord($char) & 128) == 128)
            {
                // Second byte passed the check
                $char = $string[++$i] ?? '';
                if ((ord($char) & 128) == 128)
                {
                    $encoding = "UTF-8";
                    break;
                }
            }
        }
        if ((ord($string[$i] ?? '') & 192) == 192)
        {
            // First byte passed the check (top two bits set)
            $char = $string[++$i] ?? '';
            if ((ord($char) & 128) == 128)
            {
                // Second byte passed the check
                $encoding = "GB2312";
                break;
            }
        }
    }
    if (strtoupper($encoding) == strtoupper($outEncoding))
        return $string;
    else
        return iconv($encoding, $outEncoding, $string);
}
?>
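For the file case mentioned earlier, checking the BOM could look roughly like the sketch below. The helper names `hasUtf8Bom` and `stripUtf8Bom` are hypothetical and are not part of the function above. The sketch assumes the file contents have already been read into a string, for example with `file_get_contents()`.

```php
<?php
// Hypothetical helper: true if the data starts with the UTF-8 BOM,
// which is the three-byte sequence EF BB BF.
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: return the data without a leading UTF-8 BOM.
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

Note that many UTF-8 files are saved without a BOM, so a missing BOM does not prove the file is not UTF-8. In that case a byte-pattern check is still needed.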