UTF-8 encoded files come in two kinds: with a BOM and without one. Files with a BOM are easy to identify, but files without one are more trouble, so I wrote a function to detect the encoding. The code is as follows:
// Returns 1 for pure ASCII (every byte is 127 or below)
// Returns 2 for UTF-8
// Returns 0 for ordinary GB encoding
// Returns false when the text is shorter than 3 bytes
function TestUtf8($text)
{
    // Fewer than 3 bytes cannot hold a BOM; the result is undetermined.
    if (strlen($text) < 3) return false;

    // A UTF-8 BOM (EF BB BF) at the very start settles the question.
    if (substr($text, 0, 3) === "\xEF\xBB\xBF") return 2;

    $lastch = 0;    // previous byte
    $good = 0;      // continuation bytes that follow a lead byte
    $bad = 0;       // byte pairs that cannot occur in UTF-8
    $notAscii = 0;  // bytes of 0x80 and above
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++)
    {
        $ch = ord($text[$i]);
        if ($ch >= 0x80) $notAscii++;
        if (($ch & 0xC0) == 0x80)
        {
            // Current byte is a continuation byte (10xxxxxx).
            if (($lastch & 0xC0) == 0xC0)
            {
                $good += 1;
            }
            else if (($lastch & 0x80) == 0)
            {
                $bad += 1;
            }
        }
        else if (($lastch & 0xC0) == 0xC0)
        {
            // A lead byte (11xxxxxx) was not followed by a continuation byte.
            $bad += 1;
        }
        $lastch = $ch;
    }
    if ($notAscii == 0)
    {
        return 1;
    }
    else if ($good >= $bad)
    {
        return 2;
    }
    else
    {
        return 0;
    }
}
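The function above is a heuristic: it counts plausible and implausible byte pairs rather than validating every sequence. As a cross-check, here is a minimal sketch of a stricter test that returns the same codes. The name `classifyEncodingStrict` is hypothetical and not part of the original code. It relies on PCRE rejecting subjects that are not valid UTF-8 when the `u` modifier is set, and it keeps the assumption that anything which is neither ASCII nor UTF-8 is GB-encoded.

```php
<?php
// Hypothetical helper, not from the original article: a stricter
// cross-check that returns the same codes as TestUtf8().
function classifyEncodingStrict(string $text): int
{
    // A leading UTF-8 BOM (EF BB BF) is decisive.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return 2;
    }
    // No byte of 0x80 or above means the text is pure ASCII.
    if (preg_match('/[\x80-\xFF]/', $text) !== 1) {
        return 1;
    }
    // With the u modifier, preg_match() returns false (not 1) when the
    // subject is not valid UTF-8, so a result of 1 means the text is valid.
    if (preg_match('//u', $text) === 1) {
        return 2;
    }
    // Assumption carried over from the article: everything else is GB.
    return 0;
}

var_dump(classifyEncodingStrict("hello"));                    // int(1)
var_dump(classifyEncodingStrict("\xEF\xBB\xBFhello"));         // int(2)
var_dump(classifyEncodingStrict("\xE4\xBD\xA0\xE5\xA5\xBD")); // int(2), UTF-8 bytes
var_dump(classifyEncodingStrict("\xC4\xE3\xBA\xC3"));         // int(0), GBK bytes
```

Unlike the counting approach, this sketch rejects a text as soon as a single byte sequence is malformed, so a mostly-UTF-8 file with a few damaged bytes would be reported as 0 here but as 2 by the heuristic.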