Use PHP to determine whether a file is UTF-8 encoded (including a BOM check). UTF-8 files come in two kinds: with a BOM and without one. Files with a BOM are easy to handle, because they start with the three bytes 0xEF 0xBB 0xBF. Files without a BOM are a bit more trouble, so I wrote a function that guesses the encoding from the byte patterns. The code is as follows:
// Returns 1: every byte is <= 127 (pure ASCII)
// Returns 2: UTF-8
// Returns 0: ordinary GB encoding
// Returns false: the input is shorter than 3 bytes
function TestUtf8($text)
{
    $len = strlen($text);
    if ($len < 3) return false;
    $lastch = 0;
    $begin = 0;
    $BOM = true;
    $BOMchs = array(0xEF, 0xBB, 0xBF);
    $good = 0;
    $bad = 0;
    $notAscii = 0;
    for ($i = 0; $i < $len; $i++)
    {
        $ch = ord($text[$i]);
        if ($begin < 3)
        {
            // Compare the first three bytes with the BOM; a single mismatch clears the flag.
            $BOM = $BOM && ($BOMchs[$begin] == $ch);
            $begin += 1;
            // A complete BOM settles the question, so stop scanning.
            if ($begin == 3 && $BOM) break;
        }
        if ($ch >= 0x80) $notAscii++;
        if (($ch & 0xC0) == 0x80)
        {
            // Continuation byte (10xxxxxx)
            if (($lastch & 0xC0) == 0xC0)
            {
                // It follows a lead byte (11xxxxxx): consistent with UTF-8
                $good += 1;
            }
            else if (($lastch & 0x80) == 0)
            {
                // It follows an ASCII byte: not valid UTF-8
                $bad += 1;
            }
        }
        else if (($lastch & 0xC0) == 0xC0)
        {
            // A lead byte was not followed by a continuation byte
            $bad += 1;
        }
        $lastch = $ch;
    }
    if ($begin == 3 && $BOM)
    {
        return 2;
    }
    else if ($notAscii == 0)
    {
        return 1;
    }
    else if ($good >= $bad)
    {
        return 2;
    }
    else
    {
        return 0;
    }
}
Returns...
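The function above is a heuristic: it counts byte pairs that look like UTF-8 and pairs that do not. As a point of comparison, here is a minimal sketch of a stricter check that returns the same codes (1, 2, 0). The names hasUtf8Bom and classifyEncoding are hypothetical and do not come from the original code. The sketch assumes that PCRE's /u modifier, which rejects subjects that are not valid UTF-8, is an acceptable validator; mb_check_encoding($text, 'UTF-8') could serve the same purpose where the mbstring extension is installed.

```php
<?php
// Hypothetical helpers for illustration; not part of the original function.

// True when the string starts with the three-byte UTF-8 BOM (EF BB BF).
function hasUtf8Bom(string $text): bool
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0;
}

// Same return codes as TestUtf8:
// 1 = pure ASCII, 2 = UTF-8, 0 = anything else (for example GB encoding).
function classifyEncoding(string $text): int
{
    if (hasUtf8Bom($text)) {
        return 2;
    }
    // No byte with the high bit set means plain ASCII.
    if (preg_match('/[\x80-\xFF]/', $text) === 0) {
        return 1;
    }
    // With the /u modifier, preg_match returns false when the subject
    // is not valid UTF-8, so an empty pattern works as a validator.
    return preg_match('//u', $text) === 1 ? 2 : 0;
}

// Usage: strip the BOM before further processing.
$raw = "\xEF\xBB\xBFhello";
if (hasUtf8Bom($raw)) {
    $raw = substr($raw, 3);
}
```

Unlike the counting approach, this version rejects text that contains even one malformed sequence, which may be too strict for files with a few damaged bytes.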