This article summarizes common ways to check in PHP whether a string is UTF-8 encoded. The implementations are shared below for reference.
There are several ways to detect a string's encoding: you can use ord() to get the numeric value of each byte and test it against the UTF-8 byte ranges, or you can call the mb_detect_encoding() function. Four common methods are collected below.
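To make the ord() approach concrete, here is a minimal sketch of how a single byte can be classified by its role in UTF-8. The helper name utf8_byte_role() is hypothetical and not part of the original examples; the sketch assumes PHP 7 or later and only looks at the bit pattern of each byte, without checking for overlong forms or surrogates.

```php
<?php
// Hypothetical helper (an illustration, not from the original article):
// classify one byte by the role its bit pattern has in UTF-8.
function utf8_byte_role(string $byte): string
{
    $c = ord($byte[0]);
    if ($c < 0x80) return 'ascii';        // 0xxxxxxx
    if ($c < 0xC0) return 'continuation'; // 10xxxxxx
    if ($c < 0xE0) return 'lead-2';       // 110xxxxx
    if ($c < 0xF0) return 'lead-3';       // 1110xxxx
    if ($c < 0xF8) return 'lead-4';       // 11110xxx
    return 'invalid';                     // 0xF8-0xFF never appear in UTF-8
}

// Walk the bytes of "A", U+00E9 and U+4E2D and print each byte's role.
foreach (str_split("A\u{00E9}\u{4E2D}") as $byte) {
    printf("0x%02X %s\n", ord($byte), utf8_byte_role($byte));
}
```

A byte-scanning validator then only has to check that each lead byte is followed by the right number of continuation bytes.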
Example 1
/**
 * Detect whether a string is UTF-8 encoded.
 * @param string $str the string to check
 * @return bool
 */
function is_utf8($str) {
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c >= 128) { // non-ASCII: must be the lead byte of a multi-byte sequence
            if ($c > 247) return false;
            elseif ($c > 239) $bytes = 4;
            elseif ($c > 223) $bytes = 3;
            elseif ($c > 191) $bytes = 2;
            else return false; // 128-191 is a continuation byte, which cannot start a sequence
            if (($i + $bytes) > $len) return false; // sequence is truncated
            while ($bytes > 1) {
                $i++;
                $b = ord($str[$i]);
                if ($b < 128 || $b > 191) return false; // continuation bytes must be 10xxxxxx
                $bytes--;
            }
        }
    }
    return true;
}
Example 2
function is_utf8($string) {
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string);
}
Its accuracy is essentially the same as that of mb_detect_encoding(): the two tend to be right on the same inputs and wrong on the same inputs.
Encoding detection can never be 100% accurate, but this approach is good enough for most needs.
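One reason detection cannot be fully accurate is that the same bytes can be well formed in more than one encoding. The short sketch below illustrates this; it assumes the mbstring extension is loaded and PHP 7 or later, and the variable names are only illustrative.

```php
<?php
// Requires the mbstring extension.
$bytes = "\xC3\xA9"; // two bytes: U+00E9 in UTF-8, or U+00C3 U+00A9 in ISO-8859-1

var_dump(mb_check_encoding($bytes, 'UTF-8'));      // well formed as UTF-8
var_dump(mb_check_encoding($bytes, 'ISO-8859-1')); // also acceptable as ISO-8859-1

// Reinterpreting the same bytes as ISO-8859-1 yields different text.
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
var_dump($asLatin1 === "\u{00C3}\u{00A9}");

// A lone 0xE9 is a letter in ISO-8859-1 but is not well-formed UTF-8.
var_dump(mb_check_encoding("\xE9", 'UTF-8'));
```

A positive result therefore means "these bytes could be UTF-8", not "this text was written as UTF-8"; a negative result, on the other hand, is reliable.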
Example 3
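function mb_is_utf8($string)
{
    // Strict mode (third argument) makes mb_detect_encoding() return false
    // when the string is not valid in any of the listed encodings.
    return mb_detect_encoding($string, 'UTF-8', true) === 'UTF-8';
}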
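When the question is only "is this string well-formed UTF-8?", validation can be simpler than detection. The following is a sketch under stated assumptions rather than a definitive implementation: the function name is_valid_utf8() is hypothetical, it uses mb_check_encoding() when the mbstring extension is available, and it otherwise falls back to an empty pattern with the PCRE u modifier, which fails on malformed UTF-8 subjects.

```php
<?php
// Hypothetical helper, not from the original article.
function is_valid_utf8(string $string): bool
{
    if (function_exists('mb_check_encoding')) {
        // Validates the bytes against UTF-8 instead of guessing an encoding.
        return mb_check_encoding($string, 'UTF-8');
    }
    // Fallback: with /u, preg_match() returns false (not 0) when the
    // subject is not well-formed UTF-8, and 1 for any well-formed subject.
    return preg_match('//u', $string) === 1;
}

var_dump(is_valid_utf8("plain ASCII"));
var_dump(is_valid_utf8("\xE4\xB8\xAD")); // complete three-byte sequence
var_dump(is_valid_utf8("\xE4\xB8"));     // truncated three-byte sequence
```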
Example 4
// Returns true if $word contains a three-byte UTF-8 sequence whose lead byte is
// 228-233 (0xE4-0xE9, the range that covers most common Chinese characters),
// and false otherwise. This is a heuristic, not a full UTF-8 validity check.
function is_utf8($word)
{
    // One three-byte character: lead byte 228-233, then two continuation bytes 128-191.
    $char = "[" . chr(228) . "-" . chr(233) . "][" . chr(128) . "-" . chr(191) . "]{2}";
    if (preg_match("/^(" . $char . ")/", $word) == true
        || preg_match("/(" . $char . ")$/", $word) == true
        || preg_match("/(" . $char . "){2,}/", $word) == true)
    {
        return true;
    }
    else
    {
        return false;
    }
} // function is_utf8
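Because Example 4 matches byte ranges that correspond to common Chinese characters, the same intent can be expressed with a Unicode script property. This is a minimal sketch, assuming PHP 7 or later with PCRE Unicode support; contains_han() is a hypothetical name, and \p{Han} covers a wider set of characters than the byte range used in Example 4.

```php
<?php
// Hypothetical helper, not part of the original article.
// Returns true if $text contains at least one Han character.
// With the /u modifier, preg_match() returns false for input that is not
// well-formed UTF-8, so such input also yields false here.
function contains_han(string $text): bool
{
    return preg_match('/\p{Han}/u', $text) === 1;
}

var_dump(contains_han("PHP \u{4E2D}\u{6587}")); // contains U+4E2D and U+6587
var_dump(contains_han("plain ASCII"));
var_dump(contains_han("\xE4\xB8"));             // truncated sequence
```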
I hope this article helps you with your PHP programming.