PHP: detecting Chinese characters and encodings
In GBK a Chinese character takes two bytes, while in UTF-8 it usually takes three, so Chinese text can be detected from the encoding range it falls in.
First, the encoding range
1. GBK (gb2312/gb18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312)
\x80-\xff Chinese (GBK)
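As a rough illustration of the table above, the following sketch classifies a single byte of a GBK-encoded string. The function name `classify_gbk_byte` and its labels are hypothetical, and it reads \xa1-\xff as the GB2312 range and \x80-\xa0 as bytes that only appear in the wider GBK range, which is an interpretation of the table rather than a full decoder. It should only be applied to the first byte of a character, since the second byte of a GBK character can fall in the ASCII range.

```php
<?php
// Hypothetical helper, not part of the original article.
// Classifies the byte at $offset of a GBK-encoded string by the ranges listed above.
function classify_gbk_byte(string $bytes, int $offset): string
{
    $b = ord($bytes[$offset]);
    if ($b >= 0x20 && $b <= 0x7f) {
        return 'ascii';            // \x20-\x7f
    }
    if ($b >= 0xa1) {
        return 'chinese-gb2312';   // \xa1-\xff
    }
    if ($b >= 0x80) {
        return 'chinese-gbk';      // \x80-\xa0, only used by GBK
    }
    return 'control';              // below \x20
}

$gbk = "A\xd6\xd0"; // "A" followed by the GBK bytes D6 D0 (U+4E2D)
echo classify_gbk_byte($gbk, 0), "\n"; // ascii
echo classify_gbk_byte($gbk, 1), "\n"; // chinese-gb2312
```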
2. UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\x3130-\x318f (Korean)
\xac00-\xd7a3 (Korean)
\u0800-\u4e00 (Japanese; the kana themselves are \u3040-\u30ff)
PS: Korean (Hangul syllable) characters have code points above \u9fa5
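The code point ranges above can be tested directly in PHP with PCRE's `u` modifier, which makes the pattern work on UTF-8 characters instead of bytes. This is a minimal sketch that assumes the input is valid UTF-8; the helper names `has_chinese_utf8` and `has_korean_utf8` are made up for illustration.

```php
<?php
// Hypothetical helpers, assuming $s is valid UTF-8.

// True if $s contains a character in \u4e00-\u9fa5.
function has_chinese_utf8(string $s): bool
{
    return preg_match('/[\x{4e00}-\x{9fa5}]/u', $s) === 1;
}

// True if $s contains a character in \x3130-\x318f or \xac00-\xd7a3.
function has_korean_utf8(string $s): bool
{
    return preg_match('/[\x{3130}-\x{318f}\x{ac00}-\x{d7a3}]/u', $s) === 1;
}

var_dump(has_chinese_utf8("PHP \u{4e2d}\u{6587}")); // bool(true)
var_dump(has_korean_utf8("PHP \u{4e2d}\u{6587}"));  // bool(false)
var_dump(has_korean_utf8("\u{d55c}"));              // bool(true)
```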
Regular expression examples:
preg_replace("/([\x80-\xff])/", "", $str); // GBK: remove the bytes in the Chinese range
preg_replace('/([\x{4e00}-\x{9fa5}])/u', "", $str); // UTF-8: remove Chinese characters
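To make the two replacements concrete, here is a small sketch that strips Chinese from a UTF-8 string and from a GBK byte string. The wrapper names are hypothetical, the UTF-8 pattern is written in PCRE's `\x{...}` form with the `u` modifier, and both inputs are assumed to be valid in their encoding. The byte-wise GBK version only removes bytes in \x80-\xff, so it is reliable only when the second byte of every character also lies in that range, as it does for GB2312 characters.

```php
<?php
// Hypothetical helper: removes characters in \u4e00-\u9fa5 from a valid UTF-8 string.
function strip_chinese_utf8(string $s): string
{
    return preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $s);
}

// Hypothetical helper: removes every byte in \x80-\xff from a GBK byte string.
function strip_high_bytes(string $s): string
{
    return preg_replace('/[\x80-\xff]/', '', $s);
}

echo strip_chinese_utf8("PHP\u{4e2d}\u{6587}5"), "\n";  // PHP5
echo strip_high_bytes("PHP\xd6\xd0\xce\xc45"), "\n";    // PHP5
```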
Second, the code example
The code is as follows:
// Check whether the string contains Chinese - GBK (PHP)
function check_is_chinese($s) {
    return preg_match('/[\x80-\xff]./', $s);
}

// Get string length - GBK (PHP)
function gb_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $s = substr($str, $i, 1);
        // A byte in \x80-\xff starts a two-byte character: skip its second byte
        if (preg_match("/[\x80-\xff]/", $s)) ++$i;
        ++$count;
    }
    return $count;
}

// Truncate a string to $len characters - GBK (PHP)
function gb_substr($str, $len) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if ($count == $len) break;
        if (preg_match("/[\x80-\xff]/", substr($str, $i, 1))) ++$i;
        ++$count;
    }
    return substr($str, 0, $i);
}

// Count string length - UTF-8 (PHP)
// Note: every multi-byte character is counted as 2, every ASCII character as 1
function utf8_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++;
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return $count;
}

// Truncate a string - UTF-8 (PHP)
// $position and $length use the same units as utf8_strlen()
function utf8_substr($str, $position, $length) {
    $start_position = strlen($str);
    $start_byte = 0;
    $end_position = strlen($str);
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if ($count >= $position && $start_position > $i) {
            $start_position = $i;
            $start_byte = $count;
        }
        if (($count - $start_byte) >= $length) {
            $end_position = $i;
            break;
        }
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++;
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return (substr($str, $start_position, $end_position - $start_position));
}

// Check whether the string contains Korean - UTF-8 (JavaScript)
function checkKoreaChar(str) {
    for (var i = 0; i < str.length; i++) {
        if ((str.charCodeAt(i) > 0x3130 && str.charCodeAt(i) < 0x318f) ||
            (str.charCodeAt(i) >= 0xac00 && str.charCodeAt(i) <= 0xd7a3)) {
            return true;
        }
    }
    return false;
}

// Check whether the string contains a Chinese character - GBK (JavaScript)
function check_chinese_char(s) {
    return (s.length != s.replace(/[^\x00-\xff]/g, "**").length);
}
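The UTF-8 functions above walk the string byte by byte and count each multi-byte character as two units. When one unit per character is wanted instead, a possible alternative is to let PCRE split the string into characters. This is only a sketch under the assumption that the input is valid UTF-8; the names `utf8_chars`, `utf8_char_count` and `utf8_char_slice` are hypothetical. Where the mbstring extension is installed, `mb_strlen` and `mb_substr` serve the same purpose.

```php
<?php
// Hypothetical helper: splits a valid UTF-8 string into an array of characters.
function utf8_chars(string $s): array
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split fails when the subject is not valid UTF-8
        throw new InvalidArgumentException('Not a UTF-8 compatible string');
    }
    return $chars;
}

// Number of characters, counting every character as 1.
function utf8_char_count(string $s): int
{
    return count(utf8_chars($s));
}

// Substring of $length characters starting at character index $start.
function utf8_char_slice(string $s, int $start, int $length): string
{
    return implode('', array_slice(utf8_chars($s), $start, $length));
}

$s = "a\u{4e2d}\u{6587}b";
echo utf8_char_count($s), "\n";                // 4
echo strlen(utf8_char_slice($s, 1, 2)), "\n";  // 6 (two three-byte characters)
```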