I. Encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312)
\x80-\xff Chinese (GBK)
2. UTF-8 (Unicode)
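\u4e00-\u9fa5 (Chinese)
\u3130-\u318f (Korean)
\uac00-\ud7a3 (Korean)
\u0800-\u4e00 (Japanese)
PS: Korean syllables (\uac00-\ud7a3) are characters greater than \u9fa5.
Regular expression examples:
// Strip every byte in the GBK high range
preg_replace("/([\x80-\xff])/", "", $str);
// Strip Chinese characters from a UTF-8 string (PCRE needs \x{...} and the u modifier)
preg_replace('/([\x{4e00}-\x{9fa5}])/u', "", $str);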
II. Code examples
Check whether a string contains Chinese characters - GBK (PHP)
function check_is_chinese($s) {
    // A high byte followed by any byte is treated as one GBK character
    return preg_match('/[\x80-\xff]./', $s);
}
Get string length - GBK (PHP)
function gb_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $s = substr($str, $i, 1);
        // A byte in \x80-\xff starts a two-byte character, so skip the second byte
        if (preg_match("/[\x80-\xff]/", $s)) ++$i;
        ++$count;
    }
    return $count;
}
Truncate string - GBK (PHP)
function gb_substr($str, $len) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if ($count == $len) break;
        // Skip the second byte of a two-byte character so it is never split
        if (preg_match("/[\x80-\xff]/", substr($str, $i, 1))) ++$i;
        ++$count;
    }
    return substr($str, 0, $i);
}
Count string length - UTF-8 (PHP)
function utf8_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $value = ord($str[$i]);
        if ($value > 127) {
            // A non-ASCII character adds 2 to the count (1 here, 1 below)
            $count++;
            // The lead byte tells how many continuation bytes to skip
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return $count;
}
Truncate string - UTF-8 (PHP)
function utf8_substr($str, $position, $length) {
    $start_position = strlen($str);
    $start_byte = 0;
    $end_position = strlen($str);
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        // Record the byte offset where the requested position is reached
        if ($count >= $position && $start_position > $i) {
            $start_position = $i;
            $start_byte = $count;
        }
        // Stop once the requested length has been covered
        if (($count - $start_byte) >= $length) {
            $end_position = $i;
            break;
        }
        $value = ord($str[$i]);
        if ($value > 127) {
            // A non-ASCII character adds 2 to the count (1 here, 1 below)
            $count++;
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return substr($str, $start_position, $end_position - $start_position);
}
Check whether a string contains Korean characters - UTF-8 (JavaScript)
function checkKoreaChar(str) {
    for (var i = 0; i < str.length; i++) {
        if ((str.charCodeAt(i) > 0x3130 && str.charCodeAt(i) < 0x318f) || (str.charCodeAt(i) >= 0xac00 && str.charCodeAt(i) <= 0xd7a3)) {
            return true;
        }
    }
    return false;
}
Check whether a string contains Chinese characters - GBK (JavaScript)
function check_chinese_char(s) {
    // Each character outside \x00-\xff is replaced by two characters, so the length changes
    return (s.length != s.replace(/[^\x00-\xff]/g, "**").length);
}