I. Encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese
\x80-\xff Chinese
2. UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\u3130-\u318f (Korean)
\uac00-\ud7a3 (Korean)
\u0800-\u4e00 (Japanese)
Note: the Korean Hangul syllables (\uac00-\ud7a3) are the characters above \u9fa5.
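The ranges listed above can be turned into a small lookup. The following is a minimal sketch, not a definitive classifier: the function name `classifyChar` and the labels are illustrative, and because the Japanese range as listed overlaps the Korean jamo range, the more specific ranges are tested first.

```javascript
// Ranges copied from the list above. Order matters: the broad
// \u0800-\u4e00 range overlaps the others, so it is checked last.
const RANGES = [
  { name: "chinese", from: 0x4e00, to: 0x9fa5 },
  { name: "korean", from: 0x3130, to: 0x318f },
  { name: "korean", from: 0xac00, to: 0xd7a3 },
  { name: "japanese", from: 0x0800, to: 0x4e00 },
];

// Return the label of the first range containing the character's code point.
function classifyChar(ch) {
  const cp = ch.codePointAt(0);
  for (const r of RANGES) {
    if (cp >= r.from && cp <= r.to) return r.name;
  }
  return "other";
}
```

A call such as `classifyChar("中")` looks up the code point U+4E2D and reports the first range that contains it.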
Regular expression examples:
preg_replace('/([\x80-\xff])/', "", $str);
preg_replace('/([\x{4e00}-\x{9fa5}])/u', "", $str);
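For comparison, the same idea as the second pattern can be sketched in JavaScript, where `\uXXXX` escapes are valid inside a regular expression. The function name `stripChinese` is a hypothetical helper, not part of the original article.

```javascript
// Remove every character in the \u4e00-\u9fa5 range from a string.
function stripChinese(str) {
  return str.replace(/[\u4e00-\u9fa5]/g, "");
}
```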
II. Code examples
Check whether the content contains Chinese characters, GBK (PHP)
function check_is_chinese($s) {
    // A byte in \x80-\xff followed by any byte is treated as one double-byte character.
    return preg_match('/[\x80-\xff]./', $s);
}
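Get the string length, GBK (PHP)
function gb_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $s = substr($str, $i, 1);
        // A byte in \x80-\xff starts a double-byte character: skip its second byte.
        if (preg_match('/[\x80-\xff]/', $s)) ++$i;
        ++$count;
    }
    return $count;
}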
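The counting rule above (a byte of 0x80 or higher starts a two-byte character) can also be sketched on raw bytes in JavaScript. This is a minimal sketch under the assumption that the input is a `Uint8Array` of GBK-encoded bytes; the name `gbkLength` is hypothetical. The second byte is skipped rather than inspected because a GBK trail byte may itself fall in the ASCII range.

```javascript
// Count characters in a GBK byte sequence: one per ASCII byte,
// one per (lead byte, trail byte) pair.
function gbkLength(bytes) {
  let count = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) i++; // skip the trail byte of a double-byte character
    count++;
  }
  return count;
}
```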
Extract a substring, GBK (PHP)
function gb_substr($str, $len) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if ($count == $len) break;
        // Skip the second byte of a double-byte character so it is never cut in half.
        if (preg_match('/[\x80-\xff]/', substr($str, $i, 1))) ++$i;
        ++$count;
    }
    return substr($str, 0, $i);
}
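The truncation above can be sketched in JavaScript in the same way, again assuming the input is a `Uint8Array` of GBK bytes. The name `gbkTruncate` is hypothetical; the function returns the leading bytes that hold at most `len` characters without splitting a double-byte character.

```javascript
// Keep the first `len` characters of a GBK byte sequence.
function gbkTruncate(bytes, len) {
  let count = 0;
  let i = 0;
  while (i < bytes.length && count < len) {
    i += bytes[i] >= 0x80 ? 2 : 1; // double-byte character or single ASCII byte
    count++;
  }
  // Guard against a dangling lead byte at the very end of the input.
  return bytes.subarray(0, Math.min(i, bytes.length));
}
```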
Count the string length, UTF-8 (PHP)
function utf8_strlen($str) {
    // Each ASCII byte counts as 1; each multi-byte character counts as 2.
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++;
            // The lead byte gives the sequence length; skip the continuation bytes.
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return $count;
}
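The lead-byte ranges used above (192-223, 224-239, 240-247) determine how many bytes a UTF-8 character occupies. The following JavaScript sketch isolates that step; `utf8SequenceLength` and `utf8Measure` are hypothetical names, and the sketch assumes a runtime that provides the standard `TextEncoder` global (modern browsers and Node.js).

```javascript
// Map a UTF-8 lead byte to the length of its sequence in bytes.
function utf8SequenceLength(lead) {
  if (lead < 0x80) return 1;
  if (lead >= 0xc0 && lead <= 0xdf) return 2;
  if (lead >= 0xe0 && lead <= 0xef) return 3;
  if (lead >= 0xf0 && lead <= 0xf7) return 4;
  throw new Error("Not a UTF-8 compatible string");
}

// Walk the encoded bytes one character at a time and report both the
// character count and the width (1 per ASCII byte, 2 per multi-byte character).
function utf8Measure(str) {
  const bytes = new TextEncoder().encode(str);
  let chars = 0;
  let width = 0;
  for (let i = 0; i < bytes.length; ) {
    const size = utf8SequenceLength(bytes[i]);
    chars += 1;
    width += size === 1 ? 1 : 2;
    i += size;
  }
  return { chars, width };
}
```

Returning both numbers makes the difference between a character count and the width-style count explicit.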
Extract a substring, UTF-8 (PHP)
function utf8_substr($str, $position, $length) {
    // $position and $length use the same units as utf8_strlen:
    // 1 per ASCII byte, 2 per multi-byte character.
    $start_position = strlen($str);
    $start_byte = 0;
    $end_position = strlen($str);
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if ($count >= $position && $start_position > $i) {
            $start_position = $i;
            $start_byte = $count;
        }
        // Only look for the end once the start has been found.
        if ($start_position <= $i && ($count - $start_byte) >= $length) {
            $end_position = $i;
            break;
        }
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++;
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return substr($str, $start_position, $end_position - $start_position);
}
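The same width-based slicing can be sketched in JavaScript on the encoded bytes. This is a sketch under stated assumptions, not a port of the function above: the name `utf8Substr` is hypothetical, `TextEncoder` and `TextDecoder` are assumed to be available as globals, and `position` and `length` are measured in the same units (1 per ASCII byte, 2 per multi-byte character).

```javascript
// Slice a string by width: 1 per ASCII byte, 2 per multi-byte character.
function utf8Substr(str, position, length) {
  const bytes = new TextEncoder().encode(str);
  let width = 0;
  let start = bytes.length; // byte offset where the slice begins
  let startWidth = 0;       // width accumulated before the slice
  let end = bytes.length;   // byte offset where the slice ends
  let i = 0;
  while (i < bytes.length) {
    if (width >= position && start > i) {
      start = i;
      startWidth = width;
    }
    if (start <= i && width - startWidth >= length) {
      end = i;
      break;
    }
    const lead = bytes[i];
    // TextEncoder always emits valid UTF-8, so bytes[i] is a lead byte here.
    const size = lead < 0x80 ? 1 : lead < 0xe0 ? 2 : lead < 0xf0 ? 3 : 4;
    width += size === 1 ? 1 : 2;
    i += size;
  }
  return new TextDecoder().decode(bytes.subarray(start, end));
}
```

Because the loop always advances by a whole sequence, the returned slice never ends in the middle of a character.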
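Count the string length, UTF-8 [Chinese and Korean take 3 bytes, Russian takes 2 bytes, ASCII letters take 1 byte] (Ruby)
require 'cgi'

def utf8_string_length(str)
  temp = CGI.unescape(str)
  i = 0 # resulting length: 1 per ASCII byte, 2 per multi-byte character
  j = 0 # non-ASCII bytes seen since the last counted character
  temp.each_byte do |b|
    if b < 128
      i += 1
    elsif b < 224
      # Continuation byte or lead byte of a 2-byte sequence
      j += 1
      if 0 == (j % 2)
        i += 2
        j = 0
      end
    else
      # Lead byte of a 3-byte sequence
      j += 1
      if 0 == (j % 3)
        i += 2
        j = 0
      end
    end
  end
  return i
end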
Check whether the string contains Korean characters, UTF-8 (JavaScript)
function checkKoreaChar(str) {
    for (var i = 0; i < str.length; i++) {
        if ((str.charCodeAt(i) > 0x3130 && str.charCodeAt(i) < 0x318f) || (str.charCodeAt(i) >= 0xac00 && str.charCodeAt(i) <= 0xd7a3)) {
            return true;
        }
    }
    return false;
}
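The loop above can also be expressed as a single regular expression. A minimal sketch, with `containsKorean` as a hypothetical name; the character class uses the same bounds as the comparisons above (exclusive for the jamo range, inclusive for the syllable range).

```javascript
// True if the string contains a Hangul compatibility jamo or a Hangul syllable.
function containsKorean(str) {
  return /[\u3131-\u318e\uac00-\ud7a3]/.test(str);
}
```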
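Check whether the string contains Chinese characters, GBK (JavaScript)
function check_chinese_char(s) {
    // Every character outside \x00-\xff is replaced by two characters,
    // so the length changes only if such a character is present.
    return (s.length != s.replace(/[^\x00-\xff]/g, "**").length);
}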
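The replace trick above also gives an estimate of the string's GBK byte length, since each character outside \x00-\xff is swapped for two placeholders. A minimal sketch, assuming every such character occupies exactly two bytes in the target encoding (this does not hold for the four-byte forms in GB18030); `doubleByteLength` is a hypothetical name.

```javascript
// Estimate the byte length of a string in a double-byte encoding.
function doubleByteLength(s) {
  return s.replace(/[^\x00-\xff]/g, "**").length;
}
```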