Weibo limits the length of a post, and it counts length as follows: a Chinese character counts as 2 and an English letter counts as 1; a full-width character counts as 2 and a half-width character counts as 1.
PHP's strlen returns the number of bytes, so a UTF-8 encoded Chinese character returns 3, which does not meet the requirement.
mb_strlen computes the length according to the character set, so a UTF-8 Chinese character counts as 1. That does not match Weibo's limit either, because a Chinese character must count as 2.
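The difference between the two built-in functions can be checked with a short script. This is only an illustrative sketch: it assumes the file is saved as UTF-8 and that the mbstring extension is installed, and the variable name $s is made up for the example.

```php
<?php
// Two Chinese characters; each takes 3 bytes in UTF-8.
$s = "你好";

// strlen() counts bytes.
var_dump(strlen($s));             // int(6)

// mb_strlen() counts characters in the given encoding.
var_dump(mb_strlen($s, 'UTF-8')); // int(2)

// Weibo's rule would give 4 here (2 per Chinese character),
// which neither built-in function returns.
```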
After searching on Google, I found a Discuz routine that truncates strings in various encodings, modified it, and tested it. The $charset parameter only supports GBK and UTF-8.
Example usage:
$a = "s＠@你好";
var_dump(strlen_weibo($a, 'utf-8'));
The output is 8: the letter s counts as 1, the full-width ＠ counts as 2, the half-width @ counts as 1, and the two Chinese characters count as 4. The source code is as follows:
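function strlen_weibo($string, $charset = 'utf-8')
{
    $n = $count = 0;
    $length = strlen($string); // length in bytes
    if (strtolower($charset) == 'utf-8')
    {
        // Walk the string by UTF-8 lead byte: $n is the byte offset,
        // $count is the Weibo length.
        while ($n < $length)
        {
            $currentByte = ord($string[$n]);
            if ($currentByte == 9 ||
                $currentByte == 10 ||
                (32 <= $currentByte && $currentByte <= 126))
            {
                // Tab, newline, or printable ASCII: 1 byte, counts 1.
                $n++;
                $count++;
            } elseif (194 <= $currentByte && $currentByte <= 223)
            {
                // 2-byte sequence, counts 2.
                $n += 2;
                $count += 2;
            } elseif (224 <= $currentByte && $currentByte <= 239)
            {
                // 3-byte sequence (most Chinese characters), counts 2.
                $n += 3;
                $count += 2;
            } elseif (240 <= $currentByte && $currentByte <= 247)
            {
                // 4-byte sequence, counts 2.
                $n += 4;
                $count += 2;
            } elseif (248 <= $currentByte && $currentByte <= 251)
            {
                // 5-byte form (legacy UTF-8 only), counts 2.
                $n += 5;
                $count += 2;
            } elseif ($currentByte == 252 || $currentByte == 253)
            {
                // 6-byte form (legacy UTF-8 only), counts 2.
                $n += 6;
                $count += 2;
            } else
            {
                // Any other byte: advance one byte, counts 1.
                $n++;
                $count++;
            }
            if ($count >= $length)
            {
                break;
            }
        }
        return $count;
    } else
    {
        // GBK: a byte above 127 starts a 2-byte character, which counts 2.
        for ($i = 0; $i < $length; $i++)
        {
            if (ord($string[$i]) > 127)
            {
                $i++;
                $count++;
            }
            $count++;
        }
        return $count;
    }
}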
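For well-formed UTF-8 input, the same counting rule (single-byte characters count 1, multi-byte characters count 2) can be sketched more compactly. This is not the Discuz-derived code above: the helper name weibo_len_sketch is hypothetical, the type declarations assume PHP 7 or later, and the fallback for invalid input is an arbitrary choice made for the example.

```php
<?php
// Hypothetical helper, not part of the original post.
// Counts 1 for every single-byte character and 2 for every
// multi-byte UTF-8 character.
function weibo_len_sketch(string $s): int
{
    // The /u modifier makes PCRE treat $s as UTF-8, so the empty
    // pattern splits the string into individual characters.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // Assumption: on invalid UTF-8, fall back to the byte count.
        return strlen($s);
    }

    $count = 0;
    foreach ($chars as $ch) {
        // strlen() gives the number of bytes in this character.
        $count += (strlen($ch) === 1) ? 1 : 2;
    }
    return $count;
}

var_dump(weibo_len_sketch("s＠@你好")); // int(8)
```

Unlike strlen_weibo, this sketch has no GBK branch; it only covers the UTF-8 case.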