How PHP gets the length of a mixed Chinese and English string

Source: Internet
Author: User

Tonight, while writing a form validation class for a framework, I needed to check whether a string's length fell within a specified range, and naturally thought of PHP's strlen function.

$str = 'Hello world!';
echo strlen($str); // Outputs 12

However, PHP's built-in strlen computes length by counting the bytes in the string, and mb_strlen only counts characters correctly when it is given (or defaults to) the string's actual encoding. The number of bytes a Chinese character occupies also depends on the encoding: under GBK/GB2312 each Chinese character takes 2 bytes, while under UTF-8 most take 3 bytes.
$str = '你好，世界！';
echo strlen($str); // Outputs 12 under GBK or GB2312, 18 under UTF-8
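
To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; CP936 is the name mbstring uses for GBK, and the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$utf8 = '你好，世界！';                                  // 6 full-width characters
$gbk  = mb_convert_encoding($utf8, 'CP936', 'UTF-8');   // same text re-encoded as GBK

$bytesUtf8 = strlen($utf8);               // byte count of the UTF-8 form
$bytesGbk  = strlen($gbk);                // byte count of the GBK form
$charsUtf8 = mb_strlen($utf8, 'UTF-8');   // character count, encoding stated explicitly
$charsGbk  = mb_strlen($gbk, 'CP936');    // character count of the GBK form

echo "$bytesUtf8 $bytesGbk $charsUtf8 $charsGbk\n";   // 18 12 6 6
```

The byte counts differ between the two encodings, while the character count stays the same as long as mb_strlen is told which encoding it is looking at.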

When validating the length of a string, we usually care about the number of characters rather than the number of bytes. Consider this PHP code running under UTF-8:
$name = '张戈畅'; // a three-character Chinese name
$len = strlen($name);
// Outputs FALSE, because three Chinese characters occupy 9 bytes under UTF-8
if ($len >= 3 && $len <= 8) {
    echo 'TRUE';
} else {
    echo 'FALSE';
}
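
For comparison, the same range check can be written against the character count. This is only a sketch of one possible fix, assuming the mbstring extension is available and the string is known to be UTF-8; lengthInRange is a hypothetical helper name, not part of any framework.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
// Assumption: $value is encoded in $encoding (UTF-8 by default).
function lengthInRange(string $value, int $min, int $max, string $encoding = 'UTF-8'): bool
{
    $len = mb_strlen($value, $encoding);   // counts characters, not bytes
    return $len >= $min && $len <= $max;
}

$name = '张戈畅';   // 3 characters, 9 bytes in UTF-8
echo lengthInRange($name, 3, 8) ? 'TRUE' : 'FALSE';   // TRUE
```

This only works when the encoding passed in matches the actual data, which is the limitation the rest of this article tries to work around.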

So what is a convenient and practical way to get the character length of a string that contains Chinese? One option is to use a regular expression to pick out the Chinese part, divide its byte count by 2 under GBK/GB2312 or by 3 under UTF-8, and then add the length of the non-Chinese part, but that is too much trouble.
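
To show why that route is cumbersome, here is a rough sketch of it for the UTF-8 case only. The function name mixedLengthUtf8 is hypothetical, and the code assumes every non-ASCII character occupies exactly 3 bytes, which holds for common Chinese characters and full-width punctuation but not for all of Unicode.

```php
<?php
// Hypothetical helper illustrating the "divide the byte count" idea for UTF-8.
// Assumption: every non-ASCII character in $str takes exactly 3 bytes.
function mixedLengthUtf8(string $str): int
{
    // Strip every byte >= 0x80 (the multi-byte part); only ASCII remains.
    $ascii = preg_replace('/[\x80-\xff]+/', '', $str);
    $multiByteBytes = strlen($str) - strlen($ascii);

    // ASCII characters are 1 byte each; the rest are assumed to be 3 bytes each.
    return strlen($ascii) + intdiv($multiByteBytes, 3);
}

echo mixedLengthUtf8('你好，world！');   // 9
```

A GBK/GB2312 version would need a separate branch dividing by 2, and any character outside the 3-byte assumption would be miscounted.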

WordPress has a piece of code like this, shown here for reference:

$str = '你好，世界！';
preg_match_all('/./us', $str, $match);
echo count($match[0]); // Outputs 6

The idea is to use a regular expression to split the string into individual characters and then count the matches directly with count, which gives the result we want.
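
The same idea can be wrapped in a small function. This is a sketch rather than the WordPress code itself: utf8CharCount is a hypothetical name, and it relies on preg_match_all returning the number of full matches, or false when the subject is not valid UTF-8 under the u modifier.

```php
<?php
// Hypothetical wrapper around the regular-expression approach.
// Returns the number of UTF-8 characters, or null if $str is not valid UTF-8.
function utf8CharCount(string $str): ?int
{
    // 'u' treats pattern and subject as UTF-8; 's' lets '.' match newlines too.
    $count = preg_match_all('/./us', $str);
    return $count === false ? null : $count;
}

echo utf8CharCount('你好，world！');   // 9
```

Returning null for invalid input makes the failure case explicit instead of silently reporting a length of zero.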

However, the WordPress-style snippet cannot handle a Chinese string encoded as GBK/GB2312. The u modifier expects UTF-8 input, and GBK/GB2312 bytes are generally not valid UTF-8, so the match fails; without the modifier, each two-byte character is treated as two characters and the count doubles. So I came up with this approach:

// Try to convert from GBK; keep the original string if the conversion fails or comes back empty
$tmp = @iconv('GBK', 'UTF-8', $str);
if (!empty($tmp)) {
    $str = $tmp;
}
preg_match_all('/./us', $str, $match);
echo count($match[0]);

This is compatible with both GBK/GB2312 and UTF-8 and has passed a small number of tests, but I have not confirmed that it is completely correct. Corrections from more experienced developers are welcome.
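
One weakness worth checking is that a UTF-8 string can sometimes also be decoded as GBK without error, in which case converting first would garble it. A possible variation, sketched below under the assumption that the input is either UTF-8 or GBK/GB2312 and that both mbstring and iconv are available, tests for valid UTF-8 before converting. The name mixedCharCount is hypothetical.

```php
<?php
// Hypothetical variation: check for valid UTF-8 first, convert only if needed.
// Assumption: input is either UTF-8 or GBK/GB2312; other encodings are not handled.
function mixedCharCount(string $str): int
{
    if (!mb_check_encoding($str, 'UTF-8')) {
        // Not valid UTF-8, so treat it as GBK (an assumption, not real detection).
        $converted = @iconv('GBK', 'UTF-8', $str);
        if ($converted === false) {
            throw new InvalidArgumentException('String is neither valid UTF-8 nor GBK');
        }
        $str = $converted;
    }
    // preg_match_all returns the number of matches, one per character here.
    return (int) preg_match_all('/./us', $str);
}

echo mixedCharCount('你好，世界！');   // 6
```

This still cannot tell apart the rare GBK string whose bytes happen to form valid UTF-8, so it is a heuristic rather than a guarantee.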

The approach above is meant to cope with several encodings, but in day-to-day development a project's encoding is usually already known, so you can simply use the following function to get the string length:

int iconv_strlen ( string $str [, string $charset = ini_get("iconv.internal_encoding") ] )
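
A short usage sketch, assuming the iconv extension is enabled and the source file is saved as UTF-8. Passing the charset explicitly avoids depending on the configured default.

```php
<?php
// Assumption: this file is saved as UTF-8 and the iconv extension is enabled.
$utf8 = '你好，world！';
$gbk  = iconv('UTF-8', 'GBK', $utf8);      // same text re-encoded as GBK

$lenUtf8 = iconv_strlen($utf8, 'UTF-8');   // character count of the UTF-8 string
$lenGbk  = iconv_strlen($gbk, 'GBK');      // character count of the GBK string

echo "$lenUtf8 $lenGbk\n";   // 9 9
```

iconv_strlen returns false when the string is not valid in the given charset, so the result is worth checking before it is used in a comparison.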
