Writing a PHP function to count the Chinese characters in a string: an example

Source: Internet
Author: User

Sometimes we need to count the characters in a string. For a pure English string, the number of characters equals the length of the string, which strlen returns. But what if the string contains Chinese characters? mb_strlen can do it, but if that extension happens not to be installed, we have to implement the count ourselves.

mb_strlen belongs to PHP's mbstring extension, which is installed in most setups. It returns the number of characters in a string and is usually called like this:

$len = mb_strlen("你是我的小苹果", "UTF-8");

The string means "You are my little apple", and the result is 7.
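To see why strlen alone is not enough, here is a small comparison, assuming the script file is saved as UTF-8. strlen counts bytes, while mb_strlen counts characters; the function_exists check is there because mb_strlen only exists when mbstring is loaded:

```php
<?php
// Assumes this file is saved as UTF-8, so each of these Chinese characters is 3 bytes.
$str = "你是我的小苹果";

// strlen() counts bytes, not characters.
echo strlen($str), PHP_EOL;                 // 21

// mb_strlen() counts characters, but only exists when the mbstring extension is loaded.
if (function_exists('mb_strlen')) {
    echo mb_strlen($str, 'UTF-8'), PHP_EOL; // 7
}
```

Seven characters times three bytes each gives the 21 that strlen reports.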

What if the mbstring extension is not installed? Then we implement the count ourselves.

First, we need to understand that a string is made of characters, and characters are represented by bytes. Each English character is one byte and corresponds to an ASCII code, and ASCII codes are less than 128, that is, hexadecimal 0x80. When a byte's value is 128 (0x80) or greater, that byte is not a complete character on its own but part of a multibyte character.
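A quick way to see this rule in action is to print every byte of a mixed string along with whether it falls below 0x80. byteReport is a hypothetical helper written for this illustration, not a built-in, and it assumes UTF-8 input:

```php
<?php
// Hypothetical helper: lists each byte of $str in hex and whether it is a plain ASCII byte (< 0x80).
function byteReport(string $str): array
{
    $report = [];
    for ($i = 0, $len = strlen($str); $i < $len; $i++) {
        $byte = ord($str[$i]);
        $report[] = sprintf('%02x %s', $byte, $byte < 0x80 ? 'ASCII' : 'non-ASCII');
    }
    return $report;
}

// "a" followed by U+4F60 (the character for "you"), written as an escape so the file encoding does not matter.
print_r(byteReport("a\u{4F60}"));
```

For this input the report is one ASCII byte (61) followed by three non-ASCII bytes (e4, bd, a0) that together form a single Chinese character.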

For example:

$str = "你是我的小苹果";

$str[0] returns the first byte (older code often writes $str{0}, a form deprecated in PHP 7.4 and removed in PHP 8.0). Let's see what it is:

php > $str = "你是我的小苹果";
php > echo $str[0];
��

It's garbled: this is only one of the bytes of the character 你 ("you"). In other words, 你 is made up of more than one byte. Let's try this:

php > echo $str[0].$str[1].$str[2];
你

As you can see, outputting the three bytes together gives the complete character 你.
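The same experiment can be written as a short script, assuming the file is saved as UTF-8. Indexing returns a single byte, while substr with a byte length of 3 returns the whole first character:

```php
<?php
$str = "你是我的小苹果"; // assumes the file is saved as UTF-8

$firstByte = $str[0];            // a single byte: not a valid character on its own
$firstChar = substr($str, 0, 3); // three bytes: the complete character 你

echo bin2hex($firstByte), PHP_EOL; // e4
echo bin2hex($firstChar), PHP_EOL; // e4bda0
echo $firstChar, PHP_EOL;          // 你
```

bin2hex makes the bytes visible instead of printing a garbled symbol.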

Why three bytes rather than two or four? It depends on the string's encoding. The console here uses UTF-8, and in UTF-8 a common Chinese character takes three bytes; in GBK it takes two. Strictly speaking, UTF-8 is variable-length: a character takes one to four bytes, so "three bytes" does not hold for every character. The relationship between encodings and bytes is a big topic; for more information, see the article "Character encoding notes: ASCII, Unicode and UTF-8".
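Because UTF-8 is variable-length, the three-byte figure is true of common Chinese characters but not of every character. A short check, using \u{...} escapes (PHP 7.0 or later) so the result does not depend on how the file is saved:

```php
<?php
// Sample characters written as Unicode escapes (requires PHP 7.0 or later).
$samples = [
    'U+0061 (a)'               => "a",
    'U+00E9 (e with acute)'    => "\u{00E9}",
    'U+4F60 (Chinese "you")'   => "\u{4F60}",
    'U+1F600 (grinning emoji)' => "\u{1F600}",
];

// strlen() reports the number of bytes each character occupies in UTF-8.
foreach ($samples as $name => $char) {
    printf("%s: %d byte(s)\n", $name, strlen($char));
}
```

This prints 1, 2, 3 and 4 bytes respectively, which is why a counter that always skips three bytes only works for text limited to ASCII and characters like the Chinese ones above.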

With this knowledge, we can write our own character-counting function. The general process is:

1. Loop over the string byte by byte with a for loop.
2. Check whether the byte value is greater than or equal to 0x80. If it is, skip the N bytes that make up the character, where N depends on the encoding.

I wrote a simple function to compute the length of a GBK or UTF-8 string. It is for reference only:

<?php
function mbstrlen($str, $encoding = "utf8")
{
    if (($len = strlen($str)) == 0) {
        return 0;
    }
    $encoding = strtolower($encoding);
    if ($encoding == "utf8" || $encoding == "utf-8") {
        $step = 3; // assumes every non-ASCII character is 3 bytes, as common Chinese characters are
    } elseif ($encoding == "gbk" || $encoding == "gb2312") {
        $step = 2;
    } else {
        return false;
    }
    $count = 0;
    for ($i = 0; $i < $len; $i++) {
        $count++;
        // If the byte value is 128 or more, skip several bytes based on the encoding
        if (ord($str[$i]) >= 0x80) {
            $i = $i + $step - 1; // subtract 1 because the for loop itself does $i++
        }
    }
    return $count;
}

echo mbstrlen(iconv("UTF-8", "gbk", "你是我的小苹果"), "gbk"), PHP_EOL; // 7
echo mbstrlen("你是我的小苹果"), PHP_EOL;                                  // 7
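The function above assumes that every non-ASCII character in UTF-8 takes exactly three bytes. That holds for common Chinese characters, but for a string such as "éa" (é is two bytes) the fixed step also skips the "a" and the count comes out as 1. A variant that reads each character's length from its lead byte might look like the sketch below; utf8_length is a hypothetical name, the sketch covers UTF-8 only, and it assumes valid UTF-8 input (it does no validation):

```php
<?php
// Hypothetical helper: counts characters in a UTF-8 string without the mbstring extension.
// Assumes the input is valid UTF-8; malformed byte sequences are not detected.
function utf8_length(string $str): int
{
    $len = strlen($str);
    $count = 0;
    $i = 0;
    while ($i < $len) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $step = 1; // 0xxxxxxx: ASCII, one byte
        } elseif ($byte >= 0xF0) {
            $step = 4; // 11110xxx: lead byte of a four-byte sequence
        } elseif ($byte >= 0xE0) {
            $step = 3; // 1110xxxx: lead byte of a three-byte sequence (most Chinese characters)
        } else {
            $step = 2; // 110xxxxx: lead byte of a two-byte sequence
        }
        $count++;
        $i += $step; // jump to the next character's lead byte
    }
    return $count;
}

echo utf8_length("你是我的小苹果"), PHP_EOL; // 7 (assumes the file is saved as UTF-8)
echo utf8_length("\u{00E9}a"), PHP_EOL;      // 2
```

The idea is the same as the author's "skip N bytes" step; the difference is that N is chosen per character from the lead byte instead of once per encoding.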
