Writing a PHP function to count the characters in a string that contains Chinese characters
Sometimes we need to count the number of characters in a string. For a pure English string, the number of characters equals the length of the string in bytes,
which the strlen function returns. But what if the string contains Chinese characters? mb_strlen can do it, but unfortunately the extension is not always installed, so here we implement it ourselves.
PHP has the mbstring extension, which is installed in most environments. We can use mb_strlen to get the number of characters in a string. The usage is generally as follows:
$len = mb_strlen("你是我的小苹果", "UTF-8");
The result is the expected string length: 7.
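As a rough illustration of the difference, the sketch below compares the byte count from strlen with the character count from mb_strlen. It assumes the source file is saved as UTF-8. The function_exists guard is an addition for this example, so that the script still runs when mbstring is missing.

```php
<?php
// Assumes this file is saved as UTF-8.
$str = "你是我的小苹果";

// strlen counts bytes: 7 characters x 3 bytes each in UTF-8.
$byteCount = strlen($str);
echo $byteCount, "\n";

// mb_strlen counts characters, but only exists when mbstring is loaded.
if (function_exists('mb_strlen')) {
    echo mb_strlen($str, "UTF-8"), "\n";
} else {
    echo "mbstring is not installed\n";
}
```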
What if the mbstring extension is not installed? Then we implement it ourselves.
First we need to understand that a string is made up of characters, and characters are stored as bytes. Each English character occupies one byte and corresponds to an ASCII code. ASCII codes are always less than 128, that is, hexadecimal 0x80. When the value of a byte exceeds 127, that byte is not a complete character on its own but part of a multi-byte character.
For example:
$str = "你是我的小苹果";
$str[0] gets the first byte. Let's take a look at what it is:
php > $str = "你是我的小苹果";
php > echo $str[0];
�
The output is garbled because it is only one byte of the character 你. This character is made up of more than one byte. Let's try it like this:
php > echo $str[0].$str[1].$str[2];
你
As you can see, outputting three bytes produces the complete character 你.
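To make the byte values visible, here is a minimal sketch that lists each byte of a string and marks whether it falls below 0x80. The helper name describeBytes is hypothetical and exists only for this example. The input is written with byte escapes so it does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper for this example: lists each byte of a string in hex
// and marks whether it is plain ASCII or part of a multi-byte character.
function describeBytes($str) {
    $lines = array();
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($str[$i]);
        $kind = ($byte < 0x80) ? "ASCII" : "multi-byte";
        $lines[] = sprintf("0x%02X %s", $byte, $kind);
    }
    return $lines;
}

// "\xE4\xBD\xA0" is the UTF-8 encoding of 你, followed by the ASCII letter "a".
foreach (describeBytes("\xE4\xBD\xA0a") as $line) {
    echo $line, "\n";
}
```

The three bytes of 你 are all at or above 0x80, while the letter "a" (0x61) is below it.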
Why is it three bytes instead of two or four? This depends on the encoding of the string. The console here uses UTF-8 encoding by default. UTF-8 is a variable-length encoding that uses one to four bytes per character, and common Chinese characters take three bytes. In GBK encoding, a Chinese character takes two bytes. The relationship between encodings and bytes is a large topic. For more information, see the article "Character Encoding Notes: ASCII, Unicode, and UTF-8".
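The byte widths can be checked directly with strlen. The sketch below writes the same character out byte by byte in both encodings, so it needs neither iconv nor mbstring. The variable names are chosen for this example only.

```php
<?php
// The character 你 written out byte by byte in two encodings.
$utf8 = "\xE4\xBD\xA0"; // UTF-8 encoding of 你
$gbk  = "\xC4\xE3";     // GBK encoding of 你

echo strlen($utf8), "\n"; // 3
echo strlen($gbk), "\n";  // 2

// An ASCII letter is a single byte in both encodings.
echo strlen("a"), "\n";   // 1
```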
With this knowledge, we can write a character-counting function ourselves. The general process is as follows:
1. Loop over the string byte by byte with a for loop.
2. Check whether the byte value is greater than or equal to 0x80. If so, skip N bytes, where N depends on the encoding.
I wrote a simple function that determines the length of a GBK or UTF-8 string. It is for reference only:
<?php
function mbstrlen($str, $encoding = "utf8") {
    if (($len = strlen($str)) == 0) {
        return 0;
    }
    $encoding = strtolower($encoding);
    if ($encoding == "utf8" or $encoding == "utf-8") {
        $step = 3;
    } elseif ($encoding == "gbk" or $encoding == "gb2312") {
        $step = 2;
    } else {
        return false;
    }
    $count = 0;
    for ($i = 0; $i < $len; $i++) {
        $count++;
        // If the byte value is greater than 127, skip several bytes based on the encoding.
        if (ord($str[$i]) >= 0x80) {
            $i = $i + $step - 1; // Subtract 1 because the for loop itself runs $i++.
        }
    }
    return $count;
}
echo mbstrlen(iconv("UTF-8", "gbk", "你是我的小苹果"), "gbk"); // 7
echo mbstrlen("你是我的小苹果"); // 7
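The function above steps a fixed three bytes for every non-ASCII byte in UTF-8, which works for common Chinese characters but miscounts two-byte and four-byte characters. One possible variation, shown here only as a sketch and not as part of the original function, counts every byte that is not a UTF-8 continuation byte. The name utf8CharCount is hypothetical, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical alternative to a fixed step: count UTF-8 characters by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes $str is valid UTF-8; malformed input is not detected.
function utf8CharCount($str) {
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // (byte & 0xC0) === 0x80 identifies a continuation byte.
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo utf8CharCount("\xE4\xBD\xA0"), "\n";        // 你: one 3-byte character
echo utf8CharCount("caf\xC3\xA9"), "\n";         // café: é takes 2 bytes
echo utf8CharCount("\xF0\x9F\x98\x80 ok"), "\n"; // a 4-byte emoji, a space, "ok"
```

This variation does not apply to GBK, which has no separate continuation-byte pattern, so the fixed two-byte step remains the simpler choice there.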