Counting the Chinese characters in a string with PHP: a worked example

Source: Internet
Author: User



Sometimes we need to count the characters in a string. For a pure English string the character count equals the byte length, so the strlen function is enough. But what if the string contains Chinese? mb_strlen can do it, but if the extension it belongs to is not installed, you have to implement the count yourself.

PHP has an extension, mbstring, that is installed on most systems. With it, mb_strlen returns the number of characters in a string. Typical usage:

$len = mb_strlen("你是我的小苹果", "utf-8");

This gives the expected string length: 7.

What if the mbstring extension is not installed? Then implement it yourself.
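
Whether mb_strlen is available can be checked at run time with function_exists. The following is a minimal sketch, assuming the file is saved as UTF-8; it also shows the difference between the byte length and the character count.

```php
<?php
// Assumes this source file is saved as UTF-8.
$str = "你是我的小苹果";

// strlen counts bytes, not characters: 7 characters x 3 bytes each.
echo strlen($str), "\n"; // 21

// mb_strlen only exists when the mbstring extension is loaded.
if (function_exists('mb_strlen')) {
    echo mb_strlen($str, "UTF-8"), "\n"; // 7
} else {
    echo "mbstring is not loaded, a hand-written counter is needed\n";
}
```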

First we need to understand that a string is made up of characters, and characters are represented by bytes. Each English character is one byte, corresponding to an ASCII code, and the ASCII codes of English characters are all less than 128, that is, hexadecimal 0x80. When the value of a byte exceeds 127, the byte is not a complete character on its own; it is one part of a multibyte character.

For example:

$str = "你是我的小苹果";

$str[0] gives the first byte. Let's see what it is:

php > $str = "你是我的小苹果";
php > echo $str[0];

The output is garbled, because it is only one of the bytes of the character 你. This character is made up of more than one byte, so let's try this:

php > echo $str[0] . $str[1] . $str[2];
你

As you can see, joining the three bytes together outputs the complete character 你.
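
To look at the byte values themselves instead of garbled output, ord can be applied to each byte. This is only an illustrative sketch: byte_values is a hypothetical helper, not a PHP built-in, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: return the numeric value of every byte in a string.
function byte_values($str) {
    $values = array();
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $values[] = ord($str[$i]);
    }
    return $values;
}

// An English letter is a single byte below 128.
echo implode(" ", byte_values("A")), "\n";  // 65

// In UTF-8, 你 is three bytes, all of them 128 or above.
echo implode(" ", byte_values("你")), "\n"; // 228 189 160
```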

Why three bytes here rather than two or four? That depends on the encoding of the string. My console uses UTF-8 by default, and in UTF-8 a character takes between one and four bytes, with most common Chinese characters taking three. In GBK a Chinese character takes two bytes. The relationship between encodings and bytes is a large topic that one article cannot cover; for more, see the article "Character encoding notes: ASCII, Unicode and UTF-8".
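
In UTF-8 the first byte of a character also indicates how many bytes the character occupies, through its leading bits. A rough sketch of that rule follows; utf8_seq_len is a hypothetical name, and the function does no validation beyond the bit patterns shown.

```php
<?php
// Hypothetical helper (not a PHP built-in): given the first byte of a UTF-8
// sequence, return how many bytes the whole character occupies.
// Returns 0 for a continuation byte or any other byte that cannot start a sequence.
function utf8_seq_len($byte) {
    $b = ord($byte);
    if ($b < 0x80) {
        return 1; // 0xxxxxxx: ASCII
    }
    if (($b & 0xE0) === 0xC0) {
        return 2; // 110xxxxx
    }
    if (($b & 0xF0) === 0xE0) {
        return 3; // 1110xxxx: where most common Chinese characters fall
    }
    if (($b & 0xF8) === 0xF0) {
        return 4; // 11110xxx
    }
    return 0;     // 10xxxxxx (continuation byte) or invalid
}

$str = "你是我的小苹果"; // assumes this file is saved as UTF-8
echo utf8_seq_len($str[0]), "\n"; // 3
echo utf8_seq_len("A"), "\n";     // 1
```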

Knowing this, we can write our own character-counting function. The rough process is:

1. Loop over the bytes with a for loop.
2. Check whether the byte value is >= 0x80; if it is, skip n bytes.

I have written a simple function that can measure the length of a GBK or UTF-8 string. It is for reference only:

<?php
function mbstrlen($str, $encoding = "utf8") {
    if (($len = strlen($str)) == 0) {
        return 0;
    }
    $encoding = strtolower($encoding);
    if ($encoding == "utf8" or $encoding == "utf-8") {
        $step = 3; // treats every non-ASCII character as 3 bytes, as Chinese usually is in UTF-8
    } elseif ($encoding == "gbk" or $encoding == "gb2312") {
        $step = 2;
    } else {
        return false;
    }
    $count = 0;
    for ($i = 0; $i < $len; $i++) {
        $count++;
        // If the byte value is greater than 127, skip a few bytes according to the encoding
        if (ord($str[$i]) >= 0x80) {
            $i = $i + $step - 1; // minus 1 because the for loop itself also does $i++
        }
    }
    return $count;
}

echo mbstrlen(iconv("utf-8", "gbk", "你是我的小苹果"), "gbk");
echo mbstrlen("你是我的小苹果");
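
Because the function above uses a fixed step for UTF-8, it can miscount strings that contain two-byte or four-byte characters. One possible alternative, which is not the method used above, is to count every byte that is not a UTF-8 continuation byte. A minimal sketch, assuming valid UTF-8 input; utf8_char_count is a hypothetical name.

```php
<?php
// Hypothetical alternative: in valid UTF-8, every character has exactly one
// byte that is not a continuation byte (10xxxxxx), so counting those bytes
// counts the characters regardless of how many bytes each one uses.
function utf8_char_count($str) {
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

// Assumes this file is saved as UTF-8.
echo utf8_char_count("你是我的小苹果"), "\n"; // 7
echo utf8_char_count("abc"), "\n";            // 3
```

This version handles only UTF-8; GBK input would still need the two-byte step used in the function above.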
