Using strlen() in PHP to determine the length of a Chinese string

Source: Internet
Author: User

In PHP we often use the strlen() function to get the length of a string. Its usage is as follows:

PHP strlen() function: definition and usage

The strlen() function returns the length of a string, measured in bytes.

Syntax

strlen(string)

Parameter: string
Description: Required. Specifies the string to be checked.

Example

The code is as follows:

<?php
echo strlen("Hello world!");
?>

Output: 12

How to handle Chinese characters

PHP's built-in strlen() calculates the length of a string by counting the bytes it occupies, not its characters. (mb_strlen(), by contrast, counts characters in the encoding you specify.) An English character occupies 1 byte. Example:

The code is as follows:

$enStr = 'Hello,China!';
echo strlen($enStr); // output: 12
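To make the counted bytes visible, the sketch below passes the same English string to bin2hex(), which prints each byte as two hex digits. It is only an illustration and is not part of the original example.

```php
<?php
// bin2hex() prints every byte as two hex digits, which makes the bytes
// that strlen() counts visible.
$enStr = 'Hello,China!';
$hex = bin2hex($enStr);

echo $hex, "\n";             // 48656c6c6f2c4368696e6121
echo strlen($hex) / 2, "\n"; // 12: one byte per English character
echo strlen($enStr), "\n";   // 12
```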

Chinese characters behave differently. Chinese websites generally use one of two encodings: GBK/GB2312 or UTF-8. UTF-8 covers far more characters, so many webmasters prefer it. GBK and UTF-8 encode Chinese characters differently, so the same Chinese character occupies a different number of bytes in each.

In GBK encoding, each Chinese character (including full-width punctuation such as ， and ！) occupies 2 bytes. For example:

The code is as follows:

$zhStr = '你好，中国！'; // "Hello, China!"
echo strlen($zhStr); // output: 12

In UTF-8 encoding, each common Chinese character (including full-width punctuation) occupies 3 bytes. For example:

The code is as follows:

$zhStr = '你好，中国！'; // "Hello, China!"
echo strlen($zhStr); // output: 18
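The byte counts above depend on how the PHP file itself is saved. The minimal sketch below sidesteps that by writing the two characters 中国 ("China") directly as raw bytes: D6 D0 B9 FA in GBK and E4 B8 AD E5 9B BD in UTF-8. Only the final comparison assumes that the file is saved as UTF-8.

```php
<?php
// The same two characters, 中国 ("China"), written out byte by byte.
// GBK:   中 = D6 D0, 国 = B9 FA
// UTF-8: 中 = E4 B8 AD, 国 = E5 9B BD
$gbk  = "\xD6\xD0\xB9\xFA";
$utf8 = "\xE4\xB8\xAD\xE5\x9B\xBD";

echo strlen($gbk), "\n";  // 4: two bytes per character in GBK
echo strlen($utf8), "\n"; // 6: three bytes per character in UTF-8

// A UTF-8 source file stores the literal '中国' as exactly these bytes.
var_dump($utf8 === '中国'); // bool(true), assuming this file is saved as UTF-8
```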

So how do we calculate the character length of such a Chinese string? Some might suggest dividing the byte length by 2 under GBK and by 3 under UTF-8. That only works for purely Chinese text. Real strings are rarely so tidy, and in practice most of them mix Chinese and English.
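A quick check shows why dividing by 3 fails. This sketch assumes a UTF-8 source file and uses the illustrative mixed string 'Hello,中国！': 6 one-byte ASCII characters followed by 3 three-byte full-width characters.

```php
<?php
// Why "byte length / 3" breaks on mixed text (UTF-8 source file assumed).
$mixed = 'Hello,中国！'; // 6 ASCII characters + 3 full-width characters

$bytes = strlen($mixed);    // 6 * 1 + 3 * 3 = 15
$guess = intdiv($bytes, 3); // 5, but the string actually has 9 characters

echo $bytes, "\n";
echo $guess, "\n";
```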

The following code comes from WordPress. It uses a regular expression to break the string into individual characters and then counts them, and that count is the length of the string. It can only process UTF-8 encoded strings:

The code is as follows:

$zhStr = '你好，中国！'; // "Hello, China!"
$str = 'Hello,中国！';   // "Hello, China!" with the greeting in English

// Calculate the length (in characters) of a UTF-8 string
function utf8_strlen($string = null) {
    // Split the string into individual characters: the /u modifier treats
    // pattern and subject as UTF-8, and /s lets . match newlines as well
    preg_match_all("/./us", (string) $string, $match);
    // Return the number of characters
    return count($match[0]);
}

echo utf8_strlen($zhStr); // output: 6
echo utf8_strlen($str);   // output: 9
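For comparison, here is a different technique that avoids regular expressions. It is not from WordPress, and the function name utf8_count_chars is hypothetical. In valid UTF-8, every character has exactly one byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx), so counting those bytes gives the character count. The sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: count UTF-8 characters by counting every byte
// that is NOT a continuation byte (0x80..0xBF, bit pattern 10xxxxxx).
// Assumes $s is valid UTF-8; malformed input gives a meaningless count.
function utf8_count_chars(string $s): int
{
    $count = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        // Keep only the top two bits and compare them with 10.
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo utf8_count_chars('你好，中国！'), "\n"; // 6
echo utf8_count_chars('Hello,中国！'), "\n";  // 9
```

This walks the string once and never allocates a match array, but unlike the regex version it does not reject malformed input.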

Supplement: the difference between counting characters accurately and counting bytes

The code is as follows: Copy code
/Assume that the current page is encoded as GBK
<? Php
$ Str = "CHINA abc ";
Echo strlen ($ str); // returns 7. Because GBK encodes two Chinese bytes, strlen is the length of the returned string.
Echo "Echo iconv_strlen ($ str, "GBK"); // return 5. iconv_strlen is the number of characters in the statistical string.
?>
Here is another example that prints the byte length of a title:

<?php
$biaoti = "中华人民共和国"; // "People's Republic of China"
$zijie = strlen($biaoti);   // byte count: 14 in GBK, 21 in UTF-8
echo $zijie . "<br>";       // <br> is a line break
?>
// Assume that the current page is encoded as a UTF-8
<? Php
$ Str = "CHINA abc ";
Echo strlen ($ str); // return 9, because the UTF-8 encodes three bytes in each Chinese, strlen is the length of the byte returned by the string.
Echo "Echo iconv_strlen ($ str, "UTF-8"); // return 5. iconv_strlen is the number of characters in the statistical string
?>
As long as you pass the string's actual encoding as the second argument, iconv_strlen() returns the exact number of characters.
