Differences between strlen, mb_strlen, substr(), mb_substr() and mb_strcut in PHP

Source: Internet
Author: User

This article describes in detail the differences between, and the usage of, strlen, mb_strlen, substr(), mb_substr(), and mb_strcut.

The mb_* string functions used below require the mbstring extension.
Configuration on Windows:
Enable the php_mbstring.dll extension in php.ini.
Configuration on Linux: PHP is typically built with --enable-mbstring, or the extension is installed from the distribution's packages; details can be found online.

The code is as follows:

<?php
// The file is saved as UTF-8 for this test.
$str = '中文a字1符'; // four Chinese characters plus "a" and "1"
echo strlen($str) . '<br>';              // 14 (bytes)
echo mb_strlen($str, 'utf8') . '<br>';   // 6  (characters)
echo mb_strlen($str, 'gbk') . '<br>';    // 8  (UTF-8 bytes misread as GBK; not a meaningful count)
echo mb_strlen($str, 'gb2312') . '<br>'; // 10 (UTF-8 bytes misread as GB2312; not a meaningful count)
?>


Result analysis: strlen counts bytes, and each Chinese character occupies 3 bytes in UTF-8, so the test string, which has four Chinese characters plus the ASCII characters a and 1, is 3*4 + 2 = 14 bytes long. mb_strlen counts characters: when the selected encoding is UTF-8, a Chinese character counts as 1, so the length of the same string is 6.

When no encoding is passed, mb_strlen uses the internal encoding, which can be read (or set) with mb_internal_encoding().
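As a minimal sketch of relying on the internal encoding, the snippet below sets it once and then calls mb_strlen without an encoding argument. The helper name char_count is hypothetical and exists only for illustration; the test string assumes the file is saved as UTF-8.

```php
<?php
// Set the internal encoding once so later mb_* calls can omit the encoding argument.
mb_internal_encoding('UTF-8');

/** Hypothetical helper: character count using the current internal encoding. */
function char_count(string $s): int
{
    return mb_strlen($s); // no encoding argument: falls back to mb_internal_encoding()
}

echo mb_internal_encoding(), "\n";   // UTF-8
echo char_count('中文a字1符'), "\n"; // 6
```

Passing the encoding explicitly on every call is the safer choice in library code, since the internal encoding is global state.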

These two functions can be used together to calculate the display width of a string that mixes Chinese and English text, where a Chinese character takes 2 columns and an English character takes 1. The formula relies on every Chinese character occupying 3 bytes in UTF-8:

echo (strlen($str) + mb_strlen($str, 'utf8')) / 2;
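The width formula can be wrapped in a small function and compared with mbstring's own mb_strwidth(). This is a sketch only: display_width is a hypothetical name, and the formula holds only when every multibyte character in the string occupies exactly 3 bytes in UTF-8.

```php
<?php
/**
 * Hypothetical helper: display width where a 3-byte UTF-8 character counts
 * as 2 columns and a 1-byte character as 1, using the formula from the text.
 * Only valid when every multibyte character occupies exactly 3 bytes.
 */
function display_width(string $s): int
{
    // n three-byte chars and m one-byte chars: (3n + m + n + m) / 2 = 2n + m
    return intdiv(strlen($s) + mb_strlen($s, 'UTF-8'), 2);
}

$str = '中文a字1符';
echo display_width($str), "\n";        // 10 (4 * 2 + 2 * 1)
echo mb_strwidth($str, 'UTF-8'), "\n"; // 10
```

For text that may contain 2-byte or 4-byte characters, mb_strwidth() is likely the better fit, because it does not depend on the byte length of each character.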

PHP's built-in strlen function cannot handle Chinese strings properly: it returns only the number of bytes a string occupies. For GB2312-encoded Chinese, strlen returns twice the number of Chinese characters; for UTF-8-encoded Chinese it returns three times the number, because a Chinese character occupies 3 bytes in UTF-8.

String segmentation
The substr() function can split text, but it works on bytes, so if the text to be split contains Chinese characters, use mb_substr() or mb_strcut() instead.

mb_substr splits by character, while mb_strcut splits by byte but never produces half a character.

Both are used much like substr(), except that an extra parameter at the end sets the string encoding; if it is omitted, the internal encoding is used. If the mbstring extension is not enabled on the server, enable php_mbstring.dll in php.ini.

For example:

The code is as follows:
<?php
// "This way my strings will not contain garbled characters ^_^"
echo mb_substr('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>

Output: 这样一来我的字

The code is as follows:
<?php
echo mb_strcut('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>

Output: 这样
From the above example, we can see that mb_substr splits by character, while mb_strcut splits by byte without producing half a character: 7 bytes hold only two complete 3-byte characters.
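To make the contrast with plain substr() concrete, the sketch below cuts the same UTF-8 string with all three functions and checks whether each result is still well-formed. The helper is_valid_utf8 is a hypothetical wrapper around mb_check_encoding(), and the string is assumed to be UTF-8.

```php
<?php
/** Hypothetical helper: true if $s is well-formed UTF-8. */
function is_valid_utf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

$str = '中文a字1符'; // 4 three-byte characters plus 2 ASCII characters

$byBytes = substr($str, 0, 4);             // 4 raw bytes: '中' plus 1 stray byte
$byChars = mb_substr($str, 0, 4, 'UTF-8'); // 4 characters: '中文a字'
$byCut   = mb_strcut($str, 0, 4, 'UTF-8'); // at most 4 bytes, whole characters only: '中'

var_dump(is_valid_utf8($byBytes)); // bool(false)
var_dump(is_valid_utf8($byChars)); // bool(true)
var_dump(is_valid_utf8($byCut));   // bool(true)
```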

Description of the mbstring functions:


PHP's mbstring extension provides multibyte character handling. Its most common use is splitting multibyte Chinese text without producing half a character. Because it is a compiled PHP extension, it generally performs better than hand-written multibyte splitting functions.

The mbstring extension provides two functions with similar purposes, mb_substr and mb_strcut. Let's take a look at how the manual describes them.

mb_substr
mb_substr() returns the portion of str specified by the start and length parameters.

mb_substr() performs a multibyte-safe substr() operation based on the number of characters. Position is counted from the beginning of str: the first character's position is 0, the second character's position is 1, and so on.

mb_strcut
mb_strcut() returns the portion of str specified by the start and length parameters.

mb_strcut() performs an operation equivalent to mb_substr() but counts in bytes. If the start position falls on the second or a later byte of a multibyte character, it starts from the first byte of that character.

It extracts a substring of str that is no longer than length bytes and never ends in the middle of a multibyte character or a shift sequence.
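The start-position rule for mb_strcut() can be illustrated with a start offset that lands inside a character. This is a small sketch assuming a UTF-8 string; the variable names are illustrative only.

```php
<?php
$str = '中文a字1符'; // '中' occupies bytes 0-2

// Offset 1 points at the second byte of '中'; mb_strcut() moves back to byte 0.
$fromMiddle = mb_strcut($str, 1, 3, 'UTF-8');

// substr() starts at the raw byte offset and returns a broken sequence.
$raw = substr($str, 1, 3);

echo $fromMiddle, "\n";                     // 中
var_dump(mb_check_encoding($raw, 'UTF-8')); // bool(false)
```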

As another example, we use mb_substr and mb_strcut to split a piece of text:

The code is as follows:

<?php
// "I am a fairly long string of Chinese -www.webjx.com"
$str = '我是一串比较长的中文-www.webjx.com';

echo "mb_substr:" . mb_substr($str, 0, 6, 'utf-8');

echo "<br>";

echo "mb_strcut:" . mb_strcut($str, 0, 6, 'utf-8');
?>

The output is as follows:

mb_substr:我是一串比较
mb_strcut:我是

Test code:

The code is as follows:

<?php
/**
 * Truncate a string by character count.
 * @param string $content text to truncate
 * @param int    $length  maximum length in characters, suffix included
 * @param string $etc     suffix appended when the text is cut
 * @return string
 */
function truncate($content, $length, $etc = '...') {
    $charset = 'utf-8';
    if ($length == 0) {
        return '';
    } elseif (mb_strlen($content, $charset) > $length) {
        // Reserve room for the suffix so the result stays within $length characters.
        $length -= min($length, mb_strlen($etc, $charset));
        $content = mb_substr($content, 0, $length, $charset) . $etc;
    }
    return $content;
}

// "Voltaire (1694~1778), French bourgeois Enlightenment thinker, philosopher,
// historian, and writer. Originally named F. M. Arouet."
$str = '伏尔泰（1694～1778）法国资产阶级启蒙思想家、哲学家、史学家、文学家。原名F.M.阿鲁埃。';

echo strlen($str);                     // length in bytes
echo '<br>';
echo mb_strlen($str, 'utf-8');         // length in characters
echo '<br>';
// The source omits start and length in the next two calls; 0 and 35 are assumed here.
echo mb_strcut($str, 0, 35, 'utf-8');  // split by byte
echo '<br>';
echo mb_substr($str, 0, 35, 'utf-8');  // split by character
echo '<br>';
echo truncate($str, 35);               // string truncation function
?>
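When the limit is expressed in bytes rather than characters (for example a fixed-size storage field), the same idea can be built on mb_strcut(). The function below is a hypothetical sketch, not part of the original test code; truncate_bytes and its parameters are illustrative names, and the input is assumed to be UTF-8.

```php
<?php
/**
 * Hypothetical helper: cut $content so the result is at most $maxBytes bytes
 * of UTF-8, suffix included, without splitting a character.
 */
function truncate_bytes(string $content, int $maxBytes, string $etc = '...'): string
{
    if ($maxBytes <= 0) {
        return '';
    }
    if (strlen($content) <= $maxBytes) {
        return $content; // already fits
    }
    if (strlen($etc) >= $maxBytes) {
        // No room for the suffix: return as many whole characters as fit.
        return mb_strcut($content, 0, $maxBytes, 'UTF-8');
    }
    // mb_strcut() backs up to a character boundary inside the byte budget.
    return mb_strcut($content, 0, $maxBytes - strlen($etc), 'UTF-8') . $etc;
}

echo truncate_bytes('中文a字1符', 10), "\n"; // 中文a...
echo truncate_bytes('中文a字1符', 14), "\n"; // 中文a字1符
```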
