Garbled Chinese characters when truncating strings with PHP substr()

Source: Internet
Author: User
Tags character set php file truncated

Whenever a string contains Chinese characters, truncating it with PHP's substr() may produce garbled output. substr() counts bytes, not characters: in UTF-8 each Chinese character occupies 3 bytes, in GB2312 it occupies 2 bytes, and an English letter occupies 1 byte, so the offsets passed to substr() do not line up with character boundaries. substr() can therefore cut a Chinese character in the middle, which corrupts that character and may also corrupt the characters that follow it.


substr: return part of a string

Syntax: string substr(string string, int start [, int length])
Note:
substr() returns the part of the string specified by the start and length parameters.
If start is zero or positive, the returned string begins at position start in the string, counting from 0.
Example:

<?php
$rest = substr("abcdef", 1);    // returns "bcdef"
$rest = substr("abcdef", 1, 3); // returns "bcd"
?>

If start is negative, the returned string begins that many characters from the end of the string.
Example:

<?php
$rest = substr("abcdef", -1);    // returns "f"
$rest = substr("abcdef", -2);    // returns "ef"
$rest = substr("abcdef", -3, 1); // returns "d"
?>

If length is given and is positive, the returned string contains at most length characters, beginning at start.
If length is given and is negative, that many characters are omitted from the end of the string.
Example:

<?php
$rest = substr("abcdef", 1, -1); // returns "bcde"
?>
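Negative values for start and length can also be combined. A short sketch of how the two rules above interact (the variable names are only illustrative):

```php
<?php
// start = -4 selects "cdef"; length = -1 then drops the final character.
$a = substr("abcdef", -4, -1);  // "cde"

// start = 2 selects "cdef"; length = -2 then drops the last two characters.
$b = substr("abcdef", 2, -2);   // "cd"

echo $a, "\n", $b, "\n";
```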

This works without problems for English. Now let's test a string of Chinese characters.

<?php
$rest = substr("中文", 1, -1); // returns garbled bytes: both cuts fall inside a multi-byte character
?>

The truncated result is clearly not what we want, and this kind of garbled output can keep a program from working correctly. There are two solutions:


1. Use mb_substr() from the mbstring extension, which truncates without producing garbled characters.

You can use the mb_substr()/mb_strcut() functions. They are used much like substr(), except that they accept one more parameter at the end that sets the encoding of the string. However, the mbstring extension (php_mbstring.dll on Windows) is often not enabled on servers by default; enable it in php.ini.

The code is as follows: Copy code

<? Php
Echo mb_substr ("php Chinese character encode", "UTF-8 ");
?>


If the final encoding parameter is omitted, mb_substr() falls back to the internal encoding (see mb_internal_encoding()). When that is not UTF-8, as on older PHP setups, a UTF-8 Chinese character is counted as three units, one per byte. When UTF-8 is specified, truncation works in units of whole characters.


Pay attention to the encoding of the PHP file itself and to the encoding used when the web page is displayed. To use mb_substr() you need to know the encoding of the string in advance; if you do not know it, you have to determine it first. The mbstring library also provides mb_check_encoding() to check whether a string is valid in a given encoding, but this check is not fully reliable.
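One cautious way to apply this is to validate the string before truncating it. The sketch below assumes the mbstring extension; safe_head() is a hypothetical helper, not part of PHP or of any library. Note that mb_check_encoding() only confirms that the bytes are valid in the encoding you name, it does not tell you which encoding the author intended.

```php
<?php
// Assumes the mbstring extension; safe_head() is a hypothetical helper.
function safe_head(string $text, int $chars): string
{
    if (mb_check_encoding($text, "UTF-8")) {
        // Valid UTF-8: cut on character boundaries.
        return mb_substr($text, 0, $chars, "UTF-8");
    }
    // Not valid UTF-8: return the input untouched rather than risk
    // splitting a character in an unknown encoding.
    return $text;
}

echo safe_head("中文字符", 2), "\n"; // 中文
```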


PHP comes with several string truncation functions, including substr() and mb_substr(). With substr(), a Chinese character counts as two length units in GBK and three in UTF-8. With mb_substr(), once the encoding is specified, a Chinese character counts as one length unit.

substr() sometimes cuts off one third or one half of a Chinese character, which shows up as garbled text, so mb_substr() is the better fit. Even so, mb_substr() is not always convenient. Suppose I want to show a short caption for a small image, with room for exactly five Chinese characters. If the caption has more than five characters, I take the first four and append "...". That works for Chinese, but English letters and digits are narrower, so the same character count leaves them looking too short.
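One possible way around the width problem, not mentioned in the original text, is mbstring's mb_strimwidth(), which truncates by display width and counts a Chinese character as two columns and a Latin letter as one. A minimal sketch, assuming the mbstring extension; the captions and the width of 10 are made up for illustration:

```php
<?php
// Assumes the mbstring extension. A width of 10 columns is roughly five
// Chinese characters or ten Latin letters; the "..." marker counts
// toward the width when a caption has to be shortened.
$captions = ["中文字符测试标题", "abcdefghijklmnop", "短标题"];

foreach ($captions as $caption) {
    echo mb_strimwidth($caption, 0, 10, "...", "UTF-8"), "\n";
}
```

Captions that already fit within the width are returned unchanged, so the marker is only added when something was actually cut.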


2. Write the truncation function yourself, although it will not be as efficient as using the mbstring extension directly. The following is the function ECShop uses to truncate UTF-8 strings.

function sub_str($str, $length = 0, $append = true)
{
    $str = trim($str);
    $strlength = strlen($str); // length in bytes

    if ($length == 0 || $length >= $strlength)
    {
        return $str; // return the string itself if the truncation length is 0 or is greater than or equal to the string length
    }
    elseif ($length < 0) // the truncation length is negative
    {
        $length = $strlength + $length; // string length plus the (negative) truncation length
        if ($length < 0)
        {
            $length = $strlength; // if the absolute value of the truncation length exceeds the string length, use the string length
        }
    }

    // EC_CHARSET is a constant defined elsewhere in ECShop that holds the site charset
    if (function_exists('mb_substr'))
    {
        $newstr = mb_substr($str, 0, $length, EC_CHARSET);
    }
    elseif (function_exists('iconv_substr'))
    {
        $newstr = iconv_substr($str, 0, $length, EC_CHARSET);
    }
    else
    {
        //$newstr = trim_right(substr($str, 0, $length));
        $newstr = substr($str, 0, $length); // byte-based fallback; may split a multi-byte character
    }

    if ($append && $str != $newstr)
    {
        $newstr .= '...';
    }

    return $newstr;
}
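Note that sub_str() measures the string with strlen(), which counts bytes, so a negative $length is resolved against the byte length rather than the character count. The following is a hedged sketch of a character-based variant; sub_str_chars() and its $charset parameter are hypothetical names introduced here, not ECShop code, and the sketch assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical variant of sub_str(); not ECShop code. Assumes mbstring is loaded.
function sub_str_chars(string $str, int $length = 0, bool $append = true, string $charset = 'UTF-8'): string
{
    $str = trim($str);
    $strlength = mb_strlen($str, $charset); // length in characters, not bytes

    if ($length == 0 || $length >= $strlength) {
        return $str; // nothing to cut
    }
    if ($length < 0) {
        $length = $strlength + $length; // drop |$length| characters from the end
        if ($length < 0) {
            $length = $strlength;
        }
    }

    $newstr = mb_substr($str, 0, $length, $charset);
    if ($append && $str !== $newstr) {
        $newstr .= '...';
    }
    return $newstr;
}

echo sub_str_chars("中文字符测试", 4), "\n";  // 中文字符...
echo sub_str_chars("中文", 5), "\n";          // 中文
echo sub_str_chars("abcdef", -2), "\n";       // abcd...
```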
