Using PHP's mbstring Library to Handle Chinese Text on Windows


Yesterday I wanted to batch-process a pile of files I had downloaded earlier: pull the key content out of each file with a regular expression and process it all in one place. The catch with handling files this way on Windows is character encoding.

As we all know, on (Chinese-language) Windows, file names, and the contents of files saved with the system defaults, are encoded in GBK, while the IDE I develop in uses UTF-8. (I won't go into why here; the only question is how to get both sides into the same encoding.) As a result, the Chinese characters in the regular-expression pattern I wrote in my UTF-8 script did not match the GBK-encoded file contents.
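A minimal sketch of the mismatch, assuming the script is saved as UTF-8. The pattern and sample text are hypothetical, and the GBK "file contents" are simulated by re-encoding a UTF-8 literal with mb_convert_encoding(), the function this post eventually settles on.

```php
<?php
// Assumes this script file is saved as UTF-8 (the IDE default in this post).
$utf8Text = "标题：测试";

// Simulate the contents of a GBK-encoded file by re-encoding the literal.
$gbkText = mb_convert_encoding($utf8Text, 'GBK', 'UTF-8');

// Hypothetical pattern written in the UTF-8 script; /u = treat input as UTF-8.
$pattern = '/标题：(\S+)/u';

// UTF-8 subject: matches, and $m[1] holds the captured text.
$utf8Result = preg_match($pattern, $utf8Text, $m);

// GBK subject: the bytes are not valid UTF-8, so PCRE rejects the subject
// and preg_match() returns false instead of a match count.
$gbkResult = preg_match($pattern, $gbkText);

var_dump($utf8Result, $m[1], $gbkResult);
```

Without the /u modifier the GBK case would simply return 0, since the UTF-8 bytes of the pattern never occur in the GBK bytes.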

At first, with no better idea, I tried saving the PHP script itself as GBK. That does work, but it felt like too crude a workaround, so I looked for a PHP function that could do the conversion for me.

Then I remembered iconv(), which I had used before to handle file names on Windows. Its prototype is:

string iconv(string $in_charset, string $out_charset, string $str)

A typical call looks like this:

$out_charset = 'UTF-8'; $fileName = iconv('GBK', $out_charset, $fileName);

This converts the file name from GBK to UTF-8; the characters themselves stay the same, only their byte encoding changes.
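A small runnable sketch of that call. The GBK bytes for "中文" are written as escapes so the script itself can stay UTF-8; in the post's scenario they would come from a Windows file name instead.

```php
<?php
// "中文" encoded in GBK (0xD6D0, 0xCEC4), standing in for a Windows file name.
$fileName = "\xD6\xD0\xCE\xC4";

$out_charset = 'UTF-8';
// Argument order: source charset, target charset, then the string itself.
$fileName = iconv('GBK', $out_charset, $fileName);

var_dump($fileName === '中文');
```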

The PHP manual adds:

    • If you append //TRANSLIT to the output charset, i.e. $out_charset = 'UTF-8//TRANSLIT', a character that cannot be represented in the target charset is approximated with one or more similar-looking characters.
    • If you append //IGNORE, i.e. $out_charset = 'UTF-8//IGNORE', characters that cannot be represented in the target charset are silently skipped.
    • If you append neither, conversion stops at the first such character: PHP raises an E_NOTICE and iconv() returns false.
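A sketch comparing the three modes, assuming an input string that contains an emoji (U+1F600), which GBK has no code for. Only the plain call's result is relied on here, and even that assumes a glibc- or GNU libiconv-based build; what the suffixed calls return depends on the iconv implementation PHP was built against.

```php
<?php
// "a", an emoji GBK cannot represent, then "b".
$input = "a\u{1F600}b";

// No suffix: PHP emits a notice (silenced with @ here) and returns false.
$plain = @iconv('UTF-8', 'GBK', $input);

// //IGNORE: the emoji should be skipped, leaving "ab".
$ignored = @iconv('UTF-8', 'GBK//IGNORE', $input);

// //TRANSLIT: the emoji is approximated, typically by "?".
$translit = @iconv('UTF-8', 'GBK//TRANSLIT', $input);

var_dump($plain, $ignored, $translit);
```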

However, when I ran this function on the file contents, it failed with an error.

As I understood the error message, iconv() can only handle up to 64 characters: enough for a typical file name, but my file contents were obviously longer.

With no way around that, I went looking for another function.

Eventually I found the mbstring extension. It is enabled in most PHP installations, and you can check for it in the output of phpinfo().
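Besides reading phpinfo(), a script can check for the extension itself. A minimal sketch:

```php
<?php
// Programmatic alternative to looking for "mbstring" in phpinfo() output.
$hasMbstring = extension_loaded('mbstring');

if ($hasMbstring) {
    echo "mbstring is enabled\n";
} else {
    // On Windows this usually means enabling extension=mbstring in php.ini
    // (extension=php_mbstring.dll on older PHP versions).
    echo "mbstring is not enabled\n";
}
```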

mbstring provides mb_convert_encoding(), which converts a string to a different encoding. Its prototype is:

string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])

Its signature resembles that of iconv(), except that the string comes first, the target encoding takes no //TRANSLIT or //IGNORE suffixes, and no explicit limit on string length is imposed.

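Note also that $from_encoding is optional. If it is omitted, mbstring assumes the current internal encoding (see mb_internal_encoding()); if you pass an array or a comma-separated list of candidate encodings, it guesses which one the input actually uses.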
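A sketch of both ways to give the source encoding, using the GBK bytes for "中文" as hypothetical input. Naming the encoding is the most predictable option; a candidate list lets mbstring pick one, which works here because these bytes are not valid UTF-8.

```php
<?php
$gbk = "\xD6\xD0\xCE\xC4"; // "中文" in GBK

// 1. Explicit source encoding.
$explicit = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// 2. Candidate list: mbstring picks the listed encoding that fits the input.
//    The bytes are invalid as UTF-8, so GBK is chosen.
$guessed = mb_convert_encoding($gbk, 'UTF-8', ['UTF-8', 'GBK']);

var_dump($explicit === '中文', $guessed === '中文');
```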

Since I could not find a specific character that fails to convert, I have not worked out how mb_convert_encoding() handles characters it cannot transcode.
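The PHP manual does cover this case: characters that cannot be represented in the target encoding are replaced with the substitute character configured by mb_substitute_character(), which defaults to "?". A sketch, assuming an emoji (U+1F600) as the unconvertible character, since GBK has no code for it:

```php
<?php
$input = "a\u{1F600}b"; // the emoji has no GBK equivalent

// Default substitute character: '?' (code point 0x3F), set explicitly here.
mb_substitute_character(0x3F);
$replaced = mb_convert_encoding($input, 'GBK', 'UTF-8');

// 'none' drops unconvertible characters, much like iconv's //IGNORE.
mb_substitute_character('none');
$dropped = mb_convert_encoding($input, 'GBK', 'UTF-8');

// Restore the default so later conversions behave normally.
mb_substitute_character(0x3F);

var_dump($replaced, $dropped);
```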

Running each whole file through mb_convert_encoding() solved the problem without a hitch.
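Put together, the workflow looks roughly like the sketch below. extractFromGbkFile() is a hypothetical helper, the files are assumed to be GBK, and the sample file is created on the fly as a stand-in for one of the downloaded files.

```php
<?php
// Hypothetical helper: read a GBK file, convert it to UTF-8 in one call,
// then run a UTF-8 pattern over it and return the first capture group.
function extractFromGbkFile(string $path, string $pattern): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        return [];
    }
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'GBK');
    preg_match_all($pattern, $utf8, $matches);
    return $matches[1] ?? [];
}

// Stand-in for a downloaded file: write GBK-encoded sample text to a temp file.
$path = tempnam(sys_get_temp_dir(), 'gbk');
file_put_contents($path, mb_convert_encoding("标题：测试\n正文\n", 'GBK', 'UTF-8'));

$found = extractFromGbkFile($path, '/标题：(\S+)/u');
unlink($path);

print_r($found);
```

In a real batch job the helper would simply be called in a loop over the file list, for example one produced by glob().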

Finally, a word about the mbstring library itself. The name stands for "multibyte string". Many of its functions are multibyte-aware counterparts of PHP's built-in string functions, named by putting mb_ in front of the original name (strlen() becomes mb_strlen(), for example). Besides the original parameters, each of these functions takes an optional $encoding parameter at the end, which tells it which encoding to use when interpreting the string.
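A quick illustration of the byte-versus-character difference with strlen() and its mbstring counterpart, assuming the script is saved as UTF-8:

```php
<?php
$s = '中文'; // two Chinese characters; this file is UTF-8

$utf8Bytes = strlen($s);             // 6: 3 bytes per character in UTF-8
$utf8Chars = mb_strlen($s, 'UTF-8'); // 2 characters

// The same text re-encoded as GBK uses 2 bytes per character.
$gbk = mb_convert_encoding($s, 'GBK', 'UTF-8');
$gbkBytes = strlen($gbk);            // 4
$gbkChars = mb_strlen($gbk, 'GBK');  // still 2 characters

var_dump($utf8Bytes, $utf8Chars, $gbkBytes, $gbkChars);
```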

For example, strpos() finds the position of one string within another, counting in bytes. In my test the script was saved as UTF-8, and the needle was a Chinese character preceded by four other Chinese characters in the haystack. strpos($haystack, $needle, 0) returned 12, because each Chinese character takes 3 bytes in UTF-8.

mb_strpos($haystack, $needle, 0, 'UTF-8') returns 4 for the same strings: it decodes both strings as UTF-8 and counts characters instead of bytes.

mb_strpos($haystack, $needle, 0, 'GBK'), on the other hand, returned 6, presumably because the UTF-8 bytes are decoded as if they were GBK, where a Chinese character takes 2 bytes, so the 12 bytes before the match are counted as 6 characters.
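A sketch with hypothetical strings chosen to match the offsets above (the needle "本" is preceded by four Chinese characters). It also shows that when the input really is GBK, passing 'GBK' gives the character offset again.

```php
<?php
// Hypothetical strings: "本" is preceded by four Chinese characters.
$haystack = '欢迎光临本站';
$needle = '本';

$byteOffset = strpos($haystack, $needle);                // 12 (4 x 3 bytes)
$charOffset = mb_strpos($haystack, $needle, 0, 'UTF-8'); // 4

// Genuinely GBK-encoded input, searched with the matching encoding.
$gbkHaystack = mb_convert_encoding($haystack, 'GBK', 'UTF-8');
$gbkNeedle = mb_convert_encoding($needle, 'GBK', 'UTF-8');
$gbkOffset = mb_strpos($gbkHaystack, $gbkNeedle, 0, 'GBK'); // 4

var_dump($byteOffset, $charOffset, $gbkOffset);
```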

Of course, mbstring offers many more functions than the ones covered here.

If you found this post helpful, feel free to recommend it or follow me. If you have any questions, leave a comment below. Thanks!
