Using PHP's mbstring function library to handle Chinese characters on Windows
Yesterday I wanted to batch process a pile of files I had downloaded earlier: pull the key content out of each file with a regular expression and process it all in one place. One problem that comes up when working with files is character encoding on Windows.
We all know that on Windows (the Chinese version, of course) file names and file contents are encoded in GBK, while the IDE we develop in uses UTF-8. (I will not discuss why here,
only how to bring the two into the same encoding.) As a result, the Chinese characters in the UTF-8 encoded regular expression pattern I wrote did not match anything in the GBK encoded files.
At first I had no better idea than to save the PHP script file itself as GBK. That worked, but it felt too crude, so I went looking for a PHP function that could do what I needed.
At this point I remembered iconv(), the function I had used before to handle file names on Windows. Its prototype is:
string iconv ( string $in_charset , string $out_charset , string $str )
We often use:
$out_charset = 'UTF-8'; $fileName = iconv('GBK', $out_charset, $fileName);
to process a file name. This converts the file name from GBK to UTF-8 and leaves the text itself unchanged.
Additional notes, translated from the manual:
If you append //TRANSLIT to the output charset, that is $out_charset = 'UTF-8//TRANSLIT', a character that cannot be represented in the target charset is automatically replaced with one or more similar-looking characters;
If you append //IGNORE to the output charset, that is $out_charset = 'UTF-8//IGNORE', a character that cannot be represented in the target charset is silently skipped.
If you append nothing, the conversion is interrupted at the first character that cannot be converted, and PHP raises a notice.
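The argument order is easy to get wrong, so here is a minimal sketch of the file name conversion described above. The file name is a made-up example, the script is assumed to be saved as UTF-8, and the GBK input is simulated by converting a UTF-8 literal first. How //TRANSLIT and //IGNORE behave on unconvertible input depends on the underlying iconv implementation, so no output is claimed for that case.

```php
<?php
// Hypothetical file name; on a real system it would come from the file system.
$utf8Name = '中文文件.txt';

// Simulate the GBK-encoded bytes that Chinese Windows would hand us.
$gbkName = iconv('UTF-8', 'GBK', $utf8Name);

// Argument order: input charset, output charset, then the string.
$out_charset = 'UTF-8';
$converted = iconv('GBK', $out_charset, $gbkName);

// Same call with //IGNORE appended to the output charset, so that
// characters that cannot be represented are skipped instead of
// interrupting the conversion.
$lenient = iconv('GBK', $out_charset . '//IGNORE', $gbkName);

// iconv() returns false on failure, so check before using the result.
if ($converted === false) {
    echo "conversion failed\n";
} else {
    echo strlen($gbkName), ' GBK bytes -> ', strlen($converted), " UTF-8 bytes\n";
}
```

Each Chinese character takes two bytes in GBK and three in UTF-8, which is why the two byte lengths differ.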
However, when I used this function on the file contents, the result was this:
I took it to mean that iconv() can handle at most 64 characters, roughly the length of an ordinary file name, and my file content is obviously longer than 64 characters.
With no way around that, I had to look for another function.
Eventually I found the mbstring function library. It is usually bundled with the PHP environment, and you can check for it in the output of phpinfo().
The mbstring library has a function, mb_convert_encoding(), that changes the encoding of a string. Its prototype is:
string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding ] )
The basic prototype is similar to iconv(), except that it does not accept suffix modifiers such as //TRANSLIT on the output encoding, and it places no explicit limit on the length of the string.
We can also see that $from_encoding is optional. When it is omitted, the internal encoding (see mb_internal_encoding()) is used as the source. When it is a list of candidate encodings, the function tries to identify the source encoding from that list.
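When the source encoding is not known in advance, one cautious option is to test for UTF-8 explicitly instead of relying on detection. The sketch below assumes the input can only be UTF-8 or GBK. The helper name toUtf8 is hypothetical, and the type declarations require PHP 7 or later.

```php
<?php
// Hypothetical helper: return $text as UTF-8, assuming it is either
// already UTF-8 or encoded in GBK.
function toUtf8(string $text): string
{
    // GBK-encoded Chinese text is rarely a valid UTF-8 byte sequence,
    // so a successful check is treated as "already UTF-8".
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'GBK');
}

// Build GBK bytes from a UTF-8 literal (the script is assumed to be UTF-8).
$gbkBytes = mb_convert_encoding('中文', 'GBK', 'UTF-8');

var_dump(toUtf8($gbkBytes) === '中文'); // GBK input is converted
var_dump(toUtf8('中文') === '中文');    // UTF-8 input is left alone
```

Passing the source encoding explicitly, as in the second branch, avoids depending on whatever the internal encoding happens to be on a given server.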
Because I could not find a character that actually fails to convert, I do not know how the function handles characters it cannot transcode.
I processed each file as a whole with mb_convert_encoding(), and the problem was solved without trouble.
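The whole-file approach can be sketched roughly as follows. The sample text, the pattern, and the temporary file are made up for illustration, and the script is assumed to be saved as UTF-8. A GBK file is created first so that the example is self-contained.

```php
<?php
// Create a GBK-encoded sample file (stand-in for a downloaded file).
$path = tempnam(sys_get_temp_dir(), 'gbk');
$sample = "标题：测试文件\n内容：示例";
file_put_contents($path, mb_convert_encoding($sample, 'GBK', 'UTF-8'));

// Read the raw GBK bytes and convert the entire content to UTF-8 at once.
$content = mb_convert_encoding(file_get_contents($path), 'UTF-8', 'GBK');

// The pattern is UTF-8 like the script itself; the /u modifier makes
// PCRE treat both pattern and subject as UTF-8.
$title = null;
if (preg_match('/标题：(.+)/u', $content, $m)) {
    $title = $m[1];
}

unlink($path); // clean up the temporary file
var_dump($title);
```

Converting the file content to the script's encoding, not the pattern to the file's encoding, keeps every pattern in the script readable in the IDE.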
Finally, a word about the mbstring library itself. Its full name is Multibyte String, and many of its functions are extensions of PHP's own string functions, named by putting "mb_" in front of the original function name. Besides doing what the original function does, each of them takes an extra optional $encoding parameter at the end of the parameter list, which specifies the encoding the function uses to process the string.
For example, strpos() finds the position of one string inside another.
Take a Chinese string $str in which "问" is the fifth character (for instance "欢迎您访问"). strpos($str, "问", 0) returns 12, because the script is UTF-8 encoded, each Chinese character takes 3 bytes in UTF-8, and strpos() counts bytes.
With mb_strpos(), the call mb_strpos($str, "问", 0, 'UTF-8') returns 4, because the string is treated as UTF-8 text and the position is counted in characters.
And mb_strpos($str, "问", 0, 'GBK') returns 6, because the same 12 leading bytes are read as two-byte GBK characters.
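The byte versus character difference can be checked with a short script. The string literals of the original example did not survive translation, so "欢迎您访问" below is a stand-in chosen to have the needle at the fifth character. The script is assumed to be saved as UTF-8.

```php
<?php
// Stand-in haystack: the needle is the fifth character.
$haystack = '欢迎您访问';
$needle = '问';

// strpos() works on bytes: four 3-byte UTF-8 characters come first.
$bytePos = strpos($haystack, $needle, 0);

// mb_strpos() with 'UTF-8' counts characters, not bytes.
$charPos = mb_strpos($haystack, $needle, 0, 'UTF-8');

// Telling mb_strpos() that the same bytes are GBK changes how they
// are grouped into characters, and therefore the reported position.
$gbkPos = mb_strpos($haystack, $needle, 0, 'GBK');

var_dump($bytePos, $charPos, $gbkPos);
```

The first two values are 12 and 4. The third depends on how mbstring groups UTF-8 bytes when asked to read them as GBK, which is why passing the wrong $encoding gives misleading positions.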
Of course, the library has many more features.
The following describes how to enable PHP's mbstring extension on Windows.
A few days ago I ran a PHP program that needed to convert character encodings, but when I probed the server it reported that the mbstring extension was not supported. I checked the PHP extension directory, and the php_mbstring.dll file was there.
Here is how to enable it.
1. Make sure the file php_mbstring.dll is present under windows/system32. If it is not, copy it into windows/system32 from the extensions folder of your PHP installation directory.
2. Find php.ini in the Windows directory, open it for editing, search for mbstring.dll, and find the line
;extension=php_mbstring.dll
Then remove the leading ; to enable the extension.
3. Restart the PHP service (if that does not work, restart the computer).
4. Done.
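To confirm that the steps above took effect, a small script like the following can be run from the command line or through the web server. It is only a quick check and uses nothing beyond PHP's standard extension_loaded() and mb_internal_encoding() functions.

```php
<?php
// Report whether the mbstring extension is available in this PHP runtime.
$loaded = extension_loaded('mbstring');

echo $loaded ? "mbstring is enabled\n" : "mbstring is NOT enabled\n";

if ($loaded) {
    // Show the encoding mbstring uses when no $encoding argument is given.
    echo 'Internal encoding: ', mb_internal_encoding(), "\n";
}
```

If the script reports that mbstring is not enabled after the restart, the php.ini that was edited may not be the one PHP actually loads. The phpinfo() output shows which configuration file is in use.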