Fixing garbled characters when splitting GBK-encoded Chinese strings in PHP

Source: Internet
Author: User

Recently I ran into a peculiar Chinese character: "弢" (tao).

The specific process is as follows:

1 $ list = explode ('|', 'abc scheme | bc'); 2 var_dump ($ list );

I expected a clean split into two elements.

Instead, the result was:

    array(3) {
      [0]=>
      string(4) "abc?"
      [1]=>
      string(0) ""
      [2]=>
      string(2) "bc"
    }

A garbled character appears (the stray byte 8f, shown as ?), and an empty element turns up for no obvious reason.

The reason is that the GBK encoding of "弢" is 8f 7c, and the ASCII code of | is 7c. explode works on raw bytes, so in a GBK-encoded string it treats the second byte of the character as a | and cuts there.
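To make the collision concrete, here is a small sketch that is not part of the original post. It writes the character's GBK bytes as the escapes \x8f\x7c, so it behaves the same regardless of the encoding the script file is saved in; the variable names are arbitrary.

```php
<?php
// "abc" + the two GBK bytes 8f 7c + "|" + "bc".
// Escapes are used so the file's own encoding does not matter.
$s = "abc\x8f\x7c|bc";

// Hex dump of the raw bytes: note the two consecutive 7c bytes.
echo bin2hex($s), "\n"; // 6162638f7c7c6263

// explode() compares bytes, so it cuts at both 7c bytes.
$parts = explode('|', $s);
foreach ($parts as $p) {
    echo strlen($p), ' ', bin2hex($p), "\n";
}
// 4 6162638f
// 0
// 2 6263
```

The first piece ends with the orphaned lead byte 8f, which is what var_dump shows as a garbled character.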

Since this is a double-byte problem, mbstring looks like the natural fix.

Unfortunately, PHP has no mb_explode function. The closest thing is mb_split:

array mb_split ( string $pattern , string $string [, int $limit = -1 ] )

The encoding is not declared. The code is declared through mb_regex_encoding.

Write the following code:

1 mb_regex_encoding ('gbk'); 2 $ list = mb_split ('\ |', 'abc scheme | bc'); 3 var_dump ($ list );

The result is an error in php. mb_regex_encoding does not know gbk and encoding.

You can use it to understand:

1 mb_regex_encoding ('gb2312'); 2 $ list = mb_split ('\ |', 'abc scheme | bc'); 3 var_dump ($ list );

Result:

    array(3) {
      [0]=>
      string(4) "abc?"
      [1]=>
      string(0) ""
      [2]=>
      string(2) "bc"
    }

This approach does not work either.

Why? "弢" is not in the GB2312 character set, and this function does not support the extended sets (GBK, GB18030).
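The byte ranges explain why the pair 8f 7c falls outside GB2312. The sketch below is an illustration only: is_euc_cn_pair is a hypothetical helper name, and it assumes the usual EUC-CN layout of GB2312, where both bytes of a character are at least A1 (lead bytes A1 to F7 for the assigned rows, trail bytes A1 to FE).

```php
<?php
// Hypothetical helper: true when two bytes fall in the EUC-CN (GB2312)
// double-byte range. It only checks ranges, not whether the code point
// is actually assigned.
function is_euc_cn_pair(string $bytes): bool
{
    if (strlen($bytes) !== 2) {
        return false;
    }
    $lead  = ord($bytes[0]);
    $trail = ord($bytes[1]);
    return $lead >= 0xA1 && $lead <= 0xF7
        && $trail >= 0xA1 && $trail <= 0xFE;
}

var_dump(is_euc_cn_pair("\x8f\x7c")); // bool(false): the pair from this article
var_dump(is_euc_cn_pair("\xd6\xd0")); // bool(true): d6 d0 is 中 in GB2312
```

GBK extends this layout with lead bytes from 81 and trail bytes from 40, which is exactly how a trail byte can collide with an ASCII character such as |.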

Since this is not easy to use, the omnipotent regular expression may be OK. The following code is obtained:

1 var_dump (preg_match_all ('/([^ \ |]) */', 'abc tables | bc', $ matches); 2 var_dump ($ matches );

Result:

    int(2)
    array(2) {
      [0]=>
      array(2) {
        [0]=>
        string(4) "abc?"
        [1]=>
        string(2) "bc"
      }
      [1]=>
      array(2) {
        [0]=>
        string(1) "?"
        [1]=>
        string(1) "c"
      }
    }

Well, that was wishful thinking.

The question now is how to describe this case with a regular expression.

I consulted laruence's blog post on garbled characters when splitting GBK Chinese with regular expressions. Unfortunately, my regex skills are limited and I still could not come up with a suitable pattern. (If you work one out, please let me know.)

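With no other option left, I fell back on slicing the string with the mb_ substring functions:

    function mb_explode($delimiter, $string, $encoding = null){
        $list = array();
        is_null($encoding) && $encoding = mb_internal_encoding();
        // Delimiter length in characters, not bytes.
        $len = mb_strlen($delimiter, $encoding);
        // Guard: with an empty delimiter the loop below would never advance.
        if ($len === 0) {
            return array($string);
        }
        // Find each delimiter by character position and slice it off the front.
        while(false !== ($idx = mb_strpos($string, $delimiter, 0, $encoding))){
            $list[] = mb_substr($string, 0, $idx, $encoding);
            $string = mb_substr($string, $idx + $len, null, $encoding);
        }
        $list[] = $string;
        return $list;
    }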

Test code:

1 $ a = 'abc scheme | bc'; 2 3 var_dump (mb_explode ('|', $ a, 'gbk'); 4 var_dump (mb_explode ('bc ', $ a, 'gbk'); 5 var_dump (mb_explode ('hour', $ a, 'gbk '));

Result:

Array (2) {[0] => string (5) "abc regular" [1] => string (2) "bc"} array (3) {[0] => string (1) "a" [1] => string (3) "inline |" [2] => string (0) ""} array (2) {[0] => string (3) "abc" [1] => string (3) "| bc "}

In this way, you can get the correct result.
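For the common case of a single ASCII delimiter, the same result can also be reached without mbstring by walking the bytes and skipping the trail byte of every double-byte character. This is a different technique from the one above, offered only as a sketch: gbk_explode_ascii is a hypothetical name, and the code assumes well-formed GBK input with lead bytes in the range 81 to FE.

```php
<?php
// Hypothetical helper, not from the original post: split a GBK byte string
// on one ASCII delimiter byte, never cutting inside a double-byte character.
function gbk_explode_ascii(string $delimiter, string $string): array
{
    if (strlen($delimiter) !== 1 || ord($delimiter) > 0x7F) {
        throw new InvalidArgumentException('delimiter must be one ASCII byte');
    }
    $list  = [];
    $start = 0;                 // byte offset where the current piece begins
    $n     = strlen($string);
    for ($i = 0; $i < $n; $i++) {
        $byte = ord($string[$i]);
        if ($byte >= 0x81 && $byte <= 0xFE && $i + 1 < $n) {
            $i++;               // lead byte: skip its trail byte unexamined
            continue;
        }
        if ($string[$i] === $delimiter) {
            $list[] = substr($string, $start, $i - $start);
            $start  = $i + 1;
        }
    }
    $list[] = substr($string, $start);
    return $list;
}

// GBK bytes written as escapes so the file encoding does not matter.
$parts = gbk_explode_ascii('|', "abc\x8f\x7c|bc");
echo count($parts), "\n";       // 2
echo bin2hex($parts[0]), "\n";  // 6162638f7c
echo $parts[1], "\n";           // bc
```

Unlike mb_explode, this makes a single pass over the string, but it handles only one-byte ASCII delimiters.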
