Fixing garbled GBK Chinese text when splitting strings in PHP


Recently encountered a magical word "Mingtao (TAO)".

The specific process is this:

1 $list Explode (' | ', ' ABC Mingtao |BC '); 2 Var_dump ($list);

I expected the string to be split into two parts.

The actual result was different:

    array(3) {
      [0]=>
      string(4) "abc?"
      [1]=>
      string(0) ""
      [2]=>
      string(2) "bc"
    }

The output is garbled, and an empty element has appeared out of nowhere.

The cause: the GBK encoding of "弢" is 0x8F 0x7C, and the ASCII code of "|" is 0x7C. explode() compares bytes, so it treats the second byte of "弢" as a "|" and cuts the string there.

Since this is a double-byte problem, mbstring seemed like the natural fix.

Unfortunately, PHP has no mb_explode function. The closest thing I found was mb_split:

    array mb_split ( string $pattern , string $string [, int $limit = -1 ] )

The signature has no parameter for the encoding. On closer reading, the encoding is set through mb_regex_encoding.

Then write the following code:

1 mb_regex_encoding (' GBK '); 2 $list = mb_split (' \| ', ' abc Mingtao |BC '); 3 Var_dump ($list);

Results PHP error, mb_regex_encoding do not know GBK, embarrassed.

Then use it to recognize:

1 mb_regex_encoding (' gb2312 '); 2 $list = mb_split (' \| ', ' abc Mingtao |BC '); 3 Var_dump ($list);

Results:

Array (3) {  [0]=>  string(4) "ABC?  [1]=>  ""  [2]=>  "BC"}

Found that this method is of little use. 、

As for the reason? The word "Mingtao" is not actually in GB2312 's code SET!!!!! But the code set with this word (GBK, GB18030) is not supported by this function!!!!!
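The difference between the two character sets can be checked from the byte ranges alone. The sketch below is a rough range test, not a full validator: the helper names are hypothetical, and the ranges are the commonly documented ones for two-byte characters (GB2312 in its EUC-CN form uses 0xA1-0xFE for both bytes; GBK uses 0x81-0xFE for the lead byte and 0x40-0xFE except 0x7F for the trail byte).

```php
<?php
// Hypothetical helpers for illustration; they test byte ranges only and
// do not check whether a code point is actually assigned.
function in_gb2312_range(string $pair): bool
{
    // GB2312 (EUC-CN): both bytes in 0xA1-0xFE.
    return preg_match('/\A[\xA1-\xFE][\xA1-\xFE]\z/', $pair) === 1;
}

function in_gbk_range(string $pair): bool
{
    // GBK: lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F.
    return preg_match('/\A[\x81-\xFE][\x40-\x7E\x80-\xFE]\z/', $pair) === 1;
}

$tao = "\x8F\x7C"; // the problem character as GBK bytes

var_dump(in_gb2312_range($tao)); // bool(false)
var_dump(in_gbk_range($tao));    // bool(true)
```

Because a GBK trail byte may fall in 0x40-0x7E, it can coincide with ASCII characters such as "|", "\" or "@". GB2312 trail bytes never can, which is why the problem only shows up with characters outside GB2312.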

Since mb_split was no use, perhaps ordinary regular expressions would work. I tried the following:

1 Var_dump (preg_match_all$matches)); 2 Var_dump ($matches);

Result:

    int(2)
    array(2) {
      [0]=>
      array(2) {
        [0]=>
        string(4) "abc?"
        [1]=>
        string(2) "bc"
      }
      [1]=>
      array(2) {
        [0]=>
        string(1) "?"
        [1]=>
        string(1) "c"
      }
    }

So much for that idea.

The next question is how to describe this situation with a regular expression.

For reference, see the post on Brother Bird's (鸟哥) blog about fixing garbled text when splitting GBK Chinese. Unfortunately my regex skills are limited, and I still could not come up with a suitable expression (if anyone does work one out, please let me know).

Without a regular expression of my own, I fell back on mb_strpos and mb_substr:

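    function mb_explode($delimiter, $string, $encoding = null)
    {
        // An empty delimiter would never advance through the string; reject it.
        if ($delimiter === '') {
            throw new InvalidArgumentException('Empty delimiter');
        }
        $list = array();
        // Fall back to the internal encoding when none is given.
        is_null($encoding) && $encoding = mb_internal_encoding();
        $len = mb_strlen($delimiter, $encoding);
        // Search by character position, so a trail byte is never mistaken for a delimiter.
        while (false !== ($idx = mb_strpos($string, $delimiter, 0, $encoding))) {
            $list[] = mb_substr($string, 0, $idx, $encoding);
            $string = mb_substr($string, $idx + $len, null, $encoding);
        }
        // Whatever remains after the last delimiter is the final element.
        $list[] = $string;
        return $list;
    }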

Test code:

1 $a = ' abc Mingtao |BC '; 2 3 Var_dump $a, ' GBK '); 4 Var_dump $a, ' GBK '); 5 Var_dump $a, ' GBK ');

Results:

array  (2  0]=> string  (5) "ABC Mingtao"  [ 1]=> string  (2) "BC"  array  (3 0]=> string  (1)" A " [ 1]=> string  (3)" Mingtao | "  [ 2]=> string  (0) "}  array  (2 0]=> string  (3) "ABC"  [ 1]=> string  (3) "|BC"  

This will give you the right results.
