PHP code for detecting character encoding

Source: Internet
Author: User
Tags: preg

function utf8_gb2312($str, $default = 'gb2312')
{
    // Strip ASCII bytes; a pure-ASCII string is reported as the default encoding.
    $str = preg_replace("/[\x01-\x7f]+/", "", $str);
    if (empty($str)) return $default;

    $preg = array(
        "gb2312" => "/^([\xa1-\xf7][\xa0-\xfe])+$/", // regex test for gb2312 byte pairs
        "utf-8" => "/^[\x{4e00}-\x{9fa5}]+$/u", // regex test for Chinese characters (when the input is UTF-8); this range also covers traditional Chinese characters
    );

    if ($default == 'gb2312') {
        $option = 'utf-8';
    } else {
        $option = 'gb2312';
    }

    if (!preg_match($preg[$default], $str)) {
        return $option;
    }
    $str = @iconv($default, $option, $str);

    // If the string cannot be converted to $option, it was not really $default.
    if (empty($str)) {
        return $option;
    }

    // Both checks passed, so the string is treated as $default.
    return $default;
}

The default encoding is gb2312 because, by my own statistics, about 90% of the input is gb2312. The detection function therefore must never report UTF-8 for a string that was originally gb2312. The basic idea is:

1. Remove all ASCII characters. If the string consists only of ASCII characters, it is reported as gb2312.

2. Assume that the string is gb2312 and use a regular expression to check whether it really is. If it does not match, it is UTF-8.

3. Then use iconv to convert the string to UTF-8. If the conversion fails, the string is probably not really gb2312 (I have tried to make the regular expression as accurate as possible, but the gb2312 code space is not contiguous, so there are still holes), and the final encoding is UTF-8.

4. Otherwise, it is gb2312.
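
The "holes" mentioned in step 3 can be shown with a small sketch. It applies only the byte-pair pattern from step 2 to three hand-picked byte strings; the variable names and the samples are assumptions chosen for illustration, written as hex escapes so the result does not depend on the encoding of the source file.

```php
<?php
// Byte-pair pattern from step 2: lead byte A1-F7, trail byte A0-FE.
$gb2312Pattern = '/^([\xa1-\xf7][\xa0-\xfe])+$/';

// Hypothetical samples, given as raw bytes.
$samples = array(
    'gb2312 bytes of U+4E2D U+6587' => "\xD6\xD0\xCE\xC4",
    'utf-8 bytes of U+4E2D U+6587'  => "\xE4\xB8\xAD\xE6\x96\x87",
    'utf-8 bytes of U+4EAC, twice'  => "\xE4\xBA\xAC\xE4\xBA\xAC",
);

$matches = array();
foreach ($samples as $label => $bytes) {
    // preg_match returns 1 on a match and 0 otherwise.
    $matches[$label] = preg_match($gb2312Pattern, $bytes) === 1;
    echo $label, ': ', $matches[$label] ? 'matches' : 'no match', "\n";
}
```

The first sample matches and the second does not, as intended. The third sample is UTF-8 but still matches, because every byte happens to fall inside the allowed ranges. Its middle pair starts with 0xAC, which lies in rows that GB2312 leaves unassigned, so this is the kind of false positive the iconv conversion in step 3 is there to catch.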

After adding this check function, roughly one keyword in 1000 comes out garbled, far fewer than the roughly one in 100 before.
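
A minimal sketch of how such a check might be applied to a batch of keywords, assuming the goal is to store everything as UTF-8. The detector is restated in compact form so the example runs on its own. `keyword_to_utf8` and the sample keywords are hypothetical, introduced only for illustration, and the iconv extension is assumed to be available.

```php
<?php
// Compact restatement of the detector described above.
function utf8_gb2312($str, $default = 'gb2312')
{
    // Drop ASCII bytes; a pure-ASCII string is reported as the default.
    $str = preg_replace('/[\x01-\x7f]+/', '', $str);
    if (empty($str)) {
        return $default;
    }
    $preg = array(
        'gb2312' => '/^([\xa1-\xf7][\xa0-\xfe])+$/',
        'utf-8'  => '/^[\x{4e00}-\x{9fa5}]+$/u',
    );
    $option = ($default == 'gb2312') ? 'utf-8' : 'gb2312';
    if (!preg_match($preg[$default], $str)) {
        return $option;
    }
    // A failed conversion means the string was not really $default.
    $converted = @iconv($default, $option, $str);
    return empty($converted) ? $option : $default;
}

// Hypothetical helper: return the keyword as UTF-8, converting only when needed.
function keyword_to_utf8($keyword)
{
    if (utf8_gb2312($keyword) === 'utf-8') {
        return $keyword;
    }
    $converted = @iconv('gb2312', 'utf-8', $keyword);
    // Fall back to the original bytes if the conversion fails.
    return $converted === false ? $keyword : $converted;
}

// Hypothetical keywords as raw bytes: ASCII, gb2312, and UTF-8.
$keywords = array('php', "\xD6\xD0\xCE\xC4", "\xE4\xB8\xAD\xE6\x96\x87");
foreach ($keywords as $keyword) {
    echo bin2hex($keyword), ' detected as ', utf8_gb2312($keyword), "\n";
}
```

Converting only when the detector reports gb2312 keeps keywords that are already UTF-8 from being run through iconv a second time, which is one common way such text ends up garbled.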

 
