A transcoding bug in simple_html_dom, the PHP HTML parsing library

Source: Internet
Author: User
Over the past few days I have been using simple_html_dom to scrape articles from a number of websites. Chinese sites are generally encoded as GBK, GB2312, or UTF-8, with GB2312 and UTF-8 being the most common.

In the version of simple_html_dom I am using, character-set conversion is handled by the convert_text method.

The code is as follows:


// PaperG - Function to convert the text from one character set to another if the two sets are not the same.
function convert_text($text)
{
    global $debug_object;
    if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }

    $converted_text = $text;
    $sourceCharset = "";
    $targetCharset = "";

    if ($this->dom)
    {
        $sourceCharset = strtoupper($this->dom->_charset);
        $targetCharset = strtoupper($this->dom->_target_charset);
    }
    if (is_object($debug_object)) { $debug_object->debug_log(3, "source charset: " . $sourceCharset . " target charaset: " . $targetCharset); }

    if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
    {
        // Check if the reported encoding could have been incorrect and the text is actually already UTF-8
        if ((strcasecmp($targetCharset, 'UTF-8') == 0) && ($this->is_utf8($text)))
        {
            $converted_text = $text;
        }
        else
        {
            $converted_text = iconv($sourceCharset, $targetCharset, $text);
        }
    }

    // Lets make sure that we don't have that silly BOM issue with any of the UTF-8 text we output.
    if ($targetCharset == 'UTF-8')
    {
        if (substr($converted_text, 0, 3) == "\xef\xbb\xbf")
        {
            $converted_text = substr($converted_text, 3);
        }
        if (substr($converted_text, -3) == "\xef\xbb\xbf")
        {
            $converted_text = substr($converted_text, 0, -3);
        }
    }

    return $converted_text;
}

Let's look at this line:

The code is as follows:


$converted_text = iconv($sourceCharset, $targetCharset, $text);

The conversion goes wrong here. For example, a GB2312 text comes out like this:

The code is as follows:


The 24-year-old Han Zhuangzhuang not only scored a zero penalty score in the April 26 Longines International Federation of Marathon World Cup Chinese league qualifying tournament held at the Maraton Park on April 9, 2014... the first time Zhao Zhiwen, the first Olympic contestant, received a zero penalty score, it took 77 seconds to 07...
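One plausible cause, which I have not verified against the pages in question, is that a page declaring gb2312 actually contains characters that only exist in GBK. In that case a strict GB2312 decode hits an illegal byte sequence and iconv() gives up. The sketch below reproduces that situation with a made-up sample string; the byte pair "\xE9\x46" is my own choice for illustration, picked because its second byte lies outside the range GB2312 allows.

```php
<?php
// Hypothetical sample text. "\xE9\x46" is a legal two-byte sequence in GBK,
// but not in GB2312, where the second byte must be in the range 0xA1-0xFE.
$text = "abc" . "\xE9\x46" . "def";

// Strict conversion, which is what convert_text() does when the page
// declares gb2312. The @ suppresses the notice iconv() raises when it
// meets an illegal input sequence.
$strict = @iconv('GB2312', 'UTF-8', $text);

// The same bytes decoded with GBK, a superset of GB2312.
$lenient = @iconv('GBK', 'UTF-8', $text);

// Compare the two results: the strict decode fails or stops early,
// while the GBK decode should yield well-formed UTF-8.
var_dump($strict);
var_dump(is_string($lenient) && preg_match('//u', $lenient) === 1);
```

Depending on the PHP version and the underlying iconv library, the strict call either returns false or returns only the text before the offending bytes, which would match the cut-off output above.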

This shows that the conversion is not handled correctly. Since I only use simple_html_dom to build the DOM, I did not want to spend time tracking down the bug. Instead, I simply changed

The code is as follows:


$converted_text = iconv($sourceCharset, $targetCharset, $text);

to

The code is as follows:


$converted_text = $text;

That's all. The idea is simply to skip the conversion, so the text keeps its original encoding. With that out of the way, you can get back to work.
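If you would rather keep the conversion than switch it off, a gentler variant is to decode gb2312 pages as GBK and fall back to the untouched text only when iconv() fails. The function below is a minimal sketch under that assumption. convert_text_lenient is a hypothetical standalone helper of my own, not part of simple_html_dom, and it takes the two charsets as arguments instead of reading them from $this->dom.

```php
<?php
/**
 * Hypothetical helper, not part of simple_html_dom.
 * Converts $text from $sourceCharset to $targetCharset and returns the
 * original text unchanged if the conversion fails.
 */
function convert_text_lenient($text, $sourceCharset, $targetCharset)
{
    $source = strtoupper(trim($sourceCharset));
    $target = strtoupper(trim($targetCharset));

    // Nothing to do when a charset is unknown or both are the same.
    if ($source === '' || $target === '' || $source === $target) {
        return $text;
    }

    // Assumption: pages labelled GB2312 may contain GBK-only characters,
    // so decode them with the GBK superset instead.
    if ($source === 'GB2312') {
        $source = 'GBK';
    }

    // The @ suppresses the notice raised on illegal input sequences.
    $converted = @iconv($source, $target, $text);

    // Fall back to the original bytes rather than returning false.
    return ($converted === false) ? $text : $converted;
}

// Usage with a made-up sample: the declared charset is gb2312, but the
// bytes "\xE9\x46" are only legal in GBK.
$utf8 = convert_text_lenient("abc\xE9\x46def", 'gb2312', 'utf-8');
var_dump(preg_match('//u', $utf8) === 1);
```

The fallback keeps the behavior of the one-line change above for any text that still cannot be converted, so the worst case is no worse than cancelling transcoding altogether.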
