The code is as follows:
A few characters come out garbled, which is strange. Please see the picture.
What is the cause of this?
The code above seems to be wrong: the original page is clearly GB2312, yet the detected encoding comes out as CP936. I'm speechless.
Please help me check whether the code above needs to be improved.
Thank you so much!
Reply to discussion (solution)
Also, for a few pages the curl_init request unexpectedly returns blank data, and I have to refresh several times before anything shows up. Is this a problem with the parameter values I set?
The data returned are:
In principle, you already know the page's encoding.
There is no need to detect it programmatically.
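As a rough illustration of that point, the sketch below reads the charset the page itself declares and converts with it, instead of guessing. The helper name `declared_charset` and the sample HTML are made up for this example, and the mapping of GB2312 to CP936 is an assumption that decoding with the superset is acceptable for your pages.

```php
<?php
// Hypothetical helper (not from the original post): return the charset
// declared in a <meta> tag, or null if none is found.
function declared_charset(string $html): ?string
{
    // Matches <meta charset="gb2312"> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Sample input, invented for the example.
$html = '<html><head><meta http-equiv="Content-Type" '
      . 'content="text/html; charset=gb2312"></head><body></body></html>';

// Assumption: fall back to UTF-8 when the page declares nothing.
$charset = declared_charset($html) ?? 'UTF-8';

// CP936 (GBK) is a superset of GB2312, so decoding as CP936 also covers
// pages that declare GB2312 but contain GBK-only characters.
if ($charset === 'GB2312') {
    $charset = 'CP936';
}

$utf8 = mb_convert_encoding($html, 'UTF-8', $charset);
```

If the server sends a charset in its Content-Type response header, that value could be used the same way.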
mb_detect_encoding often guesses wrong, so I added the mb_check_encoding function.
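Since the original code is not shown, here is only a minimal sketch of what a check based on mb_check_encoding might look like. The helper names `guess_encoding` and `to_utf8` and the candidate list are assumptions for illustration, not the poster's code. UTF-8 is tried first because its validity rules are the strictest, so GBK bytes rarely pass as UTF-8.

```php
<?php
// Hypothetical helper: return the first candidate encoding in which
// $data is a valid byte sequence, or null if none matches.
function guess_encoding(string $data, array $candidates = ['UTF-8', 'CP936']): ?string
{
    foreach ($candidates as $enc) {
        if (mb_check_encoding($data, $enc)) {
            return $enc;
        }
    }
    return null;
}

// Hypothetical helper: convert to UTF-8 when a non-UTF-8 encoding is found,
// otherwise return the data unchanged.
function to_utf8(string $data): string
{
    $enc = guess_encoding($data);
    if ($enc === null || $enc === 'UTF-8') {
        return $data;
    }
    return mb_convert_encoding($data, 'UTF-8', $enc);
}

// The two characters of "中文" encoded in GB2312/GBK.
$gbk = "\xD6\xD0\xCE\xC4";
echo to_utf8($gbk), PHP_EOL;
```

This is still a guess based on byte validity, so a charset declared by the page should take priority when one is available.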
Data fragments
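There is no reason for illegal characters to appear.
CP936 is the Windows code page name for GBK, which is a superset of GB2312, so a GB2312 page being reported as CP936 is not a detection error.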
The first problem is not garbled text; those are images. When you crawl a Baidu page with cURL, Baidu deliberately converts some of the text into images as an anti-crawling measure. Inspect the page elements and you will find that the "garbled" parts are actually Baidu image addresses.
For the second problem, set the timeout to a larger value and it should be fine; it may be a network problem on your side.
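A minimal sketch of the timeout suggestion, with a simple retry added for pages that occasionally come back empty. The function name `fetch_with_retry`, the retry count, and the timeout values are assumptions chosen for illustration; tune them for your own network.

```php
<?php
// Hypothetical helper: fetch a URL with explicit timeouts, retrying when
// the request fails or returns an empty body. Returns null if every
// attempt fails.
function fetch_with_retry(string $url, int $retries = 3, int $timeout = 30): ?string
{
    for ($attempt = 1; $attempt <= $retries; $attempt++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,     // follow redirects
            CURLOPT_CONNECTTIMEOUT => 10,       // seconds allowed for connecting
            CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole transfer
        ]);
        $body = curl_exec($ch);

        if ($body !== false && $body !== '') {
            return $body;
        }
        if ($attempt < $retries) {
            sleep(1); // short pause before the next attempt
        }
    }
    return null;
}
```

A call such as `fetch_with_retry('https://www.baidu.com/')` would then return either the page body or null, which makes the blank-page case easy to detect instead of refreshing by hand.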