PHP DOM garbled text: a solution with sample code
Preface
DOM is PHP's class library for processing XML and HTML. It lets you work with the DOM tree almost as conveniently as in JavaScript. XML processing is already well covered on the Internet, so this article focuses on one problem: how to fix garbled text when loading a page with PHP's DOM classes. Without further ado, here is the solution.
The solution is as follows:
/**
 * Request the page content at a URL.
 *
 * @param string $url
 * @return string|bool
 */
function curl_get($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    // Return the response body from curl_exec() instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    // Follow 302 redirects
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0');
    curl_setopt($curl, CURLOPT_REFERER, $url);
    $data = curl_exec($curl);
    // HTTP status code of the request
    $code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    if (200 == $code) {
        // Fix garbled characters: convert GB2312 pages to UTF-8
        if (preg_match('#<meta[^>]*charset="?gb2312"[^>]*>#is', $data)) {
            $data = iconv("gb2312", "UTF-8//IGNORE", $data);
            $data = preg_replace('#<meta[^>]*charset="?gb2312"[^>]*>#is', '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">', $data);
        }
        if (!preg_match('#<meta charset="UTF-8"[^>]*>#is', $data)) {
            $data = str_replace('
/**
 * Get the DOMDocument object for a URL.
 *
 * @param string $url
 * @return bool|DOMDocument
 */
function getDom($url)
{
    $html_content = curl_get($url);
    if (empty($html_content)) {
        // saveLog($url, 'request failed');
        return false;
    }
    $dom = new DOMDocument('1.0', 'utf-8');
    libxml_use_internal_errors(true);
    $dom->loadHTML($html_content);
    return $dom;
}
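The reason the code above insists on a UTF-8 charset declaration before calling loadHTML can be illustrated with a small sketch. As far as I know, libxml's HTML parser takes the encoding from the meta tag and tends to fall back to a single-byte encoding when none is declared, which is where the garbled text comes from. The helper name parse_utf8_html and the sample page below are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original article): parse an HTML
// string that is already UTF-8 and declares that charset in its markup.
function parse_utf8_html(string $html): DOMDocument
{
    $dom = new DOMDocument('1.0', 'utf-8');
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    // Restore the previous error-handling mode.
    libxml_use_internal_errors($previous);
    return $dom;
}

// Sample page (made up for illustration) with an explicit UTF-8 meta tag.
$page = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
      . '</head><body><p id="greeting">你好，世界</p></body></html>';

$dom   = parse_utf8_html($page);
$xpath = new DOMXPath($dom);
$text  = $xpath->query('//p[@id="greeting"]')->item(0)->textContent;

echo $text, PHP_EOL;
```

Restoring the previous libxml error mode keeps the helper from changing global state for the rest of the script, which the shorter version in getDom does not do.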
$html_content = mb_convert_encoding($html_content, 'UTF-8', 'gb2312');
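The mb_convert_encoding call above can be combined with a rewrite of the page's charset declaration, so that the bytes and the meta tag agree before the HTML reaches DOMDocument. The following is a minimal sketch of that idea, not the article's exact code. It assumes the mbstring extension is available, and the function name to_utf8_html, its parameters, and the sample page are hypothetical.

```php
<?php
// Hypothetical helper: convert an HTML string from $from to UTF-8 and
// make its <meta> charset declaration agree with the new encoding.
function to_utf8_html(string $html, string $from = 'GB2312'): string
{
    // Re-encode the raw bytes.
    $html = mb_convert_encoding($html, 'UTF-8', $from);
    // Replace any meta tag that still declares the old charset.
    $pattern = '#<meta[^>]*charset="?' . preg_quote($from, '#') . '"?[^>]*>#is';
    $meta    = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">';
    // preg_replace() returns null on failure; keep the converted text then.
    return preg_replace($pattern, $meta, $html) ?? $html;
}

// Build a GB2312-encoded sample page to simulate a downloaded document.
$utf8Page = '<html><head><meta charset="gb2312"></head><body>中文</body></html>';
$gbPage   = mb_convert_encoding($utf8Page, 'GB2312', 'UTF-8');

$fixed = to_utf8_html($gbPage);
echo $fixed, PHP_EOL;
```

The result of to_utf8_html can then be passed to loadHTML in the same way getDom passes $html_content.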
Summary
That is all for this article. I hope it helps you in your study or work. If you have any questions, please leave a message.