My enthusiasm for web page parsing and writing crawlers hasn't faded, so today I built a crawler with the open-source simple_html_dom.php parsing library:
// [the start of the script is cut off in the post; $html is assumed to hold the first parsed page]
foreach ($html->find('a') as $e) {
    $f = $e->href;
    // if ($f[10] == ':') continue;
    if ($f[0] == '/') $f = 'http://www.baidu.com' . $f; // complete the URL
    if ($f[4] == 's') continue; // if the URL is "https://", skip it (simple_html_dom might not be able to parse https:// URLs)
    if (stripos($f, "baidu") == false) continue; // if the URL is not on this website, skip it
    echo $f . "\n";
    $tmp[$cun++] = $f; // save the URLs into the array
}
foreach ($tmp as $r) // dig into the URLs in $tmp[]
{
    $html2 = file_get_html($r); // redo the step
    foreach ($html2->find('a') as $a) {
        $u = $a->href;
        if ($u[0] == '/') $u = 'http://www.baidu.com' . $u;
        if ($u[4] == 's') continue;
        if (stripos($u, "baidu") == false) continue;
        echo $u . "\n";
    }
    $html2 = null;
}
?>
But it always stops with this error: Fatal error: Call to a member function find() on a non-object in D:\xampp\htdocs\html\index.php on line 21. After talking it over with some senior students I corrected a lot of minor mistakes, but this one is still unsolved. I hope an expert can enlighten me.
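One likely cause, as far as I can tell: in the simple_html_dom versions I know of, file_get_html() returns false instead of a DOM object when a page cannot be fetched, comes back empty, or is larger than the library's size limit, and calling find() on that false value produces exactly this fatal error. Below is a minimal sketch of a guard, not a definitive fix. The helper name links_from() and its $loader parameter are hypothetical (they are not part of simple_html_dom); $loader stands for any callable that behaves like file_get_html(). The sketch also compares stripos() with === false, because a match at position 0 would otherwise be treated as "not found".

```php
<?php
// Hypothetical helper, not part of simple_html_dom.
// $loader is assumed to behave like file_get_html():
// it returns a DOM object on success and false on failure.
function links_from(string $url, callable $loader): array
{
    $dom = $loader($url);
    if (!is_object($dom)) {
        // Loading failed: skip this page instead of calling find() on false.
        return [];
    }

    $links = [];
    foreach ($dom->find('a') as $a) {
        $href = (string) $a->href;
        if ($href === '') {
            continue; // anchor without a usable href
        }
        if ($href[0] === '/') {
            $href = 'http://www.baidu.com' . $href; // complete relative URLs
        }
        if (stripos($href, 'https://') === 0) {
            continue; // skip https:// URLs, as the original script does
        }
        // stripos() returns 0 for a match at position 0, so test with === false.
        if (stripos($href, 'baidu') === false) {
            continue; // not on this website
        }
        $links[] = $href;
    }
    return $links;
}
```

With simple_html_dom.php included, the body of the second loop could then become something like: foreach (links_from($r, 'file_get_html') as $u) echo $u . "\n"; so that a page which fails to load is skipped rather than ending the whole run.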
---------------------Split Line---------------------
simple_html_dom download:
https://github.com/Ph0enixxx/simple_html_dom
= = Git4win doesn't work on my home computer.