This article introduces a self-made PHP crawler, v1.0, built on simple_html_dom. Enthusiasm for web parsing and crawlers has not faded over the years. Today we use the open-source simple_html_dom.php parsing library to build a crawler:
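<?php
// Assumed opening (the first lines of the listing are missing): load the library and fetch the start page
include 'simple_html_dom.php';
$html = file_get_html('http://www.baidu.com');
$tmp = array();
$cun = 0;
foreach ($html->find('a') as $e) {
    $f = $e->href;
    //if($f[10]==':')continue;
    if ($f[0] == '/') $f = 'http://www.baidu.com' . $f; // Complete a relative URL
    if ($f[4] == 's') continue; // Skip "https://" URLs (simple_html_dom may not be able to parse them)
    if (stripos($f, "baidu") === FALSE) continue; // Skip URLs that are not on this website
    echo $f . "\n";
    $tmp[$cun++] = $f; // Save the URL into the array
}
foreach ($tmp as $r) // Dig into the URLs saved in $tmp[]
{
    $html2 = file_get_html($r); // Repeat the step for each saved page
    foreach ($html2->find('a') as $a) {
        $u = $a->href;
        if ($u[0] == '/') $u = 'http://www.baidu.com' . $u;
        if ($u[4] == 's') continue;
        if (stripos($u, "baidu") === FALSE) continue;
        echo $u . "\n";
    }
    $html2 = null; // Release the parsed page
}
?>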
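The link rules the crawler applies in both loops (complete relative URLs, skip https:// links, skip off-site links) could be gathered into one function. The following is only a sketch: normalize_link() is a hypothetical helper, not part of simple_html_dom, and skipping every non-http:// scheme (javascript:, mailto: and so on) as well as handling "//host/path" links are assumptions that go slightly beyond what the crawler does.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns an absolute
// http:// URL on the target site, or null when the link should be skipped.
function normalize_link($href, $base = 'http://www.baidu.com', $keyword = 'baidu')
{
    if (!is_string($href) || $href === '') {
        return null; // missing or empty href attribute
    }
    if (substr($href, 0, 2) === '//') {
        $href = 'http:' . $href; // assumption: treat protocol-relative links as http
    } elseif ($href[0] === '/') {
        $href = $base . $href; // complete a site-relative URL
    }
    if (stripos($href, 'http://') !== 0) {
        return null; // skips https:// as the crawler does, plus javascript:, mailto:, etc.
    }
    if (stripos($href, $keyword) === false) {
        return null; // strict comparison: position 0 must not count as "not found"
    }
    return $href;
}
```

With such a helper, each loop body would reduce to calling normalize_link($e->href) and continuing when the result is null.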
The crawler always stops with "Fatal error: Call to a member function find() on a non-object in D:\xampp\htdocs\html\index.php on line 21". Classmates have corrected many minor mistakes, but that still has not solved the problem.
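A plausible cause, though not one confirmed for this script, is that file_get_html() returned something other than a parsed page for one of the URLs (some versions of simple_html_dom return false when a page cannot be fetched or is too large), so the following find() call has no object to work on. A minimal sketch of a guard, assuming that behaviour; collect_links() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: collect the href values from a parsed page, or
// return an empty array when the page could not be loaded.
function collect_links($page)
{
    if (!is_object($page)) {
        return array(); // nothing was parsed, so there is nothing to search
    }
    $links = array();
    foreach ($page->find('a') as $a) {
        $links[] = $a->href;
    }
    return $links;
}

// Intended use in the second loop (assumes simple_html_dom.php is included):
//   foreach (collect_links(file_get_html($r)) as $u) { ... }
```

A page that fails to load is then skipped silently instead of ending the whole run.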
--------------------- Split line ---------------------
simple_html_dom download:
https://github.com/Ph0enixxx/simple_html_dom
Git4win is not available on my home computer.