A PHP class for recursively crawling web pages: an example
The details are as follows:
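<?php
class Crawler
{
    private $_depth = 5;       // maximum recursion depth
    private $_urls = array();  // URLs collected so far

    // Recursively collect the links found under $url, at most $_depth levels deep.
    public function extract_links($url, $curr_depth = 0)
    {
        if ($curr_depth >= $this->_depth) {
            return $this->_urls;
        }
        $data = @file_get_contents($url);
        if ($data === false) {
            return $this->_urls;  // page could not be fetched
        }
        // "#" is used as the delimiter so that "//" needs no escaping
        $pattern = '#https?://(?:www\.)?[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+(?:/[A-Za-z0-9_/.?&:%,!;=-]*)?#';
        if (preg_match_all($pattern, $data, $matches)) {
            foreach ($matches[0] as $v) {
                // Skip links that do not contain the current URL or were already collected
                if (strpos($v, $url) === false || in_array($v, $this->_urls, true)) {
                    continue;
                }
                // Keep the link only if the server answers with status 200
                $check = @get_headers($v, true);
                if ($check && preg_match('#^HTTP/\S+ 200#', $check[0])) {
                    $this->_urls[] = $v;
                    $this->extract_links($v, $curr_depth + 1);
                }
            }
        }
        return $this->_urls;
    }
}
?>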
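The link-matching step can be tried on its own, without any network access. The following is only a minimal sketch and not the article's code: the helper name find_links, the simplified URL pattern, and the sample HTML (example.com, other.test) are all assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of the class above): returns the unique
// absolute http/https URLs found in $html that start with $base.
function find_links($html, $base)
{
    // Simplified pattern (an assumption): scheme, host, optional path and query.
    $pattern = '#https?://[A-Za-z0-9_.-]+(?:/[A-Za-z0-9_/.?&:%,!;=-]*)?#';
    if (!preg_match_all($pattern, $html, $matches)) {
        return array();
    }
    $links = array();
    foreach ($matches[0] as $link) {
        // Keep only links under $base, and skip duplicates.
        // in_array() is used because array_search() returns index 0 for the
        // first element, which a loose "!" check would treat as "not found".
        if (strpos($link, $base) === 0 && !in_array($link, $links, true)) {
            $links[] = $link;
        }
    }
    return $links;
}

// Sample input, invented for this example.
$html = '<a href="http://example.com/a.html">A</a> '
      . '<a href="http://example.com/b.html?x=1">B</a> '
      . '<a href="http://other.test/c.html">C</a> '
      . '<a href="http://example.com/a.html">A again</a>';

print_r(find_links($html, 'http://example.com/'));
```

With this sample input, the two example.com links are returned once each and the other.test link is dropped. A crawler would call such a helper on each fetched page and then recurse into the returned links.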