To be honest, I don't write regular expressions very often, and writing them against irregular HTML is tedious. If the page changes even slightly, the expressions have to be updated to keep up, which is a real pain.
My first instinct was to look for a crawler library, and it turns out there are quite a few mature open-source crawler projects for PHP.
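To make the maintenance problem concrete, here is a minimal sketch using only PHP's built-in DOM extension. The markup, the `prices` id, and the function names `priceByRegex` and `priceByDom` are made up for illustration. It shows a handwritten pattern that stops matching after a small markup change, while a structural query keeps working.

```php
<?php
// Hypothetical sample markup: the same cell before and after a small page update.
$before = '<table id="prices"><tr><td>Copper</td><td>48000</td></tr></table>';
$after  = '<table id="prices"><tr><td class="name">Copper</td><td class="num">48000</td></tr></table>';

// A handwritten pattern tied to the exact tag text.
function priceByRegex(string $html): ?string
{
    return preg_match('#<td>Copper</td><td>(\d+)</td>#', $html, $m) ? $m[1] : null;
}

// A structural query: second cell of the first row of the table with id "prices".
function priceByDom(string $html): ?string
{
    $doc = new DOMDocument();
    // Suppress warnings about imperfect real-world markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $cells = $xpath->query('//table[@id="prices"]//tr[1]/td[2]');
    return $cells->length > 0 ? trim($cells->item(0)->textContent) : null;
}
```

Adding a `class` attribute is enough to break the pattern, whereas the structural query does not care about attributes it never mentions.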
At first I planned to use phpquery, because it implements a jQuery-like API that would have saved me time. But it dates from 6 years ago. The original project was hosted at http://code.google.com/p/phpquery/, and although someone has since copied it to GitHub,
it has gone unmaintained for years. It is not particularly pleasant to use, and unlike most things these days it cannot be installed with Composer, because it was never submitted to https://packagist.org. With so many new projects now based on PHP 7, it feels a bit outdated.
After a while I found that phpspider is very usable these days (note: not php-spider). It has Chinese documentation, though it is not especially complete: https://doc.phpspider.org/
https://github.com/owner888/phpspider
Note: this framework can only be run from the command line. Command line, command line, command line: important things get said three times ^_^
But I need to run it under the web. Looking at test_requests.php, I found that a CSS selector is already implemented as an alternative to handwritten regular expressions, which is great. As for how powerful it is, I'll leave that for users to judge after trying it themselves.
The requests and selector classes can be run directly under the web.
Import the two classes:
use phpspider\core\requests;
use phpspider\core\selector;
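// Fetch the page and select the target table by its id.
$html = requests::get('http://www.ccmn.cn/');
$data = selector::select($html, "#40288092327140f601327141c0560001", "css");
// Split the table into rows and drop the header row.
$data1 = selector::select($data, "tr", "css");
array_shift($data1);
$array = array();
if (!empty($data1) && is_array($data1)) {
    foreach ($data1 as $k => $v) {
        $data2 = selector::select($v, "td", "css");
        // Clean each cell in place: strip carriage-return entities and line breaks, then trim.
        foreach ($data2 as $kk => &$vv) {
            $vv = str_replace('&#13;', '', $vv);
            $vv = str_replace(array("\r\n", "\r", "\n"), "", $vv);
            $vv = trim($vv);
        }
        unset($vv); // break the reference left over from the loop
        // Cell 3 wraps its value in a <font> tag; cell 6 is not needed.
        $data2['3'] = selector::select($data2['3'], "font", "css");
        unset($data2['6']);
        $array[] = $data2;
    }
}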
That is all it takes to grab fixed positions out of a slightly complicated web page.
Simple, right?
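For readers who want to try the same row-and-cell walk without installing anything, here is a rough equivalent using only PHP's built-in DOM extension. This is a sketch and not phpspider's implementation: the sample table, the `quotes` id, and the `extractRows` helper are all hypothetical, and it returns cell text rather than inner HTML.

```php
<?php
// Hypothetical stand-in for the fetched page; not real data from the site.
$html = <<<HTML
<table id="quotes">
  <tr><th>Name</th><th>Price</th><th>Change</th></tr>
  <tr><td> Copper&#13;
  </td><td>48000</td><td><font color="red">+120</font></td></tr>
  <tr><td>Zinc</td><td> 21000 </td><td><font color="green">-35</font></td></tr>
</table>
HTML;

function extractRows(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    // Suppress warnings about imperfect real-world markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $rows = [];
    // position() > 1 skips the header row, like array_shift() in the post.
    foreach ($xpath->query("//table[@id='$tableId']//tr[position() > 1]") as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            // Remove carriage returns and newlines, then trim surrounding spaces.
            $cells[] = trim(str_replace(["\r\n", "\r", "\n"], '', $td->textContent));
        }
        $rows[] = $cells;
    }
    return $rows;
}

$rows = extractRows($html, 'quotes');
```

The shape is the same as the phpspider version: pick the table, skip the header row, then clean each cell. The main difference is that XPath is used here in place of CSS selectors.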
https://doc.phpspider.org/selector.html
The official selector supports fairly powerful CSS selectors, which is basically enough for common cases.
It's almost like writing jQuery.
One more thing: when running under the CLI, take care not to delete these comments from your script:
#/\* do not delete this comment \*/#
#/\* do not delete this paragraph of comment \*/#
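Deleting them triggers an error, because the framework, rather annoyingly, matches against them (the patterns are shown here as they appear in this post, in translation):
if (!preg_match("#/\* do not delete this comment \*/#", $content) || !preg_match("#/\* do not delete this paragraph of comment \*/#", $content)) {
    $msg = "Unknown error ...";
    log::error($msg);
    exit;
}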
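The check above can be reproduced in isolation to see how it behaves. The sketch below is self-contained and does not call phpspider. The marker wording is copied from this post and is an assumption (the framework's real text may differ), and `REQUIRED_MARKERS` and `hasRequiredMarkers` are hypothetical names.

```php
<?php
// Marker patterns as quoted in this post (assumed; the framework's real wording may differ).
const REQUIRED_MARKERS = [
    '#/\* do not delete this comment \*/#',
    '#/\* do not delete this paragraph of comment \*/#',
];

// Returns true only if every marker comment is present in the script source.
function hasRequiredMarkers(string $source, array $patterns = REQUIRED_MARKERS): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $source) !== 1) {
            return false;
        }
    }
    return true;
}

// Two hypothetical script bodies: one with both comments, one with the second removed.
$intact  = "<?php\n/* do not delete this comment */\n/* do not delete this paragraph of comment */\n";
$trimmed = "<?php\n/* do not delete this comment */\n";
```

Calling `hasRequiredMarkers($intact)` passes while `hasRequiredMarkers($trimmed)` fails, which is why removing either comment makes the script stop with "Unknown error ...".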
It feels a little obsessive-compulsive.
I haven't had time to read the source yet, but it really is worth reading.
My other functional tests so far have been written up on the blog:
phpspider PHP crawler framework