phpspider PHP Crawler Framework

Source: Internet
Author: User

I do not write regular expressions very often, and writing them against irregular HTML is troublesome in itself. If the page changes even slightly, the expressions have to be updated and maintained, which is a real pain.

My first thought was to look for a crawler library, and it turned out that there are quite a few mature open-source PHP crawler projects.

At first I planned to use phpquery, because it implements jQuery-like functionality that would have saved me time. But it dates from about 6 years ago. The original project was hosted at http://code.google.com/p/phpquery/, and although someone has since copied it to GitHub, it has gone unmaintained for years.

It is not particularly easy to use, it cannot be installed with Composer the way most things are now, and it was never submitted to https://packagist.org. With many new projects now based on PHP 7, it feels a bit outdated.

After a while I found phpspider, which is very useful (note: this is not php-spider). It has Chinese documentation, although it is not particularly complete: https://doc.phpspider.org/

https://github.com/owner888/phpspider

Note: this framework can only be run from the command line. Command line, command line, command line: important things get said three times ^_^

But I need to run it under the Web. Looking at test_requests.php, I found that a CSS selector has been implemented as an alternative to handwritten regular expressions, which is very good. Whether it is powerful enough is something users can judge for themselves after trying it.

The requests and selector classes can be used directly under the Web.

Import the two classes:

    use phpspider\core\requests;
    use phpspider\core\selector;



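    // Fetch the page and select the table by its id.
    $html = requests::get('http://www.ccmn.cn/');
    $data = selector::select($html, "#40288092327140f601327141c0560001", "css");

    // Split the table into rows and drop the header row.
    $data1 = selector::select($data, "tr", "css");
    array_shift($data1);

    $array = array();
    if (!empty($data1) && is_array($data1)) {
        foreach ($data1 as $k => $v) {
            // Split each row into cells.
            $data2 = selector::select($v, "td", "css");
            foreach ($data2 as $kk => $vv) {
                // Strip the carriage-return entity, line breaks and surrounding whitespace.
                $vv = str_replace('&#13;', '', $vv);
                $vv = str_replace(array("\r\n", "\r", "\n"), "", $vv);
                // Write the cleaned value back so the clean-up is not lost.
                $data2[$kk] = trim($vv);
            }
            // Column 3 wraps its value in a <font> tag; column 6 is not needed.
            $data2['3'] = selector::select($data2['3'], "font", "css");
            unset($data2['6']);
            $array[] = $data2;
        }
    }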

This scrapes values from fixed positions on a slightly complicated web page.

It's simple, right?
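The per-cell clean-up in the inner loop can also be pulled out into a small helper. The following is only a minimal sketch in plain PHP with no phpspider dependency; the function name clean_cell and the sample row are made up for illustration and do not come from the framework or from the page being scraped.

```php
<?php
// Hypothetical helper (not part of phpspider): normalise one scraped table cell.
function clean_cell(string $cell): string
{
    // Drop the HTML carriage-return entity left in the scraped markup.
    $cell = str_replace('&#13;', '', $cell);
    // Drop raw line breaks of every flavour.
    $cell = str_replace(array("\r\n", "\r", "\n"), '', $cell);
    // Trim leading and trailing whitespace.
    return trim($cell);
}

// Made-up sample row standing in for the result of selecting the "td" cells.
$row = array("  Copper&#13;\r\n", "\n 51200 ", "up");
$clean = array_map('clean_cell', $row);
print_r($clean);
```

Using array_map returns a new array, so there is no need to write the cleaned values back by key or by reference.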

https://doc.phpspider.org/selector.html

The official documentation describes more powerful CSS selector support, which is basically enough for common needs.

It is almost like writing jQuery.

One more thing about running it under the CLI: take care not to delete the following comments.

#/\* do not delete this comment \*/#

#/\* do not delete this paragraph of comment \*/#

If you delete them you will get an error, because the source, annoyingly, matches against these comments:

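    // Both comments must still be present, otherwise the script logs an error and exits.
    if (!preg_match("#/\* do not delete this comment \*/#", $content) || !preg_match("#/\* do not delete this paragraph of comment \*/#", $content))
    {
        $msg = "Unknown error ...";
        log::error($msg);
        exit;
    }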

It feels a little obsessive-compulsive.
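To see how such a check behaves, here is a rough standalone sketch. It is not phpspider's own code: the function has_required_comments and both sample strings are hypothetical, and the comment wording is taken from the two patterns as they appear in this post, so the exact text in the real source may differ.

```php
<?php
// Hypothetical stand-in for the check; not phpspider's own code.
function has_required_comments(string $content, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // preg_match returns 1 on a match, 0 on no match and false on error.
        if (preg_match($pattern, $content) !== 1) {
            return false;
        }
    }
    return true;
}

// Patterns as quoted in this post; "#" is the regex delimiter and "\*" a literal star.
$patterns = array(
    '#/\* do not delete this comment \*/#',
    '#/\* do not delete this paragraph of comment \*/#',
);

// Made-up file contents: one with both comments, one with the second removed.
$intact = "<?php\n/* do not delete this comment */\n/* do not delete this paragraph of comment */\n";
$edited = "<?php\n/* do not delete this comment */\n";

var_dump(has_required_comments($intact, $patterns));
var_dump(has_required_comments($edited, $patterns));
```

Because the match is literal and case-sensitive, even changing the spacing or capitalisation inside one of the comments is enough to make the check fail.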

I have not had time to read the source yet, but it really is worth reading.

My other functional tests are currently written up on the blog.
