This article shows how to use QueryList in PHP to scrape pages whose content is rendered dynamically by JavaScript. It is meant as a practical reference for anyone who needs to do the same.
QueryList uses jQuery-style selectors for scraping and has a rich set of plugins. The following demonstrates how to use QueryList with the PhantomJS plugin to scrape page content that is created dynamically by JavaScript.
First, installation
Install with Composer:
1. Install QueryList
    composer require jaeger/querylist
GitHub: https://github.com/jae-jae/querylist
2. Install the PhantomJS plugin
    composer require jaeger/querylist-phantomjs
GitHub: https://github.com/jae-jae/querylist-phantomjs
Second, download the PhantomJS binary
PhantomJS official website: http://phantomjs.org. Download the PhantomJS binary for your platform.
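Before handing the binary path to the plugin, it can help to confirm that the file exists and is executable. The following is a minimal sketch, assuming the binary was placed at /usr/local/bin/phantomjs (the path used in the usage section) or in a local bin directory. The helper name findPhantomJsBinary and the candidate list are hypothetical and are not part of QueryList or PhantomJS.

```php
<?php
// Hypothetical helper (not part of QueryList or PhantomJS): returns the first
// candidate path that is a regular, executable file, or null if none qualifies.
function findPhantomJsBinary(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// Assumed install locations; adjust to wherever the binary was unpacked.
$binary = findPhantomJsBinary([
    '/usr/local/bin/phantomjs',
    __DIR__ . '/bin/phantomjs',
]);

if ($binary === null) {
    echo "PhantomJS binary not found; download it from http://phantomjs.org\n";
} else {
    echo "Using PhantomJS at {$binary}\n";
}
```

The returned path can then be passed as the second argument when registering the plugin.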
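Third, plugin API
QueryList browser($url, $debug = false, $commandOpt = []): opens the URL in the PhantomJS browser and returns a QueryList object. As the usage examples show, $url may also be a closure that configures the request, $debug turns on debug mode, and $commandOpt carries PhantomJS command-line options.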
Fourth, usage
The example scrapes the mobile version of Toutiao ("Today's Headlines"). The mobile site is built on the React framework, and its content is rendered entirely dynamically.
The following shows how to use the QueryList PhantomJS plugin:
1. Register the plugin
    use QL\QueryList;
    use QL\Ext\PhantomJs;

    $ql = QueryList::getInstance();
    // Set the path to the PhantomJS binary
    $ql->use(PhantomJs::class, '/usr/local/bin/phantomjs');
    // Or register it under a custom method name
    $ql->use(PhantomJs::class, '/usr/local/bin/phantomjs', 'browser');
2. Example 1
Get the dynamically rendered HTML:
    $html = $ql->browser('https://m.toutiao.com')->getHtml();
    print_r($html);
Get the text content of all p tags:
    $data = $ql->browser('https://m.toutiao.com')->find('p')->texts();
    print_r($data->all());
Output:
    Array
    (
        [0] => Selfie mode on! A photo with the national flag over the National Day holiday
        [1] => You have set off on your journey; they are still at their posts, safeguarding your holiday
        [2] => Tears of joy: the professor has finally returned to Earth!
        //....
    )
Use an HTTP proxy:
    // For more options, see the documentation: http://phantomjs.org/api/command-line.html
    $ql->browser('https://m.toutiao.com', true, [
        // Use an HTTP proxy
        '--proxy' => '192.168.1.42:8080',
        '--proxy-type' => 'http'
    ]);
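The option array maps PhantomJS command-line flags to their values. When the proxy address comes from configuration, the array can be assembled in one place. The following is a minimal sketch, assuming a hypothetical helper named proxyOptions that is not part of the plugin. It uses only the --proxy and --proxy-type flags from the proxy example.

```php
<?php
// Hypothetical helper (not part of QueryList): builds the PhantomJS
// command-line options for a proxy in the array form passed to browser().
function proxyOptions(string $host, int $port, string $type = 'http'): array
{
    return [
        '--proxy' => $host . ':' . $port,
        '--proxy-type' => $type,
    ];
}

$options = proxyOptions('192.168.1.42', 8080);
print_r($options);

// Assumed usage, with the plugin registered as in the setup step:
// $ql->browser('https://m.toutiao.com', true, $options);
```

Keeping the flags in one function makes it easier to switch proxies without touching each browser() call.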
3. Example 2
Customize a complex request:
    $data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r) {
        $r->setMethod('GET');
        $r->setUrl('https://m.toutiao.com');
        $r->setTimeout(10000); // 10 seconds (value is in milliseconds)
        $r->setDelay(3); // 3 seconds
        return $r;
    })->find('p')->texts();

    print_r($data->all());
Enable debug mode and load a local cookie file:
    $data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r) {
        $r->setMethod('GET');
        $r->setUrl('https://m.toutiao.com');
        $r->setTimeout(10000); // 10 seconds (value is in milliseconds)
        $r->setDelay(3); // 3 seconds
        return $r;
    }, true, [
        '--cookies-file' => '/path/to/cookies.txt'
    ])->rules([
        'title' => ['p', 'text'],
        'link' => ['a', 'href']
    ])->query()->getData();

    print_r($data->all());