How to use QueryList in PHP to scrape pages rendered dynamically by JavaScript

Source: Internet
Author: User

This article shows how PHP can use QueryList to scrape pages whose content is rendered by JavaScript. I hope it serves as a useful reference.

QueryList uses jQuery-style selectors to extract data and has a rich set of plugins. The following demonstrates using QueryList's PhantomJS plugin to scrape content that JavaScript creates dynamically.

I. Installation

Install both packages with Composer:

1. Installing QueryList

    composer require jaeger/querylist

GitHub: https://github.com/jae-jae/querylist

2. Installing the PhantomJS plugin

    composer require jaeger/querylist-phantomjs

GitHub: https://github.com/jae-jae/querylist-phantomjs

II. Download the PhantomJS binary

Download the PhantomJS binary for your platform from the official website, http://phantomjs.org, and note where you put it: the plugin needs its path.
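Since the plugin is registered with an explicit binary path, it can help to check that the binary is actually there first. The helper below is a hypothetical sketch (find_phantomjs() is not part of QueryList or the plugin): it tries a configured path such as /usr/local/bin/phantomjs, then scans the directories on PATH, and returns null if no executable PhantomJS is found.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the first executable PhantomJS binary found,
 * checking an explicitly configured path first and then each directory in
 * $searchPath (defaults to the PATH environment variable). Returns null if
 * nothing usable is found.
 */
function find_phantomjs(?string $configured = null, ?string $searchPath = null): ?string
{
    // 1. Prefer an explicit path such as '/usr/local/bin/phantomjs'.
    if ($configured !== null && is_file($configured) && is_executable($configured)) {
        return $configured;
    }

    // 2. Fall back to scanning the directories on PATH.
    $searchPath ??= (string) getenv('PATH');
    $name = PHP_OS_FAMILY === 'Windows' ? 'phantomjs.exe' : 'phantomjs';

    foreach (explode(PATH_SEPARATOR, $searchPath) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

$binary = find_phantomjs('/usr/local/bin/phantomjs');
echo $binary ?? 'PhantomJS not found; download it from http://phantomjs.org', PHP_EOL;
```

The returned path is what you would pass as the second argument when registering the plugin.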

III. Plugin API

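    browser($url, $debug = false, $commandOpt = [])

Opens the page in the PhantomJS headless browser. $url is either a URL string or, as in Example 2 below, a closure that configures the request; $debug turns on debug mode; $commandOpt is an array of PhantomJS command-line options. The result can be chained with getHtml(), find() or rules(), as the examples below show.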

IV. Usage

The examples scrape the mobile version of Toutiao ("Today's Headlines"). It is built with the React framework, and its content is rendered entirely by JavaScript in the browser.
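Why a browser is needed at all: a plain HTTP request to a client-rendered page returns only the page shell, and the content appears only after the JavaScript runs. The sketch below makes this concrete with PHP's built-in DOM extension; both HTML strings are hypothetical stand-ins, not Toutiao's actual markup.

```php
<?php
declare(strict_types=1);

/** Count the elements with a given tag name in an HTML string. */
function count_tags(string $html, string $tag): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about unknown/HTML5 tags
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc->getElementsByTagName($tag)->length;
}

// Hypothetical server response for a client-rendered page:
// only an empty mount point plus a script bundle.
$rawShell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>';

// Hypothetical markup after a browser has executed that script.
$rendered = '<html><body><div id="root"><p>Headline one</p><p>Headline two</p></div></body></html>';

echo count_tags($rawShell, 'p'), PHP_EOL; // 0: nothing to scrape without running JS
echo count_tags($rendered, 'p'), PHP_EOL; // 2
```

This is the gap PhantomJS fills: it executes the page's scripts so that QueryList sees the second kind of HTML rather than the first.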

The following shows how to use the PhantomJS plugin with QueryList:

1. Registering the plugin

    <?php
    require __DIR__ . '/vendor/autoload.php';

    use QL\QueryList;
    use QL\Ext\PhantomJs;

    $ql = QueryList::getInstance();
    // Register the plugin, passing the path to the PhantomJS binary
    $ql->use(PhantomJs::class, '/usr/local/bin/phantomjs');
    // Or pass a third argument to register it under a custom method name:
    // $ql->use(PhantomJs::class, '/usr/local/bin/phantomjs', 'browser');

2. Example 1

Get the dynamically rendered HTML:

    $html = $ql->browser('https://m.toutiao.com')->getHtml();
    print_r($html);

Get the text content of every <p> tag:

    $data = $ql->browser('https://m.toutiao.com')->find('p')->texts();
    print_r($data->all());

Output:

    Array
    (
        [0] => Selfie mode on! Me and the national flag in one shot this National Day holiday
        [1] => You have set off on your trip while they stay at their posts to keep your holiday safe
        [2] => Joy and tears: the professor has finally returned to Earth!
        ...
    )
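Once PhantomJS has produced the rendered HTML, find('p')->texts() collects the text of every <p> element. For readers curious about what that step amounts to, here is a rough equivalent using PHP's built-in DOM extension on a hypothetical HTML string. It is not QueryList's implementation, and trimming whitespace is this sketch's own choice.

```php
<?php
declare(strict_types=1);

/** Return the trimmed text of every <p> element, similar in spirit to find('p')->texts(). */
function p_texts(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // The XML encoding hint makes libxml read the input as UTF-8 (e.g. Chinese headlines).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $texts = [];
    foreach ((new DOMXPath($doc))->query('//p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

$sample = '<div><p> First </p><p>Second</p><span>skip</span></div>';
print_r(p_texts($sample)); // Array ( [0] => First [1] => Second )
```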

Using an HTTP proxy (the second argument, true, turns on debug mode):

For more options, see the PhantomJS command-line documentation: http://phantomjs.org/api/command-line.html

    $html = $ql->browser('https://m.toutiao.com', true, [
        // Route requests through an HTTP proxy
        '--proxy' => '192.168.1.42:8080',
        '--proxy-type' => 'http',
    ])->getHtml();
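The options array mirrors PhantomJS's own command line, where each option is written as --name=value (for example --proxy=192.168.1.42:8080 --proxy-type=http, or --load-images=false). The hypothetical helper phantomjs_args() below illustrates that mapping; it is only an illustration, and the plugin's internal conversion may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn an options array like the one passed to
 * browser() into PhantomJS command-line arguments of the form --name=value.
 */
function phantomjs_args(array $options): array
{
    $args = [];
    foreach ($options as $name => $value) {
        // Accept keys written with or without the leading "--".
        $flag = '--' . ltrim((string) $name, '-');
        if (is_bool($value)) {
            // PhantomJS boolean options take the literals true/false.
            $value = $value ? 'true' : 'false';
        }
        $args[] = $flag . '=' . $value;
    }
    return $args;
}

$args = phantomjs_args([
    '--proxy'      => '192.168.1.42:8080',
    '--proxy-type' => 'http',
]);
echo implode(' ', $args), PHP_EOL; // --proxy=192.168.1.42:8080 --proxy-type=http
```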

3. Example 2

Customize a more complex request:

    $data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r) {
        $r->setMethod('GET');
        $r->setUrl('https://m.toutiao.com');
        $r->setTimeout(10000); // 10 seconds (value is in milliseconds)
        $r->setDelay(3);       // wait 3 seconds so the page's scripts can render
        return $r;
    })->find('p')->texts();
    print_r($data->all());

Turn on debug mode and load cookies from a local file:

    $data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r) {
        $r->setMethod('GET');
        $r->setUrl('https://m.toutiao.com');
        $r->setTimeout(10000); // 10 seconds (value is in milliseconds)
        $r->setDelay(3);       // wait 3 seconds so the page's scripts can render
        return $r;
    }, true, [
        '--cookies-file' => '/path/to/cookies.txt',
    ])->rules([
        'title' => ['p', 'text'],
        'link'  => ['a', 'href'],
    ])->query()->getData();
    print_r($data->all());
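The rules() call defines one output field per key: title takes the text of each p element and link takes the href attribute of each a element. The sketch below approximates that with PHP's DOM extension, assuming the per-rule values are paired by position into rows when no range is set. extract_rows() is hypothetical, supports only bare tag names rather than full CSS selectors, and pads uneven columns with null by its own choice.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collect one column per rule ([tag, 'text' | attribute])
 * and pair the i-th value of every column into row i.
 */
function extract_rows(string $html, array $rules): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    // Build one column of values per rule.
    $columns = [];
    foreach ($rules as $key => [$tag, $attr]) {
        $values = [];
        foreach ($doc->getElementsByTagName($tag) as $el) {
            $values[] = $attr === 'text' ? trim($el->textContent) : $el->getAttribute($attr);
        }
        $columns[$key] = $values;
    }

    // Zip the columns by position; missing values become null.
    $rows = [];
    $count = $columns ? max(array_map('count', $columns)) : 0;
    for ($i = 0; $i < $count; $i++) {
        foreach ($columns as $key => $values) {
            $rows[$i][$key] = $values[$i] ?? null;
        }
    }
    return $rows;
}

$html = '<div><p>First</p><a href="/a">A</a><p>Second</p><a href="/b">B</a></div>';
print_r(extract_rows($html, ['title' => ['p', 'text'], 'link' => ['a', 'href']]));
```

Because the pairing is positional, a page with unequal numbers of p and a elements would produce misaligned rows; QueryList's range() method, which scopes the rules to a repeated container element, is the usual way to keep each title with its own link.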