Where does the PHP crawler get the AJAX request address?

Source: Internet
Author: User
I am using the PHPCrawl framework, with the following steps:
Step 1: set the start URL.
Step 2: set the content type to download: text/html.
Step 3: use regular expressions to define the rules for which URLs to follow.
Step 4: start the crawl, which fetches the content of every URL that matches the rules from step 3.
Step 5: use regular expressions or a DOM parsing tool to extract the data I need.

The problem is:
Some of the content is loaded through AJAX requests, and the request address is assembled by JavaScript through string concatenation. How can I get the crawler to request this address? Adding it to the URL-follow rules in step 3 does not work: because the address is built at runtime, it never appears in the page source, so the regular expressions have nothing to match.

Reply content:

Use the assembled address directly: check whether the AJAX call is a GET or a POST request, set the same parameters, make the request with curl, and then parse the returned data.
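The reply above can be sketched in PHP roughly as follows. This is a minimal sketch, not the answerer's own code: the function names `buildAjaxCurlOptions` and `fetchAjax`, the example.com URL, and the `page` parameter are all hypothetical, and the `X-Requested-With` header is an assumption (some endpoints check it, many do not). It assumes you have already worked out the final address and parameters by hand.

```php
<?php
// Hypothetical helper: turn an AJAX endpoint, HTTP method, and parameters
// into an array of cURL options. Kept separate so it can be inspected
// without making a network request.
function buildAjaxCurlOptions(string $url, string $method, array $params): array
{
    $method = strtoupper($method);
    $query  = http_build_query($params);

    $options = [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 15,    // seconds
        // Assumption: mimic a browser XHR; remove if the endpoint does not need it.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ];

    if ($method === 'POST') {
        // POST: parameters go in the request body.
        $options[CURLOPT_URL]        = $url;
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = $query;
    } else {
        // GET: parameters are appended to the query string.
        $separator = (strpos($url, '?') === false) ? '?' : '&';
        $options[CURLOPT_URL] = ($query === '') ? $url : $url . $separator . $query;
    }

    return $options;
}

// Hypothetical helper: perform the request and return the raw response body.
function fetchAjax(string $url, string $method = 'GET', array $params = []): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildAjaxCurlOptions($url, $method, $params));

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $body;
}

// Example usage (placeholder URL and parameter; replace with the real ones):
// $json = fetchAjax('https://example.com/api/list', 'GET', ['page' => 2]);
// $data = json_decode($json, true); // if the endpoint returns JSON
```

If the endpoint returns an HTML fragment rather than JSON, the response can be parsed with the same regular expressions or DOM tools as in step 5.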
