PHP web crawler

Source: Internet
Author: User

Have you ever developed a similar program? Can you give some advice? The functional requirement is to automatically obtain relevant data from a website and store it in a database.


Reply to discussion (solution)

Use curl to crawl the target website, extract the relevant data with regular expressions or the DOM, and then store it in a database or file.
Nothing about it is especially difficult. You need to consider the following:
Crawling policy (e.g., crawl only specific domain names; depth-first or breadth-first traversal).

Crawling efficiency (multi-threaded crawling and how to allocate crawling tasks).

........
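The basic pipeline described above (curl → regex/DOM → database) can be sketched as below. The URL, database credentials, and `links` table are placeholder assumptions, not anything from this thread:

```php
<?php
// Sketch of the crawl pipeline: fetch with curl, extract with the DOM,
// store with PDO. URL, credentials and the `links` table are hypothetical.

// Extract (link text, href) pairs from an HTML document via the DOM.
function extract_links($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);              // @ silences warnings on sloppy HTML
    $rows = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $rows[] = [trim($a->textContent), $a->getAttribute('href')];
    }
    return $rows;
}

// 1. Fetch the target page with curl (guarded in case the extension is absent).
if (function_exists('curl_init')) {
    $ch = curl_init('http://example.com/list.html');  // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    curl_close($ch);

    // 2. Extract, then 3. store with a prepared statement.
    if ($html !== false) {
        $rows = extract_links($html);
        try {
            $pdo  = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
            $stmt = $pdo->prepare('INSERT INTO links (title, url) VALUES (?, ?)');
            foreach ($rows as $row) {
                $stmt->execute($row);
            }
        } catch (PDOException $e) {
            // Credentials above are placeholders; log and carry on.
            error_log('DB insert failed: ' . $e->getMessage());
        }
    }
}
```

The prepared statement matters even for a toy crawler: scraped text is untrusted input, so never interpolate it into SQL directly.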

Thank you. Could you recommend some references? I am a newbie, and I am very grateful for your support.

There are many open-source crawlers, such as phpdig. If you are not restricted to one language, there are also Java options such as Nutch (the predecessor of Hadoop). For simple data extraction, there is a very simple client class, Snoopy.

There is a lot of open source out there!

For PHP source code, refer to the open-source projects mentioned above.


Thank you. What I need is to automatically capture the required data from a website and store it in a database.



If you are just capturing some website data, you don't have to worry much about efficiency. Fetch the target webpage directly with curl (or, if the server allows it, the even simpler file_get_contents), and then extract the data with a regular expression or the DOM.
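A minimal version of that suggestion, assuming a hypothetical page where the wanted values sit in `<h2>` tags (both the URL and the pattern are illustrative only):

```php
<?php
// Minimal single-page capture: file_get_contents plus a regular expression.
// The URL and the <h2> pattern are illustrative assumptions.

// Pull the text of every <h2> element out of fetched HTML.
function extract_headings($html)
{
    preg_match_all('/<h2[^>]*>(.*?)<\/h2>/is', $html, $m);
    return array_map('trim', $m[1]);
}

// @ suppresses the warning when the host is unreachable or blocks the request.
$html = @file_get_contents('http://example.com/news.html');
if ($html !== false) {
    print_r(extract_headings($html));
}
```

Regular expressions are fine for simple, predictable markup like this; once the HTML gets nested or irregular, the DOM approach from earlier in the thread is more robust.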

It is a bit difficult to write this myself. Are there any open-source products you can recommend? Thank you.

For example, I want to automatically capture the price of a car named "BMW" from across the web (without a fixed URL) and store it in a database. Could you write a simple code example? Thank you.
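There is no complete answer without a fixed URL: you first need a seed list or a search step to find candidate pages. Assuming such a list already exists, a sketch of the extract-and-store step might look like this; the price pattern, the `car_prices` table, and the credentials are all assumptions:

```php
<?php
// Sketch only: crawling "BMW" prices with no fixed URL needs a seed list or
// search step first; $urls below stands in for that. The price pattern,
// `car_prices` table and DB credentials are all assumptions.

// Find the first price figure such as "¥258,000" in a page.
function extract_price($html)
{
    if (preg_match('/(?:¥|￥)\s*(\d[\d,]*(?:\.\d+)?)/u', $html, $m)) {
        return (float) str_replace(',', '', $m[1]);
    }
    return null;                       // no recognizable price on this page
}

$urls = ['http://example.com/bmw-listing-1.html'];   // placeholder seed list

try {
    $pdo  = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
    $stmt = $pdo->prepare(
        'INSERT INTO car_prices (model, price, source_url) VALUES (?, ?, ?)'
    );
    foreach ($urls as $url) {
        $html = @file_get_contents($url);
        if ($html === false) {
            continue;                  // skip unreachable pages
        }
        $price = extract_price($html);
        if ($price !== null) {
            $stmt->execute(['BMW', $price, $url]);
        }
    }
} catch (PDOException $e) {
    error_log('DB error: ' . $e->getMessage());
}
```

In practice each listing site formats prices differently, so you would typically write one extraction rule per target site rather than one universal regex.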
