Principles of PHP collection programs

Source: Internet
Author: User
I needed to write a simple PHP collection (web scraping) program. As usual, I first gathered a pile of tutorials from the Internet and tried to follow them, only to find that they all look plausible but none of them actually works.

After a few days of hard work, I finally worked out how it is really done. I am writing it down here; corrections are welcome.
The idea behind a collection program is simple: start from a page, usually a list page, extract the addresses of all the links in it, then open the links one by one and look for the content we are interested in. Whatever we find is stored in a database or processed in some other way. A simple example follows.

First, choose the page to collect from, which is generally a list page. Here the target is http://www.jb51.net/article/11/index.htm, and the goal is to collect all the articles linked from this list page.

With the list page chosen, the first step is to open it and read its content into our program. fopen or file_get_contents is generally used; here we use fopen as an example. Opening the page is simple: $source = fopen("http://www.jb51.net/article/11/index.htm", 'r'); Note that $source is a resource, not text that can be processed, so use the fread function to read the content into a variable. That variable holds the real editable text. Example:
$content = fread($source, 99999); The number is the maximum number of bytes to read; just fill in a large value. Be aware that on a network stream a single fread call may return fewer bytes than requested, so a long page may need repeated reads. If you use file_put_contents to write $content to a text file, you can see that the content is simply the source code of the web page. With the source code in hand, the next step is to extract the article link addresses from it using a regular expression [recommended regular expression tutorial (http://www.jb51.net/article/7/all/545.1.htm)]. Viewing the source code shows that the article links all look like this.

Encapsulate the database connection code in the function and call it when you need to read it ..
We can write a regular expression: $count = preg_match_all("/ … (.+?)<\/a>/", $content, $art_list);
$art_list[1][$s] holds the link address of an article, and $art_list[2][$s] holds its title. With this step done, we are halfway there.
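The fetch and link-extraction steps described above might be sketched as follows. This is only a sketch under assumptions, not the author's code: the function names fetch_page and extract_links are hypothetical, and since the link markup of the target page is not shown here, the pattern assumes a plain <a href="...">title</a> form.

```php
<?php
// Read a whole page into a string. A single fread() on a network stream may
// return fewer bytes than requested, so keep reading until end of file.
// Opening http:// URLs with fopen() requires allow_url_fopen to be enabled.
function fetch_page(string $url): string
{
    $source = @fopen($url, 'r');
    if ($source === false) {
        return ''; // the page could not be opened
    }
    $content = '';
    while (!feof($source)) {
        $chunk = fread($source, 8192);
        if ($chunk === false) {
            break;
        }
        $content .= $chunk;
    }
    fclose($source);
    return $content;
}

// Extract link addresses and titles from the page source.
// ASSUMPTION: links look like <a href="..." ...>title</a>; the real page
// may need a stricter pattern.
function extract_links(string $content): array
{
    $pattern = '/<a\s[^>]*href="([^"]+)"[^>]*>(.+?)<\/a>/i';
    $count = (int) preg_match_all($pattern, $content, $art_list);
    $links = [];
    for ($s = 0; $s < $count; $s++) {
        // $art_list[1] holds the addresses, $art_list[2] the titles
        $links[] = ['url' => $art_list[1][$s], 'title' => $art_list[2][$s]];
    }
    return $links;
}
```

A call such as extract_links(fetch_page("http://www.jb51.net/article/11/index.htm")) would then return one url/title pair per matched link. A pattern this loose also matches navigation links, so in practice it would probably be narrowed to the list area.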
Next, use a for loop to open each link in turn and extract the content in the same way the titles were extracted. Up to this point everything matches the tutorials I found online, but those tutorials handle the for loop poorly, and I did not find a single article that explains this part clearly. At first I used js to help with the loop, or borrowed an example. At the beginning, I did this:
for ($i = 0; $i < 20; $i++) {
The middle part is the code that collects the content.
Once one page has been collected, the next page has to be collected as well.
However, the next link could not be opened with fopen (the request failed, or something similar), and js could not be used either. The only thing that worked was echo " "; where aa.php is the file name of our program and the number after id lets us loop and collect multiple pages. This is the key to making the loop really work.
}
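The loop described above might be sketched as follows. This is a minimal sketch under assumptions, not the author's code: collect_articles and next_step_tag are hypothetical names, and the limit of 20 simply mirrors the loop above. Because the content of the original echo statement did not survive, the meta refresh tag used here is only one plausible way to have aa.php request itself again with the next id.

```php
<?php
// Open each link in turn and keep the pages that could be read.
// $fetch is any callable taking a URL and returning the page text (or
// false on failure). The default, file_get_contents, needs allow_url_fopen
// to be enabled for http:// URLs.
function collect_articles(array $urls, ?callable $fetch = null, int $limit = 20): array
{
    $fetch = $fetch ?? 'file_get_contents';
    $urls = array_values($urls);
    $total = min($limit, count($urls));
    $pages = [];
    for ($i = 0; $i < $total; $i++) {
        $content = @$fetch($urls[$i]);
        if ($content === false || $content === '') {
            continue; // skip links that could not be opened
        }
        $pages[$urls[$i]] = $content;
    }
    return $pages;
}

// ASSUMPTION: one way to collect a single article per request. After the
// article numbered $id is handled, output a tag that makes the browser load
// the same script again with the next id, until $total articles are done.
function next_step_tag(string $script, int $id, int $total): string
{
    if ($id + 1 >= $total) {
        return ''; // last article: nothing more to request
    }
    $next = $script . '?id=' . ($id + 1);
    return '<meta http-equiv="refresh" content="0;url=' . htmlspecialchars($next) . '">';
}
```

With the second approach, aa.php would read the id from the query string, collect the article at that position, and then echo next_step_tag('aa.php', $id, $total). Each request handles one page, which keeps any single request short.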
My head is a bit foggy and this write-up is a little messy, so I will tidy it up later. It may be nothing special to the experts, but it is very helpful for newbies like me.
