Principles of PHP collection programs

Last Update: 2013-04-22 Source: Internet

Author: User


I needed to write a simple PHP collection program (a web scraper), so I looked up a bunch of tutorials on the Internet and tried to follow their examples. However, although all of them looked plausible, none of them actually worked. After a few days of hard work I finally figured out how it really works, so I am writing it up here; corrections are welcome.

The idea of a collection program is very simple: first open a page, usually a list page, and get the addresses of all the links on it; then open the links one by one and look for the content we are interested in. Whatever we find is stored in a database or processed in some other way. Here is a simple example.
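The flow just described (list page, then each linked page, then storage) can be sketched as one small PHP function. This is a minimal sketch, not the author's code: collect is a hypothetical name, and fetching, link extraction, content extraction and storage are passed in as hypothetical callbacks so the control flow can be seen, and tested, without touching the network.

```php
<?php
/**
 * Minimal sketch of the list-page -> detail-page collection flow.
 * All four callbacks are hypothetical and supplied by the caller;
 * none of them is a library API.
 */
function collect(
    string $listUrl,
    callable $fetch,          // fn(string $url): string   returns page HTML
    callable $extractLinks,   // fn(string $html): array   returns a list of URLs
    callable $extractContent, // fn(string $html): ?string null = nothing of interest
    callable $store           // fn(string $url, string $content): void
): int {
    $stored = 0;
    // Step 1: open the list page and pull out every link address.
    $listHtml = $fetch($listUrl);
    foreach ($extractLinks($listHtml) as $url) {
        // Step 2: open each link in turn and look for what we want.
        $content = $extractContent($fetch($url));
        if ($content !== null) {
            // Step 3: keep it (database, file, ...) and count it.
            $store($url, $content);
            $stored++;
        }
    }
    return $stored;
}
```

Keeping the steps as parameters means a real fetcher (file_get_contents, fopen plus fread) and a real store (for example a database insert) can be plugged in later without changing the loop itself.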

First, choose the page to collect from, which is usually a list page. Here the target is http://www.php100.com/article/11/index.htm, a list page, and our goal is to collect all the articles listed on it. The first step is to open this page and bring its content into our program. fopen or file_get_contents is generally used for this; here we use fopen as an example. How do we open it? Simply: $source = fopen("http://www.php100.com/article/11/index.htm", 'r'); Now the content is available to our program. Note that $source is a resource (a file handle), not text we can process, so we use the fread function to read the content into a variable; that variable holds the actual text we can work with. Example:

$content = fread($source, 99999); The second argument is the maximum number of bytes to read; here we simply pass a large value, which assumes the whole page arrives in a single read. If you write $content to a text file with file_put_contents, you can see that it really is the source code of the web page.

Now that we have the page source, we need to extract the article link addresses from it, and for that we use a regular expression [recommended regular expression tutorial (http://www.php100.com/article/7/all/545.1.htm)]. Viewing the source, we can see that every article link looks like this: <a href="http://www.php100.com/article/10/all/273.1.htm">The database connection code is wrapped in a function and called when needed..</a>
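Returning to the fread step: the PHP manual notes that on a non-file stream such as an HTTP connection, a single fread call returns after at most one chunk-sized read (usually 8192 bytes), so fread($source, 99999) can silently return only part of a longer page. The sketch below shows two safer options under stated assumptions: read_all and fetch_page are hypothetical names, the timeout and User-Agent values are illustrative, and opening http:// URLs this way requires allow_url_fopen to be enabled. stream_get_contents($source) is a built-in that also reads a stream to the end in one call.

```php
<?php
/**
 * Sketch: two ways to get the full HTML of a page into a string.
 * Function names are illustrative, not part of any library.
 */

// Variant 1: keep the fopen() approach, but read until end-of-file.
function read_all($source): string
{
    $content = '';
    while (!feof($source)) {
        $chunk = fread($source, 8192); // one chunk per call on network streams
        if ($chunk === false) {
            throw new RuntimeException('fread() failed');
        }
        $content .= $chunk;
    }
    return $content;
}

// Variant 2: file_get_contents() reads the whole resource in one call.
// The timeout and User-Agent are assumptions, not requirements of the site.
function fetch_page(string $url, float $timeoutSeconds = 10.0): string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'SimpleCollector/0.1', // hypothetical UA string
        ],
    ]);
    // '@' hides the PHP warning; the failure is reported as an exception instead.
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        $err = error_get_last();
        throw new RuntimeException('Could not fetch ' . $url . ': ' . ($err['message'] ?? 'unknown error'));
    }
    return $html;
}
```

For the list page in this article, that would be $content = fetch_page('http://www.php100.com/article/11/index.htm'); followed by file_put_contents('list.html', $content) to save a copy of the source for inspection.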

We can write a regular expression for this: $count = preg_match_all('/<a href="(.+?)">(.+?)<\/a>/', $content, $art_list);

The array $ art_list [1] [$ s] contains the link address of an article. $ Art_list [2] [$ s] contains the title of an article. This step is half the success.

Next, use a for loop to open each link in turn and extract its content the same way we extracted the titles. Up to this point, everything matches the tutorials I found online, but those tutorials handle the loop poorly, and I could not find a single article that explains this part clearly. At first I tried using JavaScript to help with the loop. Let me show it with an example; at first I did this:
for ($i = 0; $i < 20; $i++) {
    // The middle part opens $art_list[1][$i] and collects its content.
}

Having collected one page, I naturally went on to collect the next one. However, the next link could not be opened with fopen: the request failed, or something similar went wrong, and JavaScript was no help here. The only thing that works is to use echo to output a reference back to our own program: aa.php is the file name of our program, and the number after id in the URL tells it which page to collect next, which lets us loop and collect multiple pages. This is the key to a loop that really works.
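The exact markup the author echoes is not shown, so the following is only a sketch of the idea under an explicit assumption: the echo outputs an HTML meta refresh tag that sends the browser back to aa.php with the next id. Each request then collects a single article and hands the next id to the following request. The helper name next_step_html, the 20-page total and the 1-second delay are hypothetical.

```php
<?php
/**
 * Sketch: loop over articles one request at a time (aa.php?id=N).
 * ASSUMPTION: the redirect is an HTML meta refresh; the original echo
 * statement is not preserved in the article.
 */
function next_step_html(int $id, int $total, string $script = 'aa.php', int $delaySeconds = 1): string
{
    $next = $id + 1;
    if ($next >= $total) {
        return 'Done: collected ' . $total . " pages.\n";
    }
    $url = $script . '?id=' . $next;
    // The browser waits $delaySeconds, then requests the same script with the next id.
    return '<meta http-equiv="refresh" content="' . $delaySeconds . ';url='
        . htmlspecialchars($url, ENT_QUOTES) . '">' . "\n";
}

// Only when served by a web server as aa.php, not when run from the CLI.
if (PHP_SAPI !== 'cli') {
    $id = isset($_GET['id']) ? max(0, (int) $_GET['id']) : 0;
    // Hypothetical step: open $art_list[1][$id], extract its content and store it.
    echo next_step_html($id, 20);
}
```

A likely benefit of this design is that each request does only a small amount of work, so no single request runs long enough to hit PHP's max_execution_time limit. When a collector is run from the command line instead, where that limit defaults to 0 (no limit), a plain for loop over the links is also an option.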
My head is a bit muddled and this write-up is a little messy, so please bear with it. It may be trivial to the experts, but it was a big help to a newbie like me.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
