I needed to write a simple PHP collection (scraping) program. I found a bunch of tutorials on the Internet and followed them, but every one of them only looked plausible and none was actually usable. After a few days of hard work I finally figured out how it really works. I am writing it down here; corrections are welcome.
The idea of a collection program is very simple: start from a page, usually a list page, get the addresses of all the links on it, then open the links one by one and look for the content we are interested in. Whatever we find is stored in a database or processed in some other way. The following is a simple example.
First, decide on a page to collect from, generally a list page. Here the target is http://www.php100.com/article/11/index.htm, and the goal is to collect all the articles listed on it. The first step is to open the list page and get its content into our program. fopen or file_get_contents is generally used for this; here we use fopen as an example. How do we open it? Simple: $source = fopen("http://www.php100.com/article/11/index.htm", 'r'); This opens the page from our program (allow_url_fopen must be enabled in php.ini for a URL to work here). Note that $source is a resource, not text we can process, so we use the fread function to read the content into a variable, which gives us text we can actually work with. Example:
$content = fread($source, 99999); The number is the maximum number of bytes to read, so just fill in a large value. (For a network stream, a single fread call may return fewer bytes than requested, so for a long page call it in a loop until feof($source) returns true.) If you write $content to a text file with file_put_contents, you can see that it really is the source code of the web page. With the source code in hand, we need to pick out the link addresses of the articles, which is a job for a regular expression [recommended regular expression tutorial (http://www.php100.com/article/7/all/545.1.htm)]. Viewing the source code shows that the article links all look like this: <a href="http://www.php100.com/article/10/all/273.1.htm">The database connection code is encapsulated in the function, the need to read the call ..</a>
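The reading step above can be sketched as a loop that keeps calling fread until the stream is exhausted, instead of relying on one large read. This is only a minimal sketch: read_all() is a hypothetical helper name, and the demo uses an in-memory stream in place of the real page so that it runs without network access.

```php
<?php
// Read everything from an open stream, chunk by chunk, until end of file.
// read_all() is a hypothetical helper, not a PHP built-in.
function read_all($handle, int $chunkSize = 8192): string
{
    $content = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: return what was read so far
        }
        $content .= $chunk;
    }
    return $content;
}

// Demo on an in-memory stream (a stand-in for the handle returned by fopen on a URL).
$demo = fopen('php://memory', 'w+');
fwrite($demo, str_repeat('x', 20000));
rewind($demo);
$demoContent = read_all($demo);
fclose($demo);

echo strlen($demoContent), "\n"; // prints 20000
```

With a real page, the call would be $content = read_all($source); on the handle opened earlier. PHP's built-in stream_get_contents($source) does the same job in one call.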
So we can write a regular expression along these lines (adjust it to the exact attributes in the page source): $count = preg_match_all("/<a href=\"(.+?)\">(.+?)<\/a>/i", $content, $art_list);
The array element $art_list[1][$s] contains the link address of an article and $art_list[2][$s] contains its title, where $s runs from 0 to $count - 1. At this point we are halfway there.
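To make the shape of $art_list concrete, here is a small sketch that runs the same kind of pattern on a hard-coded sample. The sample markup is made up for the demo (the second link and both titles are placeholders), and the pattern is slightly more tolerant than the one above: it allows extra attributes after href, which is an assumption about what real pages may contain.

```php
<?php
// Sample markup standing in for the downloaded list page. The first URL is the
// one quoted in the text; the second URL and both titles are placeholders.
$content = '<ul>'
    . '<li><a href="http://www.php100.com/article/10/all/273.1.htm">First title</a></li>'
    . '<li><a href="http://example.com/article/2.htm" target="_blank">Second title</a></li>'
    . '</ul>';

// Group 1 captures the href value, group 2 the link text.
// [^>]* skips any further attributes before the closing ">".
$count = preg_match_all('/<a\s+href="(.+?)"[^>]*>(.+?)<\/a>/i', $content, $art_list);

// With the default PREG_PATTERN_ORDER, $art_list[1] holds all the links
// and $art_list[2] holds all the titles, in matching order.
for ($s = 0; $s < $count; $s++) {
    echo $art_list[1][$s], ' => ', $art_list[2][$s], "\n";
}
```

A regular expression is enough for a page with a regular link format like this one; for messier HTML, a real parser such as PHP's DOMDocument is usually more robust.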
Next, use a for loop to open each link in turn and get the content in the same way the titles were obtained. Up to this point everything is similar to the tutorials I found on the Internet, but those tutorials handle the for loop badly. I did not find a single article that explains this part clearly: some use JS to help with the loop, and some only show a single instance. At the beginning, I did this:
for ($i = 0; $i < 20; $i++) {
    // The code that collects the content goes here.
    // After one page has been collected, the next one has to be collected.
    // However, inside this loop the next link could not be opened with fopen
    // (the request failed or something like that), and JS is no help either.
    // The only thing that worked was to echo a tag that makes the browser
    // request aa.php again with an id parameter:
    echo "";
    // aa.php is the file name of our program, and the number after id is what
    // lets us loop and collect multiple pages. This is the key to a loop that
    // really works.
}
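The self-reloading idea described above can be sketched as follows. The text does not show what the echo actually printed, so the meta refresh tag here is an assumption, as are the helper names next_request_tag() and collect_step() and the $collect callback. Each request handles one article and then tells the browser to request the script again with the next id, so no single request has to fetch every page.

```php
<?php
// Build the tag that makes the browser request the script again with the next id.
// Using a meta refresh is an assumption; the original echo string is not shown.
function next_request_tag(string $script, int $nextId): string
{
    $url = $script . '?id=' . $nextId;
    return '<meta http-equiv="refresh" content="0;url='
        . htmlspecialchars($url, ENT_QUOTES) . '">';
}

// Handle one request: collect article number $id, then hand over to the next id.
// $links stands for $art_list[1]; $collect is a hypothetical callback that
// fetches and stores a single article.
function collect_step(array $links, int $id, callable $collect, string $script = 'aa.php'): string
{
    if ($id < 0 || $id >= count($links)) {
        return ''; // nothing left to collect: no redirect, so the loop ends
    }
    $collect($links[$id]);
    return next_request_tag($script, $id + 1);
}

// Demo with placeholder links and a callback that only records what it was given.
$seen = [];
$links = ['http://example.com/1.htm', 'http://example.com/2.htm'];
$record = function (string $url) use (&$seen): void {
    $seen[] = $url;
};
$tag = collect_step($links, 0, $record);
$end = collect_step($links, 2, $record);

echo $tag, "\n";
```

In aa.php itself, the call would look something like echo collect_step($art_list[1], (int) ($_GET['id'] ?? 0), $collect); so that the id in the URL plays the role of the loop counter.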
My head is a bit fuzzy and this write-up is a little messy, so please bear with it. It may be nothing special to the experts, but it should be very helpful to newbies (cainiao) like me.