Analysis of How a PHP Collection (Scraping) Program Works

Source: Internet
Author: User

After concentrating on this for a few days, I finally worked out how it really works. I am writing it down here; corrections are welcome.
The idea behind a collection program is simple: open a page, usually a list page, and extract all the link addresses in it; then open each link and look for the content we are interested in; if it is found, store it or process it in some other way. A very simple example follows.

First, choose the page to collect from, usually a column's list page. Here the target is http://www.jb51.net/article/11/index.htm. This is a list page, and our goal is to collect all the articles listed on it.

With the list page chosen, the first step is to open it and bring its contents into our program. This is generally done with fopen or file_get_contents; we use fopen as the example. How do we open it? Very simply: $source = fopen("http://www.jb51.net/article/11/index.htm", "r"); With that, the page is available to our program. Note that $source is a resource, not text we can process, so use fread to read the content into a variable, which then holds text we can actually work with. Example:
$content = fread($source, 99999); The number is the maximum number of bytes to read; a large value will do. (On a network stream a single fread call may return fewer bytes than requested, so a long page may need repeated reads until feof($source) is true.) If you write $content to a text file with file_put_contents, you can see that its content is simply the source of the web page.
Having obtained the page source, we need to extract the article link addresses from it, which calls for regular expressions [recommended regular expression tutorial (http://www.jb51.net/article/7/all/545.1.htm)]. Looking at the source, we can see that the article links all have this form: <div class="in_arttitle"><a href="http://www.jb51.net/article/10/all/273.1.htm">Encapsulate the database connection code in a function that is called when it needs to be read.</a>
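The fetch step described above can be sketched as follows. This is only an illustration under stated assumptions: read_all() and fetch_page() are hypothetical helper names that do not appear in the article, and the demonstration reads from an in-memory stream so that it runs without network access. Fetching a real http:// URL with fopen also assumes that allow_url_fopen is enabled.

```php
<?php
// Read an entire stream in chunks. A single fread() on a network stream
// may return fewer bytes than requested, so keep reading until end of file.
// (read_all is a hypothetical helper name, not from the article.)
function read_all($handle, int $chunkSize = 8192): string
{
    $content = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: return what was read so far
        }
        $content .= $chunk;
    }
    return $content;
}

// Open a URL or path and return its full content, or null on failure.
// (fetch_page is a hypothetical helper name, not from the article.)
function fetch_page(string $url): ?string
{
    $source = @fopen($url, 'r'); // http:// URLs need allow_url_fopen
    if ($source === false) {
        return null;
    }
    $content = read_all($source);
    fclose($source);
    return $content;
}

// Offline demonstration: an in-memory stream stands in for the web page.
$demo = fopen('php://memory', 'w+');
fwrite($demo, str_repeat('x', 20000));
rewind($demo);
$page = read_all($demo);
fclose($demo);

echo strlen($page), "\n"; // 20000
```

In a real run, $content = fetch_page("http://www.jb51.net/article/11/index.htm"); would replace the single fread call, and file_put_contents could then be used to inspect the result as described above.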
From this we can write the regular expression: $count = preg_match_all("/<div class=\"in_arttitle\"><a\shref=\"(.+?)\">(.+?)<\/a>/", $content, $art_list);
Here $art_list[1][$s] holds the link address of an article and $art_list[2][$s] holds its title. At this point we are halfway there.
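To make the shape of $art_list concrete, here is a minimal sketch that runs the same pattern against a small sample. The sample markup, the example.com URLs, and the titles are made up for illustration; only the in_arttitle class and the two capture groups come from the article. The pattern is written with # as the delimiter and single quotes, which is just a way to avoid the backslash escaping, not a change in what it matches.

```php
<?php
// Sample markup modeled on the link format described above.
// The URLs and titles are invented for this illustration.
$content = <<<HTML
<div class="in_arttitle"><a href="http://example.com/list/1.htm">First sample title</a></div>
<div class="in_arttitle"><a href="http://example.com/list/2.htm">Second sample title</a></div>
HTML;

// Group 1 captures the link address, group 2 captures the title.
$pattern = '#<div class="in_arttitle"><a\shref="(.+?)">(.+?)</a>#';
$count = preg_match_all($pattern, $content, $art_list);

// Pair each link with its title using the index $s, as in the article.
$articles = [];
for ($s = 0; $s < $count; $s++) {
    $articles[] = [
        'url'   => $art_list[1][$s],
        'title' => $art_list[2][$s],
    ];
}

foreach ($articles as $article) {
    echo $article['title'], ' => ', $article['url'], "\n";
}
```

A pattern tied this closely to the markup stops matching as soon as the page layout changes, so it is worth checking $count before going further.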
Then use a for loop to open each link in turn and extract the content the same way the titles were extracted. Up to here this matches the tutorials I found online, but when it comes to this for loop the online tutorials fall short; I did not find one that explains it properly. At first I tried to use JS to drive the loop. Let me explain with the example. At first I wrote:
for ($i = 0; $i < 20; $i++) {
    // The part that collects the content goes here (omitted).
    // After collecting one page, go on to collect the next one.
}
But opening the links one after another with fopen inside this loop did not work: the requests failed, and driving the loop with JS did not work either. What finally worked was this statement: echo "<meta http-equiv=refresh content='0;url=aa.php?id=1'>"; where aa.php is the file name of our program, and the number after id lets us implement the loop and collect multiple pages. That is the key to making the loop really work.
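The refresh-based loop can be sketched roughly like this. The article gives only the echo statement, so everything else here is an assumption: next_refresh_tag() is a hypothetical helper name, the bound of 20 is borrowed from the for loop, and the collection step itself is left out as in the article. The sketch does not try to explain why the direct fopen loop failed, since the article does not say.

```php
<?php
// Build the meta-refresh tag that reloads the script for the next link,
// or return null once every link has been processed.
// (next_refresh_tag and its parameters are hypothetical names.)
function next_refresh_tag(int $id, int $total, string $script = 'aa.php'): ?string
{
    $next = $id + 1;
    if ($next >= $total) {
        return null; // nothing left to collect
    }
    return "<meta http-equiv=\"refresh\" content=\"0;url={$script}?id={$next}\">";
}

// In aa.php the current index comes from the query string (?id=...).
$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
$total = 20; // assumed number of links, same bound as the for loop

// ... collect the article at index $id here (omitted, as in the article) ...

// Each request handles one link, then asks the browser to load the next.
$tag = next_refresh_tag($id, $total);
echo $tag ?? "Done.\n";
```

Each page load handles a single link, so the browser drives the loop instead of one long-running script. This only works when the script is opened in a browser that honors the refresh tag.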
My head is a bit fuzzy and this write-up is somewhat messy, so please bear with it. To an expert this may be no big deal, but for beginners like me it is very helpful.
