Crawling Tianya forum data with regular expressions: recursion fails, looking for a solution...

Source: Internet
Author: User
This post was last edited by liuser_cn on 2013-08-12 21:25:48

Preface: my object-oriented programming foundation is only so-so.

I am crawling one section of the forum (obviously).
Its next-page links are built from timestamps.
My idea was: after grabbing all the thread-title URLs on the first page, grab the next-page address (that part works).
Now all the title URLs on the first page have been scraped, and the next-page URL has been captured.
I want to recurse 100 times to grab the title URLs from 100 pages.
Here is the code:
    public function getAllPage($url) {
        /**
         * curl_setopt($ch, CURLOPT_FAILONERROR, true); // setting to record errors
         * curl_errno() returns the error code, including HTTP error status codes;
         * curl_error() returns the error message.
         */
        $ch = curl_init($url); // initialize a handle
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1111111);
        $html = curl_exec($ch);
        curl_close($ch);

        // Trim the page: keep only the part starting at the list area
        $length  = strpos($html, 'class="mt5"');
        $newHtml = substr($html, $length);
        // End of trimming

        $pattern = "#\/post-.*\.shtml#i"; // regular expression for thread URLs
        preg_match_all($pattern, $newHtml, $matches);

        // Crawl the next-page link address
        $nextPagePattern = "#\ ..."; // the rest of this pattern was lost in the original post
        preg_match($nextPagePattern, $newHtml, $nextPage);
        $nextPageUrl = 'http://bbs.tianya.cn' . $nextPage['1'];

        foreach ($matches['0'] as $k => $v) {
            $matches[$k] = 'http://bbs.tianya.cn' . $v;
        }
        // The recursive call used to be here; running it hung immediately...

        return array(
            '0' => $matches,
            '1' => $nextPageUrl,
        );
    }
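A note on the thread-URL pattern above: `.*` is greedy, so when two thread links sit on the same line of HTML, `#\/post-.*\.shtml#i` matches from the first `/post-` to the last `.shtml` as one string. Below is a minimal sketch of the extraction step that keeps each match inside a single attribute value. It assumes the links appear as quoted `href` values holding relative paths of the form `/post-<something>.shtml`; the function name `extractThreadUrls` and the sample HTML used to try it are hypothetical.

```php
<?php
// Hypothetical helper: collect thread links from a list page and make them absolute.
// The 'class="mt5"' marker and the /post-...shtml shape come from the question;
// the character class in the pattern is an assumption about how the links are quoted.
function extractThreadUrls(string $html, string $base = 'http://bbs.tianya.cn'): array
{
    // Skip everything before the list area, as the original code does.
    $start = strpos($html, 'class="mt5"');
    if ($start !== false) {
        $html = substr($html, $start);
    }

    // [^"'\s>]+ cannot run past the end of an attribute value, so two links
    // on the same line are matched separately (".*" would swallow both).
    preg_match_all('#/post-[^"\'\s>]+\.shtml#i', $html, $matches);

    // Drop duplicate links (keeping page order) and prefix the domain.
    $paths = array_values(array_unique($matches[0]));
    return array_map(function ($path) use ($base) {
        return $base . $path;
    }, $paths);
}
```

Passing the raw page HTML returns the absolute thread URLs in page order, with repeated links removed.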


My question: is there a problem with this approach?
Could someone write the recursive part for me? = =!!


Replies (solved)

foreach ($matches['0'] as $k => $v) {
    $matches[$k] = 'http://bbs.tianya.cn' . $v;
}

What is the point of modifying the array inside the loop?
Please also post the code for your recursive part.
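For reference, here is what that loop actually does. For a pattern without capture groups, preg_match_all() fills $matches as [0 => [list of paths]], and writing to $matches[$k] replaces that nested list with a flat list of absolute URLs. It happens to give the intended list here, but it would overwrite capture-group entries ($matches[1], ...) if the pattern had any. The sketch below, using made-up paths, compares it with collecting into a separate array.

```php
<?php
// Shape that preg_match_all() produces for a pattern without capture groups
// (the paths are made-up examples).
$matches = [['/post-a.shtml', '/post-b.shtml']];

// The loop from the question, run on a copy: it writes into the array it
// came from, so the nested list at index 0 is replaced by a flat list.
$inPlace = $matches;
foreach ($inPlace['0'] as $k => $v) {
    $inPlace[$k] = 'http://bbs.tianya.cn' . $v;
}

// Alternative: leave $matches untouched and collect into a separate array.
$urls = [];
foreach ($matches[0] as $v) {
    $urls[] = 'http://bbs.tianya.cn' . $v;
}
```

Both arrays end up holding the same two absolute URLs, but only the second approach leaves $matches in the shape preg_match_all() documents.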

1: It completes the URLs: the addresses I fetch have no domain name.
2: The recursion... this is what I wrote before that comment:

for ($i = 0; $i < 100; $i++) {
    $this->getAllPage($nextPageUrl);
}

This does not fetch recursively 100 times.
It loops 100 times and calls the recursive function on every iteration, and your recursive function has no exit (nothing that stops the recursion, which leads to infinite recursion), so of course it hangs.
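One detail worth adding: getAllPage() returns the next-page URL, but the loop discards that return value, so every iteration requests the same $nextPageUrl. A plain loop (iteration, a swapped-in technique rather than the recursion discussed below) only needs to carry the returned URL forward. This is a minimal sketch; $fetchPage is a hypothetical callable standing in for getAllPage(), returning [thread URLs, next URL or null] in the same 0/1 positions as its return array.

```php
<?php
// Iterative alternative: feed each call the next-page URL returned by the
// previous one. $fetchPage is a hypothetical stand-in for getAllPage():
// it takes a URL and returns [array $threadUrls, $nextPageUrl or null].
function crawlPages(callable $fetchPage, string $startUrl, int $maxPages): array
{
    $all = [];
    $url = $startUrl;
    for ($i = 0; $i < $maxPages && $url !== null; $i++) {
        list($threadUrls, $nextUrl) = $fetchPage($url);
        $all = array_merge($all, $threadUrls);
        $url = $nextUrl;   // advance; the loop in the question reuses the same URL every time
    }
    return $all;
}
```

The loop stops after $maxPages pages or as soon as a page reports no next URL, whichever comes first.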

Could you explain in more detail?

Is the exit a conditional check?

For what you need, you can do this:
    public function getAllPage($url, $depth, &$result)
$depth controls the recursion depth and starts at 0. $result is passed by reference and collects the final matched results.

The exit of the recursion:
if ($depth == 100) {
    return;
}

The recursive part of the function:
$nextPageUrl = 'http://bbs.tianya.cn' . $nextPage['1'];
foreach ($matches['0'] as $k => $v) {
    $result[] = 'http://bbs.tianya.cn' . $v;
}
$this->getAllPage($nextPageUrl, $depth + 1, $result);


Initial call of the recursive function:
$result = array();
$this->getAllPage($url, 0, $result);
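Putting those pieces together, here is a minimal runnable sketch of the depth-limited recursion, written as a standalone function so it can be tried without network access. $fetchPage is a hypothetical callable replacing the curl and regex code (it returns [thread URLs, next URL or null]), and the name crawlRecursive is illustrative; the second exit, taken when no next page is found, is an added assumption and not part of the reply. Because the parameter is declared as &$result, every level appends to the same array the caller passed in.

```php
<?php
// Depth-limited recursion with a by-reference accumulator.
// $fetchPage is a hypothetical stand-in for the curl + regex part of
// getAllPage(): it returns [array $threadUrls, $nextPageUrl or null].
function crawlRecursive(callable $fetchPage, string $url, int $depth, array &$result, int $maxDepth = 100): void
{
    // Exit 1: the depth limit from the reply.
    if ($depth == $maxDepth) {
        return;
    }

    list($threadUrls, $nextPageUrl) = $fetchPage($url);
    foreach ($threadUrls as $v) {
        $result[] = $v;   // $result is shared by reference across all levels
    }

    // Exit 2 (an added assumption): stop when there is no next page.
    if ($nextPageUrl === null) {
        return;
    }

    crawlRecursive($fetchPage, $nextPageUrl, $depth + 1, $result, $maxDepth);
}
```

PHP does not eliminate tail calls, so each page adds one stack frame; that is harmless at 100 levels, but for much larger page counts a loop that carries the next-page URL forward avoids the stack growth.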


Not entirely. Even if you remove the loop and keep only the getAllPage(...) call, that part will still hang.

Thanks a lot!!
