Yunshu
This evening I debugged the program carefully and found a bug: when fetching page content I handled 302 redirects, but not JavaScript redirects. I have added some simple handling for now. It is usable as it stands, and the matching can be optimized and made more careful later.
use LWP::UserAgent;
use URI::URL;

sub GetContentFromUrl
{
    my $url      = shift;
    my $time_out = shift;
    chomp($url);
    my $ua = LWP::UserAgent->new();
    $ua->cookie_jar({});
    $ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
    $ua->timeout($time_out);
    # get() always returns a response object, so success is checked below.
    my $res_obj = $ua->get($url);
    print "$url: " . $res_obj->status_line . "\n";
    if (!$res_obj->is_success)
    {
        return undef;
    }
    my $html = $res_obj->as_string;
    # Look for a JavaScript redirect such as: window.location.href = "..."
    if ($html =~ m/window\.location(?:\.href)?\s*=\s*"(.*?)"/is)
    {
        my $tmp_url = URI::URL->new($1, $url);
        $tmp_url = $tmp_url->abs->as_string;    # resolve against the current URL
        print "found window.location, will follow $tmp_url\n";
        return GetContentFromUrl($tmp_url, $time_out);
    }
    else
    {
        return $html;
    }
}
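
As one possible direction for the "more careful" matching mentioned above, here is a minimal sketch of the extraction step on its own. The helper name extract_js_redirect and the sample strings are hypothetical and not part of the original program. It assumes the redirect is written as a plain string assignment to window.location or window.location.href, and it also accepts single-quoted targets, which the routine above does not.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): return the target of
# the first window.location / window.location.href string assignment found
# in $html, or undef when there is none.
sub extract_js_redirect {
    my ($html) = @_;
    # (["']) captures the opening quote; \1 requires the same closing quote.
    if ($html =~ m/window\.location(?:\.href)?\s*=\s*(["'])(.*?)\1/is) {
        return $2;
    }
    return undef;
}

# Made-up sample inputs for illustration.
my @samples = (
    '<script>window.location = "/login.php";</script>',
    "<script>window.location.href='http://example.com/a'</script>",
    '<p>no redirect here</p>',
);

for my $html (@samples) {
    my $target = extract_js_redirect($html);
    print defined $target ? "redirect to: $target\n" : "no redirect\n";
}
```

Keeping the pattern in a small function like this makes it easy to try it against sample pages before changing GetContentFromUrl. A real page can redirect in other ways (location.replace, a meta refresh, a URL built from variables), which this sketch does not cover. Since GetContentFromUrl follows redirects recursively, a depth counter would also be worth adding so that two pages redirecting to each other cannot loop forever.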