Perl: parsing a page to extract download links and file names

Source: Internet
Author: User

System: Windows

Language: Perl

Tool: Notepad++/CMD

For now, Perl is only used to parse the pages. The next step is to batch-download the files and rename them.

The rar files are linked from the second-level pages. Each rar file name is just a number, and the corresponding Chinese name is the text of the link on the first-level page that points to the second-level page. The Perl Cookbook recommends using modules for this kind of parsing. However, I plan to write the script myself with regular expressions, so here I only use the simple LWP::Simple module.
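The download-and-rename step itself is not shown in this post. As a rough sketch only, the renaming could look like the following, assuming the link text and the rar URL have already been extracted. The helper names target_name and download_as, the choice of replacement character, and the example.com URL are all hypothetical. getstore is the LWP::Simple function that saves a URL to a file and returns the HTTP status code.

```perl
use strict;
use warnings;

# Hypothetical helper: build a local file name from the link text,
# keeping the extension of the numeric rar URL (default: .rar).
sub target_name {
    my ($title, $rar_url) = @_;
    my ($ext) = $rar_url =~ /(\.[A-Za-z0-9]+)$/;
    $ext = '.rar' unless defined $ext;
    # Replace characters that Windows does not allow in file names.
    (my $safe = $title) =~ s/[\\\/:*?"<>|]/_/g;
    $safe =~ s/^\s+|\s+$//g;
    return $safe . $ext;
}

# Hypothetical helper: download one file under its readable name.
# Returns the file name on success, undef otherwise.
sub download_as {
    my ($rar_url, $title) = @_;
    require LWP::Simple;    # loaded only when a download is requested
    my $file = target_name($title, $rar_url);
    my $rc   = LWP::Simple::getstore($rar_url, $file);
    return ($rc >= 200 && $rc < 300) ? $file : undef;
}

print target_name("Sample paper 2012", "http://www.example.com/files/12345.rar"), "\n";
```

For real Chinese titles, the encoding of the page and of the file system would also have to be handled, which this sketch leaves out.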

Technique: the regular-expression capture variables $1, $2, ... are used to extract the required fields from a line. (Note that $1, $2, ... only hold the captures of the most recent successful match. The next successful match that contains () overwrites them, so assign them to variables immediately if you need them later.)
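A small illustration of that note, using made-up input strings and a hypothetical helper named capture_demo. It copies $1 and $2 right after the first match, runs a second match, and shows that $1 now belongs to the second match while the copies are unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the saved captures of the first match
# and the value of $1 after a second, unrelated match has run.
sub capture_demo {
    my ($line) = @_;
    return unless $line =~ /^(\w+)=(\d+)$/;
    my ($key, $number) = ($1, $2);                      # copy right away
    my ($page) = "index_2.shtml" =~ /index_(\d+)/;      # second match overwrites $1
    return ($key, $number, $1);
}

my ($key, $number, $later) = capture_demo("file=12345");
print "$key $number $later\n";    # prints "file 12345 2"
```

The second match also shows a common alternative: in list context a match returns its captures directly, so they can be assigned without touching $1 at all.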

    # t.pl
    # Split out the index URLs and the rar pages.
    use strict;
    use warnings;
    use LWP::Simple;

    # Print the link and the link text of every dated list item on a page.
    sub getDownloadPage {
        my @lines = split("\n", $_[0]);
        foreach my $line (@lines) {
            if ($line =~ /<li class="itm">[^<]*<span> *[0-9]{4}-[0-9]{2}-[0-9]{2} *<\/span>[^<]*<a href="([^"> ]*)" *>([^<]*)</) {
                print $1, " ", $2, "\n";
            }
        }
    }

    my @indexes;
    unshift @indexes, "http://www.yingyu.com/stxz/chuzhong/zhongkao/";

    # Get the first index page and collect the links to the other index pages.
    my $content = get($indexes[0]);
    die "Cannot fetch $indexes[0]\n" unless defined $content;
    my @hrefs = split "href=\"", $content;
    shift @hrefs;
    foreach my $href (@hrefs) {
        if ($href =~ /(http:\/\/.*index[_0-9]*\.shtml)" *>[0-9]+/) {
            push @indexes, $1;
        }
    }

    # Print each download page and its corresponding Chinese name.
    foreach my $index (@indexes) {
        $content = get($index);
        next unless defined $content;    # skip pages that could not be fetched
        # my @pages = split "<li ", $content;
        # shift @pages;
        getDownloadPage($content);
    }
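The list-item pattern can be tried offline before running the script against the site. The sketch below reuses the same pattern, compiled with qr{} so the slash in </span> needs no escaping. The sample line is made up to fit the pattern and is not taken from the real page, and the helper name parse_item is hypothetical.

```perl
use strict;
use warnings;

# Same pattern as in t.pl, compiled once.
my $ITEM_RE = qr{<li class="itm">[^<]*<span> *[0-9]{4}-[0-9]{2}-[0-9]{2} *</span>[^<]*<a href="([^"> ]*)" *>([^<]*)<};

# Hypothetical helper: returns (href, link text) for a matching line,
# or an empty list when the line does not match.
sub parse_item {
    my ($line) = @_;
    my @fields = $line =~ $ITEM_RE;
    return @fields;
}

# Made-up sample line in the shape the pattern expects.
my $sample = '<li class="itm"><span>2012-05-01</span> '
           . '<a href="http://www.example.com/page/1.shtml">Sample paper</a></li>';

my ($href, $name) = parse_item($sample);
print "$href $name\n" if defined $href;
```

Because the match is evaluated in list context, the two captures come back as a list and never need to be read from $1 and $2.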
