System: Windows
Language: Perl
Tools: Notepad++ / CMD
Perl is used to parse the pages. The files are then downloaded in batch and renamed.
First, each rar file is linked from a second-level page. The rar file itself is named with a number, while its corresponding Chinese name is the text of the link to that second-level page on the first-level (index) page. The Perl Cookbook recommends using modules for this kind of parsing; however, I plan to write the script myself with regular expressions. The only module used here is the simple LWP::Simple.
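The renaming step is not part of t.pl. A minimal sketch, assuming the numeric rar URL and its Chinese link text have already been paired up: `target_name` and `download_as` are hypothetical helper names, and the replaced characters are the ones Windows forbids in file names. Depending on the page's charset and the console code page, Chinese names may also need an encoding conversion, which this sketch does not handle.

```perl
use strict;
use warnings;

# Hypothetical helper: build the local file name from the Chinese link text,
# keeping the extension of the numeric rar URL (e.g. .../12345.rar).
sub target_name {
    my ($url, $title) = @_;
    my ($base, $ext) = $url =~ m{([^/]+?)(\.[A-Za-z0-9]+)?$};
    $ext   = '.rar' unless defined $ext;
    $title = ''     unless defined $title;
    $title =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    $title =~ tr{\\/:*?"<>|}{_};       # characters Windows forbids in names
    # Fall back to the numeric name when there is no usable title.
    return length $title ? $title . $ext : $base . $ext;
}

# Hypothetical helper: download $url and save it under the renamed file.
# LWP::Simple is loaded at run time so target_name() works without it.
sub download_as {
    my ($url, $title) = @_;
    require LWP::Simple;
    my $file = target_name($url, $title);
    my $rc   = LWP::Simple::getstore($url, $file);   # returns the HTTP status
    warn "failed ($rc): $url\n" unless LWP::Simple::is_success($rc);
    return $file;
}
```

For example, `download_as($rar_url, $chinese_name)` would save the archive as `<Chinese name>.rar` in the current directory.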
Technique: the regex capture variables $1, $2, ... are used to extract the needed fields from a line. (Note that $1, $2, ... only hold the groups of the most recent successful match. The next successful match with capture groups overwrites them, so assign them to variables immediately if you need them later.)
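A small illustration of that rule, using a made-up input string: the list assignment copies all groups at once, a copy of $1 survives the next match, and a failed match leaves $1 unchanged.

```perl
use strict;
use warnings;

my $text = 'date 2010-06-25 file 1234.rar';

# Idiomatic: a list assignment copies every group in one step.
my ($year, $month) = $text =~ /([0-9]{4})-([0-9]{2})/;

# Reading $1 directly: copy it before the next match.
$text =~ /([0-9]{4})-([0-9]{2})/;
my $saved = $1;                  # '2010'

$text =~ /([0-9]+)\.rar/;        # next successful match overwrites $1
my $now = $1;                    # '1234'

$text =~ /(xyz)/;                # failed match: $1 is left as it was
my $after = $1;                  # still '1234'

print "year=$year month=$month saved=$saved now=$now after=$after\n";
# prints: year=2010 month=06 saved=2010 now=1234 after=1234
```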
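# t.pl: split out the index page URLs and the rar download pages.
use strict;
use warnings;
use LWP::Simple;

# Print the link and the Chinese name of every download entry on a list page.
sub getDownloadPage {
    my @lines = split /\n/, $_[0];
    foreach my $line (@lines) {
        if ($line =~ /<li class="itm">[^<]*<span> *[0-9]{4}-[0-9]{2}-[0-9]{2} *<\/span>[^<]*<a href="([^"> ]*)" *>([^<]*)</) {
            print $1, " ", $2, "\n";
        }
    }
}

my @indexes = ("http://www.yingyu.com/stxz/chuzhong/zhongkao/");

# Get the first index page and collect the links to the other index pages.
my $content = get($indexes[0]);
die "Cannot fetch $indexes[0]\n" unless defined $content;
my @hrefs = split /href="/, $content;
shift @hrefs;    # drop the text before the first href
foreach my $href (@hrefs) {
    # [^"]* stops at the closing quote instead of running across the line.
    if ($href =~ /^(http:\/\/[^"]*index[_0-9]*\.shtml)" *>[0-9]+/) {
        push @indexes, $1;
    }
}

# For each index page, print its download pages and their Chinese names.
foreach my $index (@indexes) {
    $content = get($index);
    next unless defined $content;    # skip pages that failed to download
    # my @pages = split "<li ", $content;
    # shift @pages;
    getDownloadPage($content);
}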
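Splitting on `href="` in t.pl works because every chunk then starts with a URL. An alternative sketch walks the page with a global match (`m//g`) and drops duplicate pagination links; `index_pages` is a hypothetical name and the sample HTML is made up to fit the same pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: collect pagination links (index.shtml, index_2.shtml, ...)
# in page order, skipping duplicates.
sub index_pages {
    my ($html) = @_;
    my (%seen, @pages);
    while ($html =~ m{href="(http://[^"]*index[_0-9]*\.shtml)" *>[0-9]+}g) {
        push @pages, $1 unless $seen{$1}++;
    }
    return @pages;
}

# Made-up sample: page 2 is linked twice, as pagers often do.
my $sample = '<a href="http://www.yingyu.com/stxz/chuzhong/zhongkao/index_2.shtml">2</a>'
           . '<a href="http://www.yingyu.com/stxz/chuzhong/zhongkao/index_3.shtml">3</a>'
           . '<a href="http://www.yingyu.com/stxz/chuzhong/zhongkao/index_2.shtml">2</a>';

print "$_\n" for index_pages($sample);
```

The `%seen` hash matters when the first index page links to the same page more than once, which would otherwise make the main loop fetch it twice.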
Perl: parse the page and extract each download link and its file name.
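`getDownloadPage` in t.pl prints each match. For the batch-download step it may be handier to return the pairs instead, so they can be fed to a downloader. A sketch under that assumption: `extract_downloads` is a hypothetical name, the pattern is the one from t.pl, and the sample HTML is made up to match it.

```perl
use strict;
use warnings;

# Hypothetical helper: return [href, name] pairs for every download entry.
sub extract_downloads {
    my ($html) = @_;
    my @pairs;
    for my $line (split /\n/, $html) {
        if ($line =~ m{<li class="itm">[^<]*<span> *[0-9]{4}-[0-9]{2}-[0-9]{2} *</span>[^<]*<a href="([^"> ]*)" *>([^<]*)<}) {
            push @pairs, [ $1, $2 ];   # copy the captures right away
        }
    }
    return @pairs;
}

# Made-up list page: two matching entries and one unrelated <li>.
my $page = join "\n",
    '<li class="itm"> <span>2010-06-25</span> <a href="/stxz/12345.html">Sample title A</a></li>',
    '<li class="other">ignored</li>',
    '<li class="itm"><span> 2011-01-02 </span><a href="/stxz/67890.html" >Sample title B</a></li>';

for my $pair (extract_downloads($page)) {
    print "$pair->[0] $pair->[1]\n";
}
```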