1. Shell Crawler Example:
[root@clsn ~]# vim pa.sh
#!/bin/bash
www_link=http://www.cnblogs.com/clsn/default.html?page=
for i in {1..8}
do
    # @ is a delimiter I chose myself; this line grabs each post title and its URL
    curl ${www_link}${i} 2>/dev/null | grep homepage | grep -v "ImageLink" | awk -F '[><"]' '{print $7"@"$9}' >> bb.txt
done
egrep -v "pager" bb.txt > ma.txt    # after this filtering, only the titles and their URLs are left in the file
b=`sed 's# ##g' ma.txt`             # strip the spaces from the file: otherwise the for loop treats a line containing a space as two variables instead of one -- this pitfall cost me a long time
for i in $b
do
    c=`echo $i | awk -F @ '{print $1}'`    # c = content URL
    d=`echo $i | awk -F @ '{print $2}'`    # d = content title
    echo "<a href=\"${c}\" target=\"_blank\">${d}</a>" >> cc.txt    # cc.txt collects the generated <a> tags
done
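The parsing trick above can be tried offline, without hitting the site. The sketch below is a minimal, self-contained illustration of the same two steps: split an HTML line on `>`, `<`, and `"` to pull out a URL and a title joined by `@`, then split that pair back apart and emit an `<a>` tag. The sample line and the field numbers ($9 and $11) are my own assumptions chosen to fit this sample markup; the original script used $7 and $9 because the real cnblogs page layout differs.

```shell
#!/bin/bash
# Hypothetical sample line standing in for one line of curl output from the blog index.
sample='<div class="postTitle"><a class="homepage" href="http://example.com/p/1.html">My Post</a></div>'

# Step 1: split on >, < and " and join URL and title with @ (same trick as pa.sh).
# $9 and $11 are the URL and title fields *for this sample markup only*.
pair=$(echo "$sample" | awk -F '[><"]' '{print $9"@"$11}')

# Step 2: split the pair on @ and build the <a> tag.
url=$(echo "$pair" | awk -F @ '{print $1}')
title=$(echo "$pair" | awk -F @ '{print $2}')
echo "<a href=\"${url}\" target=\"_blank\">${title}</a>"
# prints: <a href="http://example.com/p/1.html" target="_blank">My Post</a>
```

Because the field numbers depend entirely on the page's exact HTML, piping one sample line through `awk -F '[><"]' '{for(i=1;i<=NF;i++) print i, $i}'` first is the easiest way to find the right positions for a different site.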
Crawler result display: [screenshot: the shell crawler's results saved in the archive file]
Example of shell and Python crawler display