Do you find that Naruto updates come out too slowly? Do the comic websites not let you download the chapters? Then take a look at this ^_^
PS: This uses Web-Harvest: http://web-harvest.sourceforge.net
1. Logical File
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <include path="functions.xml"/>

    <var-def name="num" overwrite="false">1</var-def>

    <loop index="i" item="url">
        <!-- get list of names -->
        <list>
            <var-def name="imagelinks">
                <call name="download-multipage-list">
                    <call-param name="pageurl"><template>http://www.narutom.com/comic/index.html</template></call-param>
                    <call-param name="nextxpath">//div[@class='pagenav']/a[last()-1]/@href</call-param>
                    <call-param name="itemxpath">//div[@id='dm_name']/ul/li/a/text()</call-param>
                    <call-param name="maxloops"><template>${num}</template></call-param>
                </call>
            </var-def>
        </list>
        <body>
            <empty>
                <!-- get ordinal -->
                <var-def name="ordinal">
                    <regexp>
                        <regexp-pattern>^\D*(\d*)?\D*$</regexp-pattern>
                        <regexp-source><template>${url}</template></regexp-source>
                        <regexp-result>
                            <template>${_1}</template>
                        </regexp-result>
                    </regexp>
                </var-def>
                <!-- output -->
                <call name="getcomic">
                    <call-param name="fromnum"><template>${ordinal}</template></call-param>
                    <call-param name="directory"><template>${url}</template></call-param>
                </call>
            </empty>
        </body>
    </loop>
</config>
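The "ordinal" step in the file above pulls the chapter number out of each list item's text with a regular expression. As a rough illustration only, here is the same extraction sketched in Python. It assumes the pattern is meant to be `^\D*(\d*)?\D*$` (the original is garbled), and the function name and sample titles are made up for this example.

```python
import re

# Assumed reconstruction of the pattern used for the "ordinal" variable:
# skip leading non-digits, capture one run of digits, skip trailing non-digits.
ORDINAL_PATTERN = re.compile(r"^\D*(\d*)?\D*$")


def extract_ordinal(item_text: str) -> str:
    """Return the single digit run in item_text, or "" if there is none
    or if the text does not match the pattern at all."""
    match = ORDINAL_PATTERN.match(item_text)
    if match is None:
        return ""
    # group(1) plays the role of ${_1} in the Web-Harvest config.
    return match.group(1) or ""


if __name__ == "__main__":
    # Hypothetical item titles, for illustration only.
    for title in ["Naruto chapter 512", "no digits here", "12 and 34"]:
        print(repr(title), "->", repr(extract_ordinal(title)))
```

Note that a title containing two separate digit runs does not match this pattern at all, so it yields an empty ordinal in this sketch.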
2. Function library file
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <!--
        Download multi-page list of items.

        @param pageurl   - URL of starting page
        @param itemxpath - XPath expression to obtain single item in the list
        @param nextxpath - XPath expression to URL for the next page
        @param maxloops  - maximum number of pages downloaded

        @return list of all downloaded items
    -->
    <function name="download-multipage-list">
        <return>
            <while condition="${pageurl.toString().length() != 0}" maxloops="${maxloops}" index="i">
                <empty>
                    <var-def name="content">
                        <html-to-xml>
                            <http url="${pageurl}" charset="gb2312"/>
                        </html-to-xml>
                    </var-def>
                    <var-def name="nextlinkurl">
                        <xpath expression="${nextxpath}">
                            <var name="content"/>
                        </xpath>
                    </var-def>
                    <var-def name="pageurl">
                        <!-- <template>${sys.fullUrl(pageurl.toString(), nextlinkurl.toString())}</template> -->
                        <template>${nextlinkurl.toString()}</template>
                    </var-def>
                </empty>

                <xpath expression="${itemxpath}">
                    <var name="content"/>
                </xpath>
            </while>
        </return>
    </function>

    <!-- Naruto -->
    <function name="getcomic">
        <while index="j" condition="${j.toInt() != 20}">
            <var-def name="pageurl">
                <template>watermark</template>
            </var-def>
            <file action="write" path='/home/xyzqing/webharvest/Naruto/${directory}/canonical fig' type="binary">
                <http url="${pageurl}"/>
            </file>
        </while>
    </function>
</config>
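To make the control flow of the multi-page list function easier to follow, here is a minimal Python sketch of the same loop: keep going while there is a page URL and the loop limit has not been reached, collect the items on each page, then move on to the "next page" link. `fetch_page`, `FAKE_SITE` and `fake_fetch` are hypothetical stand-ins invented for this example (they replace the http, html-to-xml and xpath steps), so nothing here touches the network.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the http + html-to-xml + xpath steps: given a page
# URL, return (items found on that page, URL of the next page or "" if none).
FetchPage = Callable[[str], Tuple[List[str], str]]


def download_multipage_list(page_url: str, fetch_page: FetchPage, max_loops: int) -> List[str]:
    """Collect items page by page until there is no next page or max_loops is hit."""
    items: List[str] = []
    loops = 0
    while len(page_url) != 0 and loops < max_loops:
        page_items, next_url = fetch_page(page_url)
        items.extend(page_items)   # items selected by the "item" XPath
        page_url = next_url        # link selected by the "next page" XPath
        loops += 1
    return items


# In-memory "site" used only for illustration.
FAKE_SITE: Dict[str, Tuple[List[str], str]] = {
    "page1": (["chapter 3", "chapter 2"], "page2"),
    "page2": (["chapter 1"], ""),
}


def fake_fetch(url: str) -> Tuple[List[str], str]:
    return FAKE_SITE[url]


if __name__ == "__main__":
    print(download_multipage_list("page1", fake_fetch, max_loops=1))
    print(download_multipage_list("page1", fake_fetch, max_loops=5))
```

With a loop limit of 1, as in the `num` variable of the first file, only the first list page is read.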
3. Result
PS: A few useless images may get downloaded. This is a flaw in the approach, but it does not spoil reading for fans. Also, the special chapters are not extracted because their IDs are not consecutive. You can modify the example yourself to handle them; after all, there are only a few such chapters.
How to run: download the Web-Harvest jar package and run the "logical file" (the first file above) with its built-in UI. That is all.
Of course, you must configure the output path yourself.
Comments and discussion are welcome.