(C#) Using ScrapySharp to download Tianya images concurrently

Source: Internet
Author: User

Recently I needed to build a crawler for CNKI at work. While researching crawler architectures I came across ScrapySharp, which appeared to be a port of Scrapy, the well-known open-source Python crawler framework. The only demo I could find online was written in F#, so I wrote a C# version using the sample website from the original article.

PS: After studying it, I found that the gap between ScrapySharp and Scrapy is still quite large. It has nothing like Scrapy's eight well-developed components; it only provides web page content retrieval and HTML parsing, built as extensions on top of HtmlAgilityPack. I'm a little disappointed.

        
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using HtmlAgilityPack;
using ScrapySharp.Extensions;
using ScrapySharp.Network;

namespace ScrapySharpDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Sample page address
            var url = "http://bbs.tianya.cn/post-12-563201-1.shtml";
            var web = new ScrapingBrowser();
            var html = web.DownloadString(new Uri(url));
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Collect the image addresses found in the page
            var urls = doc.DocumentNode
                .CssSelect("div.bbs-content > img")
                .Select(node => node.GetAttributeValue("original"))
                .ToList();

            // Download the images in parallel
            Parallel.ForEach(urls, SavePic);
        }

        public static void SavePic(string url)
        {
            var web = new ScrapingBrowser();

            // Tianya does not allow its images to be requested from outside the site,
            // so set the Referer request header to the address of the current page
            web.Headers.Add("Referer", "http://bbs.tianya.cn/post-12-563201-1.shtml");
            var pic = web.NavigateToPage(new Uri(url)).RawResponse.Body;

            // Keep everything from the last "/" onward, e.g. "/name.jpg"
            var file = url.Substring(url.LastIndexOf("/", StringComparison.Ordinal));
            if (!Directory.Exists("imgs"))
                Directory.CreateDirectory("imgs");
            File.WriteAllBytes("imgs" + file, pic);
        }
    }
}
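
Two details of the demo could be tightened: SavePic derives the file name by cutting the URL at the last "/", which would keep any query string in the name, and Parallel.ForEach runs with no explicit limit on concurrent requests. The following is only a sketch of one way to handle both, using nothing but the .NET base class library. The names DownloadHelpers, LocalPathFor and SaveAll are hypothetical and are not part of ScrapySharp or the original article; the save delegate stands in for a method such as SavePic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class DownloadHelpers
{
    // Hypothetical helper: maps an image URL to a path under targetDir.
    // Uri.LocalPath excludes the query string, so only the last path
    // segment ends up in the file name.
    public static string LocalPathFor(string url, string targetDir)
    {
        var name = Path.GetFileName(new Uri(url).LocalPath);
        if (string.IsNullOrEmpty(name))
            throw new ArgumentException("URL has no file name: " + url, nameof(url));
        return Path.Combine(targetDir, name);
    }

    // Hypothetical helper: calls save for every URL, running at most
    // maxParallel calls at the same time.
    public static void SaveAll(IEnumerable<string> urls, Action<string> save, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(urls, options, save);
    }
}
```

Under these assumptions, Main could call DownloadHelpers.SaveAll(urls, SavePic, 4) instead of Parallel.ForEach(urls, SavePic), and SavePic could write to DownloadHelpers.LocalPathFor(url, "imgs"). The limit of 4 is an arbitrary illustrative value, not one taken from the original article.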
