(C#) Using ScrapySharp to download Tianya images concurrently
Recently I needed to build a CNKI crawler for work. While researching crawler architectures, I came across ScrapySharp, which appears to be a port of Scrapy, the well-known open-source Python crawler framework. However, the only demo I could find online was written in F#, so I took the sample website from that original article and wrote a C# version of the code.
PS: After studying it, I found that the gap between ScrapySharp and Scrapy is still quite large. It has nothing like Scrapy's eight well-developed components; it only provides web page content retrieval and web page parsing functions built as extensions on top of HtmlAgilityPack. I'm a little disappointed.
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using HtmlAgilityPack;
using ScrapySharp.Extensions;
using ScrapySharp.Network;

namespace ScrapySharpDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Sample website address
            var url = "http://bbs.tianya.cn/post-12-563201-1.shtml";
            var web = new ScrapingBrowser();
            var html = web.DownloadString(new Uri(url));
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Get the image addresses on the page
            var urls = doc.DocumentNode.CssSelect("div.bbs-content > img")
                .Select(node => node.GetAttributeValue("original"))
                .ToList();

            // Download the images in parallel
            Parallel.ForEach(urls, SavePic);
        }

        public static void SavePic(string url)
        {
            var web = new ScrapingBrowser();
            // Tianya does not allow its images to be requested from external sites,
            // so set the Referer request header to the current page address
            web.Headers.Add("Referer", "http://bbs.tianya.cn/post-12-563201-1.shtml");
            var pic = web.NavigateToPage(new Uri(url)).RawResponse.Body;
            // Keep everything from the last "/" onward as the file name
            var file = url.Substring(url.LastIndexOf("/", StringComparison.Ordinal));
            if (!Directory.Exists("imgs"))
                Directory.CreateDirectory("imgs");
            File.WriteAllBytes("imgs" + file, pic);
        }
    }
}
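The program above starts one download per image with no upper limit and builds the file name by cutting the URL at its last "/". As a minimal sketch of two possible refinements, using only the .NET base class library, the helpers below cap the number of simultaneous downloads and derive the file name from the URL path so that a query string does not end up in the name. The class `DownloadSketch`, the methods `LocalPathFor` and `ForEachBounded`, and the default folder name are hypothetical names introduced for illustration; they are not part of ScrapySharp or of the original article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class DownloadSketch
{
    // Hypothetical helper: map an image URL to a file path under the target folder.
    public static string LocalPathFor(string url, string folder = "imgs")
    {
        // Uri.LocalPath contains only the path part of the URL, so any
        // query string ("?x=1") is left out of the resulting file name.
        var name = Path.GetFileName(new Uri(url).LocalPath);
        return Path.Combine(folder, name);
    }

    // Hypothetical helper: run an action for every URL, with at most
    // maxParallel invocations in flight at the same time.
    public static void ForEachBounded(
        IEnumerable<string> urls, int maxParallel, Action<string> action)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(urls, options, action);
    }
}
```

With these in place, the `Parallel.ForEach` call in `Main` could be written as `DownloadSketch.ForEachBounded(urls, 4, SavePic)`, and `SavePic` could pass `DownloadSketch.LocalPathFor(url)` to `File.WriteAllBytes`. The limit of 4 is an arbitrary assumption; a suitable value depends on how many simultaneous requests the site tolerates.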