For ease of use and future extension, ScrapySharp is wrapped in a simple Requester class. The code is as follows:
    using System;
    using System.Collections.Generic;
    using Crawler.Common;

    namespace Crawler.Protocol
    {
        public class Requester
        {
            private Uri Url { get; set; }
            private Browser Browser { get; set; }

            public Requester(string url, Dictionary<string, string> headers = null, Browser browser = null)
            {
                var u = new Uri(url);

                // Detect whether the address is a domain name or an IP address;
                // if it is a domain name, use DnsResolver to resolve it to an IP address.
                var leftPart = u.GetLeftPart(UriPartial.Authority).Replace(u.GetLeftPart(UriPartial.Scheme), "");

                // Check with a regular expression whether the authority is an IP address
                // (optionally followed by a port). The pattern is anchored so that
                // addresses ending in a single digit, such as 10.0.0.1, also match.
                if (!RegexHelper.IsMatch(leftPart, @"^\d+\.\d+\.\d+\.\d+(:\d+)?$"))
                {
                    var dns = new DnsResolver(leftPart);
                    if (dns.IsSuccess)
                        u = new Uri(url.Replace(leftPart, dns.Record.Address.ToString()));
                }

                Url = u;
                Browser = browser ?? new Browser();

                if (headers == null) return;

                // Copy the caller-supplied headers onto the browser.
                foreach (var header in headers)
                    Browser.Headers[header.Key] = header.Value;
            }

            // Downloads the page at Url and returns it as a string.
            public string GetHtml()
            {
                return Browser.DownloadString(Url);
            }

            // Downloads the resource at Url and returns the raw response body.
            public byte[] GetFile()
            {
                return Browser.NavigateToPage(Url).RawResponse.Body;
            }
        }
    }
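The host check in the constructor above relies on RegexHelper and DnsResolver, which are the author's own helpers and are not shown here. As a rough, self-contained sketch of the same idea, the snippet below uses only types from System and System.Net. The class name HostRewriteSketch and its method names are made up for illustration and are not part of the original project; it is one possible way to do the check, not the author's implementation.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original project.
public static class HostRewriteSketch
{
    // True when the host part of the URI is a literal IP address rather than a domain name.
    public static bool IsIpHost(Uri uri)
    {
        return uri.HostNameType == UriHostNameType.IPv4
            || uri.HostNameType == UriHostNameType.IPv6;
    }

    // Returns a copy of the URI with its host replaced, keeping scheme, port, path and query.
    public static Uri WithHost(Uri uri, IPAddress address)
    {
        var builder = new UriBuilder(uri) { Host = address.ToString() };
        return builder.Uri;
    }

    // Resolves a domain name through the system resolver and swaps in the first address.
    // Needs network access; the first address returned may be IPv4 or IPv6.
    public static Uri ResolveToIp(Uri uri)
    {
        if (IsIpHost(uri)) return uri;
        IPAddress[] addresses = Dns.GetHostAddresses(uri.Host);
        return addresses.Length > 0 ? WithHost(uri, addresses[0]) : uri;
    }
}
```

Going through Uri.HostNameType and UriBuilder avoids string replacement on the whole URL, so a host name that also happens to appear in the path is left alone. Keep in mind that once the host is replaced by an IP address, servers that rely on the Host header (virtual hosting) or on the certificate name for HTTPS may no longer respond as expected.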
Because ScrapingBrowser may need to be extended later (for example, to add support for other protocols such as FTP), a new Browser class is created that inherits from ScrapingBrowser:
    using ScrapySharp.Network;

    namespace Crawler.Protocol
    {
        public class Browser : ScrapingBrowser
        {
        }
    }
[Crawler Learning Notes] A Simple ScrapySharp Wrapper: Requester