A multi-threaded spider (web crawler) is a very useful component, and I have built one into my own Spider Studio. In its design I tried to keep things simple and made heavy use of dynamic objects. This keeps the code concise and flexible: a fairly complete spider program takes only about 17 lines. Here it is:
// Requires: using System.Dynamic; (for ExpandoObject)
// spider and Logger are not defined in this snippet; presumably a Spider instance and a logging helper from Spider Studio.
public void Run()
{
    dynamic link = new ExpandoObject();
    link.Url = "http://news.163.com";
    spider.AddLink(link);
    spider.Downloaded += new DownloadedEventHandler((object sender, DownloadedEventArgs e) =>
    {
        Logger.Log(e.Page.Link.Url);
        foreach (var l in e.Page.SubLinks)
        {
            if (l.Depth <= 2) spider.AddLink(l); // only follow pages within depth 2
        }
    });
    spider.ErrorOccurred += new ErrorOccurredEventHandler((object sender, ErrorOccurredEventArgs e) =>
    { Logger.Log(e.Error.Message); });
    spider.Start(10); // start 10 worker threads
    spider.Wait();    // wait for all threads to complete
    spider.Stop();
}
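The Spider class itself belongs to Spider Studio and its source is not shown here. To illustrate how a component with this shape can work (a shared link queue, N worker threads, a Downloaded event whose handler feeds new links back in, and a Wait that returns once the queue drains), here is a minimal sketch. Everything in it is an assumption for illustration, not the author's implementation: the class name MiniSpider, the injected fetch function that stands in for real HTTP downloads, the use of BlockingCollection, the default depth of 0 for seed links, and the simplified Action-based event signatures in place of DownloadedEventHandler and ErrorOccurredEventHandler. The demo crawls a tiny in-memory "web", so it runs without network access.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Dynamic;
using System.Threading;

// Hypothetical minimal spider illustrating the Run() pattern; NOT the Spider Studio implementation.
public class MiniSpider
{
    private readonly BlockingCollection<object> queue = new BlockingCollection<object>(); // shared link queue
    private readonly ConcurrentDictionary<string, bool> seen = new ConcurrentDictionary<string, bool>();
    private readonly List<Thread> workers = new List<Thread>();
    private readonly Func<string, (string Html, List<string> SubUrls)> fetch; // assumed downloader
    private int pending; // links that are queued or still being processed
    public event Action<object, dynamic> Downloaded;      // e.Page = {Link, Html, SubLinks}
    public event Action<object, Exception> ErrorOccurred; // simplified: passes the exception directly
    public MiniSpider(Func<string, (string Html, List<string> SubUrls)> fetch) => this.fetch = fetch;

    public void AddLink(dynamic link)
    {
        var fields = (IDictionary<string, object>)link;        // assumes an ExpandoObject
        if (!fields.ContainsKey("Depth")) fields["Depth"] = 0; // assumption: seed links start at depth 0
        if (!seen.TryAdd((string)link.Url, true)) return;      // skip URLs already queued
        Interlocked.Increment(ref pending);
        queue.Add((object)link);
    }
    public void Start(int threadCount)
    {
        if (Volatile.Read(ref pending) == 0) queue.CompleteAdding(); // nothing to crawl
        for (int i = 0; i < threadCount; i++)
        {
            var t = new Thread(Work) { IsBackground = true };
            workers.Add(t);
            t.Start();
        }
    }
    private void Work()
    {
        foreach (dynamic link in queue.GetConsumingEnumerable())
        {
            try
            {
                var (html, subUrls) = fetch((string)link.Url);
                var subLinks = new List<dynamic>();
                foreach (var url in subUrls)
                {
                    dynamic sub = new ExpandoObject();
                    sub.Url = url; sub.Title = ""; sub.Depth = (int)link.Depth + 1;
                    subLinks.Add(sub);
                }
                dynamic page = new ExpandoObject();
                page.Link = link; page.Html = html; page.SubLinks = subLinks;
                dynamic args = new ExpandoObject(); args.Page = page;
                Downloaded?.Invoke(this, (object)args); // handlers may call AddLink for sub-links
            }
            catch (Exception ex) { ErrorOccurred?.Invoke(this, ex); }
            finally
            {
                if (Interlocked.Decrement(ref pending) == 0) queue.CompleteAdding(); // last link out closes the queue
            }
        }
    }

    public void Wait() { foreach (var t in workers) t.Join(); }
    public void Stop() => queue.CompleteAdding(); // no-op if the queue is already closed
}

// Demo on a hypothetical in-memory "web" (no network), mirroring Run() with a depth limit of 2.
public static class MiniSpiderDemo
{
    public static (List<string> Downloaded, int Errors) Run()
    {
        var web = new Dictionary<string, List<string>>
        {
            ["a"] = new List<string> { "b", "c" },
            ["b"] = new List<string> { "d" },
            ["c"] = new List<string> { "d", "e" }, // "e" is not in the web, so fetching it throws
            ["d"] = new List<string> { "f" },      // "f" would be depth 3, so it is never queued
        };
        var downloaded = new ConcurrentBag<string>();
        int errors = 0;
        var spider = new MiniSpider(url => ("<html></html>", web[url]));
        dynamic link = new ExpandoObject(); link.Url = "a";
        spider.AddLink(link);
        spider.Downloaded += (sender, e) =>
        {
            downloaded.Add((string)e.Page.Link.Url);
            foreach (var l in e.Page.SubLinks)
                if (l.Depth <= 2) spider.AddLink(l); // same depth filter as Run()
        };
        spider.ErrorOccurred += (sender, ex) => Interlocked.Increment(ref errors);
        spider.Start(3); spider.Wait(); spider.Stop();
        var result = new List<string>(downloaded);
        result.Sort(StringComparer.Ordinal);
        return (result, errors);
    }
}
```

The pending counter is the main design choice here: a link stays counted until its Downloaded handler has finished queuing its sub-links, so the count can reach zero only when no thread holds further work. That is the moment the queue is closed, the worker loops end, and Wait() returns.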
The spider uses the following objects, most of which are dynamic (shown here in JSON notation):
Link: {Url: 'string', Title: 'string', Depth: 1}
DownloadedEventArgs: {Page: {Link: {Url: 'string', Title: 'string', Depth: 1}, Html: 'string', SubLinks: [{Url: 'string', Title: 'string', Depth: 1}]}}
ErrorOccurredEventArgs: {Error: Exception}
In short:
Link has three properties: Url (string), Title (string), and Depth (integer);
DownloadedEventArgs has one property, Page, which in turn has three properties: Link (a Link), Html (string), and SubLinks (an array of Links);
ErrorOccurredEventArgs has one property: Error (an Exception).
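To make these shapes concrete, here is a sketch that builds them with ExpandoObject. The helper names (SpiderShapes, MakeLink, MakeDownloadedArgs, MakeErrorArgs) are hypothetical. In Spider Studio, DownloadedEventArgs and ErrorOccurredEventArgs are named types whose internals are not shown, so this only approximates them with plain dynamic objects.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical helpers that build the dynamic shapes described above with ExpandoObject.
public static class SpiderShapes
{
    // Link: {Url, Title, Depth}
    public static dynamic MakeLink(string url, string title, int depth)
    {
        dynamic link = new ExpandoObject();
        link.Url = url;
        link.Title = title;
        link.Depth = depth;
        return link;
    }

    // DownloadedEventArgs: {Page: {Link, Html, SubLinks}}
    public static dynamic MakeDownloadedArgs(object link, string html, List<dynamic> subLinks)
    {
        dynamic page = new ExpandoObject();
        page.Link = link;
        page.Html = html;
        page.SubLinks = subLinks;
        dynamic args = new ExpandoObject();
        args.Page = page;
        return args;
    }

    // ErrorOccurredEventArgs: {Error}
    public static dynamic MakeErrorArgs(Exception error)
    {
        dynamic args = new ExpandoObject();
        args.Error = error;
        return args;
    }
}
```

Because every member access such as e.Page.SubLinks[0].Depth is resolved at runtime, a misspelled property name still compiles but throws a RuntimeBinderException when executed. That is the price of the short handler code.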
Operation Condition: