C#: Using HtmlAgilityPack to Scrape Web Page Information

Source: Internet
Author: User

A few days ago I read a blog post titled "C# Crawling Novels".

The blogger used regular expressions to extract the novel's title, chapter list, and content.

 

Below, the original blogger's code is rewritten with HtmlAgilityPack instead of regular expressions:

Before using HtmlAgilityPack, familiarize yourself with XPath, since the library selects nodes with XPath expressions.
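As a quick warm-up, here is a minimal sketch of the three XPath expressions used in this post, run against a small in-memory page rather than the real site. The sample markup, the example.com URLs, and the names XPathDemo, SampleHtml and ExtractLinks are made up for illustration; only the HtmlAgilityPack calls (LoadHtml, SelectNodes, SelectSingleNode, GetAttributeValue) come from the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Hypothetical markup standing in for a chapter index page.
    public const string SampleHtml = @"
<html><body>
  <h1>Sample Novel</h1>
  <table>
    <tr>
      <td><a href=""http://example.com/1.html"">Chapter 1</a></td>
      <td><a href=""http://example.com/2.html"">Chapter 2</a></td>
    </tr>
    <tr><td><a>No link here</a></td></tr>
  </table>
  <dd id=""contents"">First line<br>Second line</dd>
</body></html>";

    // Returns the href value of every <a> element matched by //table/tr/td/a[@href].
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        // a[@href] keeps only <a> elements that actually carry an href attribute.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//table/tr/td/a[@href]");
        if (links == null)
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            return hrefs;
        }
        foreach (HtmlNode link in links)
        {
            hrefs.Add(link.GetAttributeValue("href", ""));
        }
        return hrefs;
    }

    public static void Main()
    {
        foreach (string href in ExtractLinks(SampleHtml))
        {
            Console.WriteLine(href);
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        // //h1 matches <h1> elements anywhere in the document; SelectSingleNode returns the first.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//h1");
        Console.WriteLine(title?.InnerText);

        // //dd[@id='contents'] matches the <dd> whose id attribute equals "contents".
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//dd[@id='contents']");
        Console.WriteLine(body?.InnerHtml);
    }
}
```

The anchor without an href is skipped by the [@href] predicate, so only the two chapter links are returned.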

The code is as follows:

    using System;
    using System.IO;
    using System.Text;
    using HtmlAgilityPack;

    namespace HtmlAgilityPackDemo
    {
        class Program
        {
            static void Main(string[] args)
            {
                HtmlWeb htmlWeb = new HtmlWeb();
                // Load the novel's index page, which lists every chapter.
                HtmlDocument document = htmlWeb.Load("http://www.23us.so/files/article/html/13/13655/index.html");
                FileStream fs = new FileStream("Xinjiang.txt", FileMode.Append, FileAccess.Write);
                StreamWriter sw = new StreamWriter(fs, Encoding.UTF8);
                try
                {
                    // Select every <a> element with an href attribute inside the chapter table.
                    HtmlNodeCollection nodeCollection = document.DocumentNode.SelectNodes(@"//table/tr/td/a[@href]");
                    if (nodeCollection == null)
                    {
                        // SelectNodes returns null when nothing matches.
                        Console.WriteLine("No chapter links found.");
                        return;
                    }
                    foreach (var node in nodeCollection)
                    {
                        string val = node.Attributes["href"].Value;
                        // Download each chapter page once and query it twice.
                        HtmlDocument chapter = htmlWeb.Load(val);
                        HtmlNode titleNode = chapter.DocumentNode.SelectSingleNode(@"//h1");                  // chapter title
                        HtmlNode contentNode = chapter.DocumentNode.SelectSingleNode(@"//dd[@id='contents']"); // chapter content
                        if (titleNode == null || contentNode == null)
                        {
                            continue; // skip pages that do not have the expected layout
                        }
                        var content = contentNode.InnerHtml.Replace("&nbsp;", "").Replace("<br>", "\r\n");
                        sw.WriteLine("\r\n" + titleNode.InnerText + "\r\n" + content); // write the chapter to the file
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.ToString());
                }
                finally
                {
                    sw.Close();
                    fs.Close();
                }
                Console.WriteLine("OK");
                Console.ReadKey(true);
            }
        }
    }
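The two Replace calls above only handle the exact strings "&nbsp;" and "<br>". If a page uses "<br />" or other entities, a slightly more tolerant cleanup might look like the sketch below. ChapterText and ToPlainText are hypothetical names, and the helper relies on the .NET standard library (Regex and WebUtility.HtmlDecode) rather than on anything in HtmlAgilityPack; it is one possible approach, not what the original blogger did.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class ChapterText
{
    // Hypothetical helper: converts a chapter's inner HTML to plain text.
    public static string ToPlainText(string innerHtml)
    {
        // Treat <br>, <br/> and <br /> (in any letter case) as line breaks.
        string text = Regex.Replace(innerHtml, @"<br\s*/?>", "\r\n", RegexOptions.IgnoreCase);

        // Decode entities such as &nbsp; and &amp; into the characters they stand for.
        text = WebUtility.HtmlDecode(text);

        // &nbsp; decodes to U+00A0; drop it, mirroring the Replace("&nbsp;", "") call above.
        return text.Replace("\u00A0", "");
    }

    public static void Main()
    {
        Console.WriteLine(ToPlainText("&nbsp;&nbsp;First line<BR />Second &amp; last"));
    }
}
```

With a helper like this, the content line in the loop could call ChapterText.ToPlainText(contentNode.InnerHtml) in place of the chained Replace calls. Any other tags left in the chapter markup would still pass through unchanged.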

 

This produces the same result as the original blogger's regular-expression version.

The code is for reference only.
