A Beginner's First Experience with HtmlAgilityPack ... very rough code ...
The Html Agility Pack is an open source project that provides a standard DOM API and XPath navigation for web pages. Pages downloaded with WebBrowser or HttpWebRequest can be parsed with the Html Agility Pack.
The HtmlAgilityPack documentation ships in CHM format, and sometimes a CHM file will not display properly. If, after opening it, you see a message such as "Internet Explorer cannot link to the web page you requested" or "The page cannot be displayed", right-click the CHM file, open Properties, and click "Unblock" at the bottom of the dialog. The file will then display correctly.
If you need to download the library, get HtmlAgilityPack 1.4.0, unzip it, find HtmlAgilityPack.dll, and add it to your project as a reference.
The classes in HtmlAgilityPack.dll are located in the HtmlAgilityPack namespace.
The HtmlDocument class represents a complete HTML document. Its Load method reads a page from a file or a stream, while HtmlWeb.Load fetches a page from a URL and returns an HtmlDocument.
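To make the loading options concrete, here is a minimal sketch that fills an HtmlDocument from an in-memory string in two ways (LoadHtml and Load with a TextReader) and reads the page title. The class name LoadDemo, the helper ReadTitle, and the sample HTML are made up for illustration, and the sketch assumes HtmlAgilityPack.dll is already referenced by the project.

```csharp
using System;
using System.IO;

static class LoadDemo
{
    // Hypothetical helper: returns the text of the <title> element, or "" if there is none.
    public static string ReadTitle(HtmlAgilityPack.HtmlDocument doc)
    {
        HtmlAgilityPack.HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title == null ? "" : title.InnerText;
    }

    static void Main()
    {
        // Sample markup invented for this sketch.
        string html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>";

        // Option 1: parse HTML that is already in a string.
        var fromString = new HtmlAgilityPack.HtmlDocument();
        fromString.LoadHtml(html);

        // Option 2: parse HTML from a TextReader (a file or network stream works similarly).
        var fromReader = new HtmlAgilityPack.HtmlDocument();
        using (var reader = new StringReader(html))
        {
            fromReader.Load(reader);
        }

        Console.WriteLine(ReadTitle(fromString));
        Console.WriteLine(ReadTitle(fromReader));
    }
}
```

Both documents are built from the same markup, so the two printed titles should match.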
Here is my first experiment with HtmlAgilityPack.
The goal: when the button is clicked, print all the links found on the page at a given URL. The simple code is as follows:
 1 using System;
 2 using System.Collections.Generic;
 3 using System.ComponentModel;
 4 using System.Data;
 5 using System.Drawing;
 6 using System.Linq;
 7 using System.Text;
 8 using System.Windows.Forms;
 9 using HtmlAgilityPack;
10
11 namespace HtmlAgilityPackDemo1
12 {
13     public partial class Form1 : Form
14     {
15         public Form1()
16         {
17             InitializeComponent();
18         }
19
20         private void Form1_Load(object sender, EventArgs e)
21         {
22
23         }
24
25         private void button1_Click(object sender, EventArgs e)
26         {
27             HtmlWeb webClient = new HtmlWeb();
28             HtmlAgilityPack.HtmlDocument doc = webClient.Load("http://www.cnblogs.com/lmei");
29
30             HtmlNodeCollection hrefList = doc.DocumentNode.SelectNodes(".//a[@href]");
31
32             if (hrefList != null)
33             {
34                 foreach (HtmlNode href in hrefList)
35                 {
36                     HtmlAttribute att = href.Attributes["href"];
37                     Console.WriteLine(att.Value);
38
39                 }
40
41             }
42
43         }
44     }
45 }
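If line 28 of the code above (the declaration of doc) is written like this instead:
HtmlDocument doc = webClient.Load("http://www.cnblogs.com/lmei");
the compiler reports an error. The form imports both System.Windows.Forms and HtmlAgilityPack, and each namespace defines an HtmlDocument class, so the bare name is ambiguous.
Changing the line to use the fully qualified name fixes it:
HtmlAgilityPack.HtmlDocument doc = webClient.Load("http://www.cnblogs.com/lmei");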
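If you would rather keep the short name, one alternative (not what the original code does) is a using alias directive, which takes precedence over the namespace imports for that name. The sketch below is a small console program with a made-up class name, AliasDemo, and made-up sample HTML; it assumes both System.Windows.Forms and HtmlAgilityPack.dll are referenced.

```csharp
using System;
using System.Windows.Forms;   // also defines a class named HtmlDocument
using HtmlAgilityPack;
// Alias: in this file the bare name HtmlDocument means HtmlAgilityPack's class.
using HtmlDocument = HtmlAgilityPack.HtmlDocument;

static class AliasDemo
{
    // Hypothetical helper: parses the markup and returns its text content.
    public static string TextOf(string html)
    {
        HtmlDocument doc = new HtmlDocument();   // no ambiguity error thanks to the alias
        doc.LoadHtml(html);
        return doc.DocumentNode.InnerText;
    }

    static void Main()
    {
        Console.WriteLine(TextOf("<p>Hello from HtmlAgilityPack</p>"));
    }
}
```

The alias only applies to the file it is declared in, so other files in the project that import both namespaces still need the fully qualified name or their own alias.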
Next, run the program, click the button, and look at the console output.
As you can see, the hyperlinks on the page are printed ...
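The same link extraction can be tried without a form or a network connection. The following is a minimal console sketch under stated assumptions: the class LinkExtractor, the method ExtractLinks, and the sample HTML are invented for illustration, and only the XPath expression and the null check come from the code above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkExtractor
{
    // Hypothetical helper: returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes(".//a[@href]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the fallback used if the attribute is missing.
            links.Add(anchor.GetAttributeValue("href", ""));
        }
        return links;
    }

    static void Main()
    {
        // Sample markup invented for this sketch; the second anchor has no href.
        string html = "<ul>"
                    + "<li><a href=\"http://www.cnblogs.com/lmei\">Blog</a></li>"
                    + "<li><a name=\"top\">No link here</a></li>"
                    + "<li><a href=\"/lmei/p/3485649.html\">Post</a></li>"
                    + "</ul>";

        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

Returning a list instead of printing inside the loop makes the helper easy to reuse from the button handler or from a test.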
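Of course, if you want to crawl the body text of a page, the text may come out garbled after loading. In that case you can specify the encoding explicitly. Note that HtmlDocument.Load expects a file path or a stream rather than a URL, so the page is opened as a stream first:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
Encoding encoder = Encoding.GetEncoding("utf-8");
// Open the page as a stream and decode it with the chosen encoding.
using (System.IO.Stream stream = new System.Net.WebClient().OpenRead("http://www.cnblogs.com/lmei/p/3485649.html"))
    htmlDoc.Load(stream, encoder);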