A Powerful C# Tool for Parsing HTML: HTML Agility Pack

Last Update: 2014-08-23 Source: Internet

Author: User

I once foolishly used regular expressions to scrape everything I wanted from my school's news site, academic affairs system, and library management system. Writing those regexes took a great deal of effort, and they were never quite reliable: unexpected bugs kept appearing, and only after countless fixes did everything work properly. Even then it felt clumsy. Later I saw other people parsing HTML with this library and it looked very powerful, so today I gave it a try. Code that had taken me several days back then took a few minutes with this library. Without further ado, let's get to the point.

HTML Agility Pack home page: http://htmlagilitypack.codeplex.com/

Author's homepage: http://zhoufoxcn.blog.51cto.com/792419/595344/

Step one: add a reference to the library (HtmlAgilityPack.dll) in your project.

Step two: load the HTML. The document's Load() method reads a local file, while LoadHtml() parses HTML that you already have as a string, for example the markup of a remote page you have downloaded.

Step three: Get the root node:

    HtmlNode rootNode = document.DocumentNode;

Step four: search beneath the root node for the content you want. I have not tried every feature; below is some of my test code, which parses a NetEase news page:

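    using System;
    using System.Text;
    using HtmlAgilityPack;

    HtmlDocument document = new HtmlDocument();
    // Load a page saved to disk; pass the encoding the file was saved in
    // (Encoding.Default is the system ANSI code page on .NET Framework).
    document.Load(@"E:\c.htm", Encoding.Default);
    HtmlNode rootNode = document.DocumentNode;

    // SelectSingleNode returns null when nothing matches, so check before use.
    // Headline: <h1 id="h1title">
    HtmlNode titleNode = rootNode.SelectSingleNode("//h1[@id='h1title']");
    Console.WriteLine("------------------------- Title -------------------------");
    Console.WriteLine(titleNode != null ? titleNode.InnerHtml : "(not found)");

    // Publication time: <div class="left"> inside <div class="ep-info cdgray">
    HtmlNode timeNode = rootNode.SelectSingleNode("//div[@class='ep-info cdgray']/div[@class='left']");
    Console.WriteLine("------------------------- Time -------------------------");
    Console.WriteLine(timeNode != null ? timeNode.InnerHtml : "(not found)");

    // Article body: <div class="end-text">
    HtmlNode newsNode = rootNode.SelectSingleNode("//div[@class='end-text']");
    Console.WriteLine("------------------------- Body -------------------------");
    Console.WriteLine(newsNode != null ? newsNode.InnerHtml : "(not found)");

    Console.ReadKey();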

According to the official documentation, the following XPath expressions select one or more nodes beneath the root node:

/articles/article[1]: Selects the first article element that is a child of the articles element.
/articles/article[last()]: Selects the last article element that is a child of the articles element.
/articles/article[last()-1]: Selects the second-to-last article element that is a child of the articles element.
/articles/article[position()<3]: Selects the first two article elements that are children of the articles element.
//title[@lang]: Selects all title elements that have an attribute named lang.
//createat[@type='zh-cn']: Selects all createat elements whose type attribute has the value zh-cn.
/articles/article[order>2]: Selects all article elements of the articles element whose order child element has a value greater than 2.
/articles/article[order<3]/title: Selects the title elements of all article elements in the articles element whose order child element has a value less than 3.
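HAP hands its XPath queries to .NET's XPath 1.0 engine (its nodes are navigated through an XPathNavigator), so the expressions above can be tried on any XML document. The following is a minimal sketch, assuming a hypothetical articles file whose element names and values are made up to match the list, and using the built-in System.Xml.XmlDocument rather than HAP itself:

```csharp
using System;
using System.Linq;
using System.Xml;

public static class XPathDemo
{
    // Hypothetical sample data shaped like the expressions above.
    const string SampleXml = @"<articles>
  <article><title lang='en'>First</title><order>1</order><createat type='zh-cn'>2014-08-01</createat></article>
  <article><title>Second</title><order>2</order><createat type='en-us'>2014-08-02</createat></article>
  <article><title lang='zh'>Third</title><order>3</order><createat type='zh-cn'>2014-08-03</createat></article>
  <article><title>Fourth</title><order>4</order><createat type='en-us'>2014-08-04</createat></article>
</articles>";

    // Labels an article element by its title; any other node by its own text.
    static string Label(XmlNode node) =>
        node.Name == "article" ? node.SelectSingleNode("title").InnerText : node.InnerText;

    // Evaluates an XPath 1.0 expression against the sample and labels every match.
    public static string[] Query(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc.SelectNodes(xpath).Cast<XmlNode>().Select(Label).ToArray();
    }

    public static void Main()
    {
        string[] expressions =
        {
            "/articles/article[1]",
            "/articles/article[last()]",
            "/articles/article[last()-1]",
            "/articles/article[position()<3]",
            "//title[@lang]",
            "//createat[@type='zh-cn']",
            "/articles/article[order>2]",
            "/articles/article[order<3]/title",
        };
        foreach (string xpath in expressions)
            Console.WriteLine($"{xpath} -> {string.Join(", ", Query(xpath))}");
    }
}
```

With this sample, [order>2] matches Third and Fourth, and //title[@lang] matches First and Third. In [order>2] the order element's text is converted to a number before the comparison, which is standard XPath 1.0 behaviour.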

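The most useful path expressions are listed below:
nodename: Selects all child elements of the current node that have this name.
/: Selects from the root node.
//: Selects matching nodes anywhere in the document, regardless of their location. A leading // always starts from the document root, even when SelectNodes or SelectSingleNode is called on a child node; use .// to search only beneath the current node.
.: Selects the current node.
..: Selects the parent of the current node.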
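These expressions are evaluated relative to a current node, which matters once you call SelectNodes on something other than the document root. The sketch below again uses .NET's built-in XmlDocument instead of HAP, on a hypothetical well-formed page fragment, and evaluates each expression with the div whose id is main as the current node:

```csharp
using System;
using System.Linq;
using System.Xml;

public static class PathDemo
{
    // Hypothetical, well-formed page fragment used only to illustrate the axes.
    const string Page = @"<html><body>
  <div id='main'><p>a</p><p>b</p></div>
  <div id='side'><p>c</p></div>
</body></html>";

    // Evaluates xpath with <div id='main'> as the current node.
    // Paragraphs are reported by their text, every other node by its element name.
    public static string[] FromMain(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Page);
        XmlNode main = doc.SelectSingleNode("//div[@id='main']");
        return main.SelectNodes(xpath)
                   .Cast<XmlNode>()
                   .Select(n => n.Name == "p" ? n.InnerText : n.Name)
                   .ToArray();
    }

    public static void Main()
    {
        // nodename, /, //, .//, . and ..
        foreach (string xpath in new[] { "p", "/html", "//p", ".//p", ".", ".." })
            Console.WriteLine($"{xpath,-5} -> {string.Join(", ", FromMain(xpath))}");
    }
}
```

Note that "//p" also returns c from the sibling div, while ".//p" stays inside the main div; HAP's HtmlNode.SelectNodes behaves the same way.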

This article is an English version of an article originally published in Chinese on aliyun.com.
