C# HTML parsing tool: Html Agility Pack

Source: Internet
Author: User

It's a bit late today. My project needs to crawl movie recommendations from Douban, so I need to parse the crawled HTML. I used to do this kind of parsing in Python, but I am currently working in C#. I think C# is no worse than Python as a language, and with Microsoft backing it there is little to worry about on that front; the gap is mainly in the ecosystem. I looked around and found that Html Agility Pack is one of the better options. There are other libraries too, but I will not cover them here; this post focuses on Html Agility Pack.

Official website (you can download the DLL yourself):

http://html-agility-pack.net/select-nodes

Reference: introduction and application of Html Agility Pack basic classes

Code Design:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program
{
    // Local file that the downloaded page is saved to.
    const string MoviePath = @"E:\Program file\C# program code\Validate\ConsoleApplication1\movie.txt";

    static void complete(object o, AsyncCompletedEventArgs e)
    {
        // Start parsing the HTML.
        var doc = new HtmlDocument();
        doc.Load(MoviePath, Encoding.UTF8);
        List<string> movie = new List<string>();

        // Attribute tests in XPath need the @ prefix: [@class="title"].
        HtmlNodeCollection nodeCollection = doc.DocumentNode.SelectNodes("//ul/li[@class=\"title\"]");
        if (nodeCollection != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode n in nodeCollection)
            {
                Console.WriteLine(n.InnerHtml.Trim());
                movie.Add(n.InnerText.Trim());
            }
        }

        // Get the h3 headings inside the "review-bd" blocks.
        HtmlNodeCollection nodeCollection1 = doc.DocumentNode.SelectNodes("//div[@class=\"review-bd\"]/h3");
        if (nodeCollection1 != null)
        {
            foreach (HtmlNode n in nodeCollection1)
            {
                Console.WriteLine(n.InnerHtml.Trim());
                movie.Add(n.InnerText.Trim());
            }
        }

        foreach (var m in movie)
        {
            Console.WriteLine(m);
        }
        File.Delete(MoviePath);
    }

    static void Main(string[] args)
    {
        Console.BufferHeight = 10000;
        Console.BufferWidth = 10000;
        WebClient wc = new WebClient();
        wc.UseDefaultCredentials = true;
        // Subscribe before starting the download so the completion event is not missed.
        wc.DownloadFileCompleted += new AsyncCompletedEventHandler(complete);
        wc.DownloadFileAsync(new Uri("https://movie.douban.com/"), MoviePath);
        Console.Read();
    }
}
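To try the XPath queries without downloading anything, the same selection can be run against an HTML string held in memory. The following is only a minimal sketch: the `MovieTitleParser` class, its `ExtractText` helper, and the sample markup are made up for illustration and are not Douban's real page structure. It relies on two Html Agility Pack behaviours: `LoadHtml` parses a string, and `SelectNodes` returns null (not an empty collection) when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack itself.
public static class MovieTitleParser
{
    // Returns the trimmed inner text of every node matched by the XPath query.
    public static List<string> ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file on disk

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return results; // no match: SelectNodes gives null
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.InnerText.Trim());
        }
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample markup (an assumption for this sketch).
        const string sampleHtml =
            "<ul>" +
            "<li class=\"title\"> Movie A </li>" +
            "<li class=\"rating\">8.1</li>" +
            "<li class=\"title\">Movie B</li>" +
            "</ul>";

        // The @ prefix makes "class" an attribute test.
        foreach (string title in MovieTitleParser.ExtractText(sampleHtml, "//ul/li[@class='title']"))
        {
            Console.WriteLine(title);
        }
    }
}
```

Note that `[@class='title']` only matches when the class attribute is exactly `title`; an element with several classes would need something like `contains(@class, 'title')` instead.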

For the WebClient documentation, see https://msdn.microsoft.com/zh-cn/library/system.net.webclient(v=vs.110).aspx

I have to say that the documentation on Microsoft's official website is really conscientious! I had heard people say that Microsoft's solutions and documentation are very comprehensive, but until now I searched for information directly on Baidu. Now I have changed my approach and check the official Microsoft website first. The examples there are classic!
