C# HTML parsing tool: Html Agility Pack

Source: Internet
Author: User

I only started on this today, which is a little late. My graduation project needs to crawl Douban's movie recommendations, so I have to parse the HTML I crawl. I used to do this kind of parsing in Python, but right now I'm working in C#. I don't think C# is any worse than Python: it has Microsoft behind it, so the language itself is nothing to worry about, and the real difference is the ecosystem. After looking around, I found that Html Agility Pack is a good fit. There are other libraries too, but I won't go into them here; this is the one I'll be using.

Official address (you can download the DLL yourself):

http://html-agility-pack.net/select-nodes

Reference: Html Agility Pack: introduction to the basic classes and their use

Code Design:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program
{
    // Local file the downloaded page is saved to
    const string MoviePath = @"E:\Program Files\C# program code\validate\ConsoleApplication1\movie.txt";

    static void Complete(object o, AsyncCompletedEventArgs e)
    {
        // Start parsing the HTML
        var doc = new HtmlDocument();
        doc.Load(MoviePath, Encoding.UTF8);
        List<string> movie = new List<string>();

        // List items with class "title" (attributes need a leading @ in XPath)
        HtmlNodeCollection nodeCollection = doc.DocumentNode.SelectNodes("//ul/li[@class='title']");
        if (nodeCollection != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode n in nodeCollection)
            {
                Console.WriteLine(n.InnerHtml.Trim());
                movie.Add(n.InnerText.Trim());
            }
        }

        // Get Douban's most popular film reviews
        HtmlNodeCollection nodeCollection1 = doc.DocumentNode.SelectNodes("//div[@class='review-bd']/h3");
        if (nodeCollection1 != null)
        {
            foreach (HtmlNode n in nodeCollection1)
            {
                Console.WriteLine(n.InnerHtml.Trim());
                movie.Add(n.InnerText.Trim());
            }
        }

        foreach (var m in movie)
        {
            Console.WriteLine(m);
        }
        File.Delete(MoviePath);
    }

    static void Main(string[] args)
    {
        Console.BufferHeight = 10000;
        Console.BufferWidth = 10000;
        WebClient wc = new WebClient();
        wc.UseDefaultCredentials = true;
        // Subscribe before starting the download so the completion event cannot be missed
        wc.DownloadFileCompleted += new AsyncCompletedEventHandler(Complete);
        wc.DownloadFileAsync(new Uri("https://movie.douban.com/"), MoviePath);
        Console.Read();
    }
}
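
One detail in the selectors is worth checking on its own: XPath addresses an attribute with a leading @, so a class filter is written [@class='title']. Without the @, class is read as the name of a child element and nothing matches. Below is a minimal sketch that runs the same kind of query against an inline string instead of a downloaded file. It assumes the HtmlAgilityPack package is referenced; the TitleExtractor helper and the sample markup are made up for illustration and are not taken from Douban's page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
static class TitleExtractor
{
    public static List<string> ExtractTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        var titles = new List<string>();

        // @class selects the attribute; a bare "class" would look for a <class> child element.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//ul/li[@class='title']");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
            return titles;

        foreach (HtmlNode n in nodes)
            titles.Add(n.InnerText.Trim());

        return titles;
    }
}

static class Demo
{
    static void Main()
    {
        // Made-up sample markup
        string html = "<ul>" +
                      "<li class=\"title\"> Movie A </li>" +
                      "<li class=\"rating\">9.0</li>" +
                      "<li class=\"title\">Movie B</li>" +
                      "</ul>";

        foreach (string t in TitleExtractor.ExtractTitles(html))
            Console.WriteLine(t);
    }
}
```

Returning an empty list when the query matches nothing keeps the calling foreach loop safe, which matters here because a page layout change would otherwise surface as a NullReferenceException.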

For the WebClient documentation, see https://msdn.microsoft.com/zh-cn/library/system.net.webclient(v=vs.110).aspx

I have to say, the documentation on Microsoft's official site is really well done. I had heard people say that Microsoft's solutions and documentation are comprehensive, but until now I always looked things up on Baidu. This time I changed my habit and went straight to the Microsoft site, and it was worth it: the examples there are classic, too.
