Scraping web pages with HtmlAgilityPack

Source: Internet
Author: User

Previously, I used regular expressions to scrape pages, and some content was hard to extract that way.

I saw HtmlAgilityPack recommended online, read up on it, and wrote an example that scrapes a web page. The example uses ASP.NET MVC 4. Let's look at the result first.

Demo address: http://www.5imvc.com/Html
Alternate address: http://linfei721.s5.jutuo.net/html

First, install the library, which is available on NuGet (package ID HtmlAgilityPack).

Create the model:

    /// <summary>
    /// Page capture result
    /// </summary>
    public class Result
    {
        /// <summary>
        /// Link
        /// </summary>
        public string Url { get; set; }

        /// <summary>
        /// Title
        /// </summary>
        public string Title { get; set; }

        /// <summary>
        /// Avatar address
        /// </summary>
        public string Img { get; set; }

        /// <summary>
        /// Body content
        /// </summary>
        public string Content { get; set; }
    }

Controller:

Import the namespaces:

    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.Mvc;
    using HtmlAgilityPack;

    public ActionResult Index()
    {
        return View(GetList());
    }

    /// <summary>
    /// Capture method
    /// </summary>
    /// <returns></returns>
    public List<Result> GetList()
    {
        List<Result> list = new List<Result>();

        #region Old-fashioned regex crawling
        // System.Net.WebRequest req = System.Net.WebRequest.Create("http://www.cnblogs.com/");
        // System.Net.WebResponse res = req.GetResponse();
        // GetResponse blocks until the response arrives
        // System.IO.Stream receiveStream = res.GetResponseStream();
        // Read the stream into a string
        // System.IO.StreamReader sr = new System.IO.StreamReader(receiveStream);
        // string resultString = sr.ReadToEnd();
        // string regStr = "...";
        // MatchCollection matches = Regex.Matches(resultString, regStr, RegexOptions.Multiline);
        // foreach (Match item in matches)
        // {
        //     list.Add(new Result { Url = item.Groups[1].Value, Title = item.Groups[2].Value });
        // }
        #endregion

        HtmlWeb htmlWeb = new HtmlWeb();
        HtmlDocument htmlDoc = htmlWeb.Load(@"http://www.cnblogs.com/");

        // Select the article list on the blog home page
        HtmlNodeCollection posts = htmlDoc.DocumentNode.SelectNodes("//div[@id='post_list']/div[@class='post_item']");

        // SelectNodes returns null when nothing matches, so guard before iterating
        if (posts == null)
        {
            return list;
        }

        posts.AsParallel().ToList().ForEach(ac =>
        {
            // Look up the avatar image; keep the node in a variable because it may be missing
            HtmlNode node = ac.SelectSingleNode(".//p[@class='post_item_summary']/a/img");
            list.Add(new Result
            {
                Url = ac.SelectSingleNode(".//a[@class='titlelnk']").Attributes["href"].Value,
                Title = ac.SelectSingleNode(".//a[@class='titlelnk']").InnerText,
                // If there is no image, fall back to the default avatar
                Img = node == null ? VirtualPathUtility.ToAbsolute("~/Content/img/avatar.png") : node.Attributes["src"].Value,
                Content = ac.SelectSingleNode(".//p[@class='post_item_summary']").InnerText
            });
        });

        return list;
    }
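To try the XPath selection without hitting the network, the same queries can be run against an HTML string. The following is only a minimal sketch, assuming HtmlAgilityPack is installed from NuGet; the PostListParser and Demo classes, the sample markup, and the "/default.png" path are hypothetical and not part of the original example. It relies on HtmlDocument.LoadHtml, SelectNodes (which returns null when nothing matches), and GetAttributeValue with a fallback value.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original article.
public static class PostListParser
{
    // Returns (url, title, img) tuples for markup shaped like the post list above.
    public static List<Tuple<string, string, string>> Parse(string html, string defaultImg)
    {
        var results = new List<Tuple<string, string, string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string; no network access needed

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection posts =
            doc.DocumentNode.SelectNodes("//div[@id='post_list']/div[@class='post_item']");
        if (posts == null)
        {
            return results;
        }

        foreach (HtmlNode post in posts)
        {
            HtmlNode link = post.SelectSingleNode(".//a[@class='titlelnk']");
            if (link == null)
            {
                continue; // skip entries without a title link
            }

            // The avatar is optional, so fall back to the default image.
            HtmlNode img = post.SelectSingleNode(".//p[@class='post_item_summary']/a/img");
            string src = img == null ? defaultImg : img.GetAttributeValue("src", defaultImg);

            results.Add(Tuple.Create(
                link.GetAttributeValue("href", string.Empty),
                HtmlEntity.DeEntitize(link.InnerText).Trim(), // decode entities such as &amp;
                src));
        }

        return results;
    }
}

public static class Demo
{
    // Made-up sample markup that mimics the structure queried in the article.
    public const string SampleHtml =
        "<div id='post_list'>" +
        "<div class='post_item'><a class='titlelnk' href='/a'>First &amp; post</a>" +
        "<p class='post_item_summary'><a href='/u'><img src='/u.png'/></a>Summary</p></div>" +
        "<div class='post_item'><a class='titlelnk' href='/b'>Second</a>" +
        "<p class='post_item_summary'>No avatar here</p></div>" +
        "</div>";

    public static void Main()
    {
        foreach (var row in PostListParser.Parse(SampleHtml, "/default.png"))
        {
            Console.WriteLine("{0} | {1} | {2}", row.Item1, row.Item2, row.Item3);
        }
    }
}
```

Parsing from a string like this makes it easier to check the XPath expressions against a saved copy of the page before pointing HtmlWeb.Load at the live site.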

View:

    @model IEnumerable<Result>
    @{
        foreach (var item in Model)
        {
            <div class="newsitem">
                <div>
                    <img src="@item.Img" class="How.mg" alt="News" />
                    <h3><a href="@item.Url" target="_blank">@item.Title</a></h3>
                    <p>@item.Content</p>
                </div>
                <p><a href="@item.Url" title="">View the full text</a></p>
            </div>
        }
    }

 
