In this example, the data is obtained mainly by parsing HTML source with HtmlAgilityPack.
using HtmlAgilityPack;
1. Get the web page source code with the WebRequest, WebResponse, and StreamReader classes in C#
WebRequest request = WebRequest.Create(url);
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream(), encoding))
    result = reader.ReadToEnd();
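The fragment above can be wrapped into a small helper so that it compiles on its own. This is only a sketch: the class name PageFetcher and the method name GetHtmlSource are hypothetical and do not come from the original article, and the caller is assumed to know the page's encoding. Note that WebRequest is marked obsolete in .NET 6 and later, where HttpClient is the recommended replacement; the sketch keeps WebRequest to match the text.

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: downloads the resource at `url` and decodes it
    // with the encoding supplied by the caller (for example Encoding.UTF8).
    public static string GetHtmlSource(string url, Encoding encoding)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            // Read the whole response body into a single string.
            return reader.ReadToEnd();
        }
    }
}
```

A call might look like string htmlSource = PageFetcher.GetHtmlSource(url, Encoding.UTF8); network errors surface as exceptions, which the sketch leaves to the caller to handle.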
2. Get the root HtmlNode from the page source by loading it into the HtmlDocument class in HtmlAgilityPack
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(htmlSource);
HtmlNode rootNode = document.DocumentNode;
return rootNode;
3. Get the content you need with HtmlNode's SelectSingleNode method. Note that path is the XPath to the HTML tag. In the following code:
path = "//div[@class='article_title']/h1/span/a"; // Article title path
corresponds to
<div class='article_title'>
<h1>
<span>
<a>Get the content here</a>
</span>
</h1>
</div>
The reference code is as follows:
HtmlNode temp = srcNode.SelectSingleNode(path);
if (temp == null)
    return null;
return temp.InnerText;
The return value is: Get the content here
Here temp.InnerText returns only the text. To get the markup including the tag itself, use temp.OuterHtml, which returns: <a>Get the content here</a>. temp.InnerHtml returns the markup inside the node, excluding the node's own tag.
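Steps 2 and 3 can be tried without any network access by parsing an inline HTML string. The following is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced; the class name TitleExtractor, the method name GetNodeText, and the sample markup are hypothetical and chosen only for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class TitleExtractor
{
    // Hypothetical helper: returns the text of the first node matching
    // the XPath expression, or null when nothing matches.
    public static string GetNodeText(HtmlNode srcNode, string path)
    {
        HtmlNode temp = srcNode.SelectSingleNode(path);
        if (temp == null)
            return null;
        return temp.InnerText;
    }

    public static void Main()
    {
        // Sample markup assumed for this sketch.
        string htmlSource =
            "<div class='article_title'><h1><span><a>Get the content here</a></span></h1></div>";

        var document = new HtmlDocument();
        document.LoadHtml(htmlSource);
        HtmlNode rootNode = document.DocumentNode;

        string path = "//div[@class='article_title']/h1/span/a";

        // Text of the <a> element.
        Console.WriteLine(GetNodeText(rootNode, path));

        // Markup of the same element, including its own tag.
        HtmlNode link = rootNode.SelectSingleNode(path);
        Console.WriteLine(link.OuterHtml);
    }
}
```

Returning null for a missing node mirrors the check in the code above; callers should test for null before using the result, since a page layout change will silently stop the XPath from matching.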
With the steps above you can extract the content you need from a website. I hope this is helpful to everyone. Source article: http://blog.csdn.net/gdjlc/article/details/11620915