Using the C# WebBrowser control and Application.DoEvents() to collect dynamic web pages

Source: Internet
Author: User

I have been scraping and collecting data from the web for more than a year, starting when I was an undergraduate. At first I used regular expressions on static web pages to capture information. As the work went deeper, however, I found that many pages cannot be scraped with regular expressions alone. For example, the next-page links on many pages are generated by JavaScript functions, such as

<li><a href="#" onclick="javascript: gotoPage(2)">2</a></li>

Even if a regular expression extracts the href, it yields only "#", so the next page cannot be obtained.
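To make the problem concrete, here is a minimal sketch that pulls the href out of the sample link with a regular expression. HrefDemo and ExtractHref are hypothetical names used only for illustration; the pattern allows spaces around "=" because the sample markup may contain them.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the value of the first href="..." attribute.
public static class HrefDemo
{
    // Optional whitespace around '=' so both href="#" and href = "#" match
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns the href value, or null if the markup contains no href attribute
    public static string ExtractHref(string html)
    {
        Match m = HrefPattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Called on the sample link, ExtractHref returns just "#": the real target (page 2) lives only inside the onclick script, which a static regex over href never sees.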
In addition, if the URL contains a "#" fragment, the page source obtained with HttpWebRequest/HttpWebResponse differs from what you see in the browser. A regular expression on its own is therefore powerless against dynamic pages driven by JavaScript.
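One likely reason for the difference: the part of a URL after "#" (the fragment) is never sent to the server. An HTTP request carries only the path and query; the fragment stays in the browser, where page scripts can read it and change what is displayed. A small sketch using System.Uri shows the split (FragmentDemo and the example URL are made up for illustration):

```csharp
using System;

// Hypothetical helper showing which parts of a URL reach the server.
public static class FragmentDemo
{
    // The path and query: what goes into the HTTP request line.
    public static string RequestTarget(string url)
    {
        return new Uri(url).PathAndQuery;
    }

    // The fragment (including the leading '#'): kept by the browser, never sent.
    public static string Fragment(string url)
    {
        return new Uri(url).Fragment;
    }
}
```

For "http://example.com/list.aspx?cat=5#page=2", RequestTarget gives "/list.aspx?cat=5" and Fragment gives "#page=2". The server only ever sees the first part, so any content chosen by a script reading the fragment is invisible to a plain HTTP download.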
What should I do?

These problems can be solved by combining the DOM, regular expressions, and a browser component.

The DOM (Document Object Model) is a standard interface that represents an HTML page as a tree. For a DOM tutorial, see http://www.w3.org/DOM/. Although that material describes the DOM interface as used from JavaScript, the DOM is a language-neutral standard, so its implementations in other languages look much the same.
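The mshtml/WebBrowser DOM used in this article only runs on Windows, so as a portable illustration of the same kind of DOM calls, here is a sketch using System.Xml.XmlDocument, which implements the W3C DOM Core interfaces (GetElementsByTagName, GetAttribute). It assumes well-formed, XHTML-style markup; most real pages are not well-formed, which is one reason to let a browser component build the DOM instead. DomDemo and OnclickValues are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: walks a DOM tree and collects the onclick of every link.
public static class DomDemo
{
    public static List<string> OnclickValues(string xhtml)
    {
        // Parse the markup into an in-memory DOM tree (throws if not well-formed)
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        var result = new List<string>();
        // Every <a> element in document order, wherever it sits in the tree
        foreach (XmlElement a in doc.GetElementsByTagName("a"))
        {
            // GetAttribute returns an empty string when the attribute is absent
            result.Add(a.GetAttribute("onclick"));
        }
        return result;
    }
}
```

The point is that the DOM hands us the onclick script of each link as a separate string, without any regex having to find the tag boundaries first.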

Regular expressions: they remain indispensable for text matching, and the DOM cannot replace this powerful tool.
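Once the DOM has isolated the onclick strings, a regular expression does the text matching. A minimal sketch, assuming the site uses a gotoPage(n) function as in the sample link (PageNumberDemo is a hypothetical name, and real sites will use their own function names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: turns onclick scripts into page numbers.
public static class PageNumberDemo
{
    // Matches "gotoPage(2)" or "gotoPage (2)" and captures the number
    private static readonly Regex GotoPage = new Regex(@"gotoPage\s*\(\s*(\d+)\s*\)");

    public static List<int> PageNumbers(IEnumerable<string> onclickValues)
    {
        var pages = new List<int>();
        foreach (string value in onclickValues)
        {
            // Treat a missing attribute (null) the same as an empty one
            Match m = GotoPage.Match(value ?? string.Empty);
            if (m.Success)
            {
                pages.Add(int.Parse(m.Groups[1].Value));
            }
        }
        return pages;
    }
}
```

Splitting the work this way keeps each tool on the job it does well: the DOM finds the elements, the regex reads the script text inside them.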

Browser component: it can interpret and execute JavaScript, which saves us a great deal of effort. (Some readers in the blog community have also suggested XPath and WebRequest. I have not used them, so if you are familiar with them, feel free to share.)

The code in this article was written in C# WinForms with Visual Studio 2008.

To use regular expressions, add the following directive at the top of the source file:

using System.Text.RegularExpressions;

To use the DOM component, add a reference to Microsoft.mshtml to the project.

For the browser component, use the WebBrowser control.
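The Application.DoEvents() technique in the title is, in essence, a wait loop: after calling Navigate, the UI thread keeps pumping window messages until the WebBrowser reports that the page has loaded. Because the control itself only runs on Windows, the sketch below keeps the loop generic and takes the readiness check and the message pump as delegates. PumpWait, WaitUntil and the timeout parameter are my own assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: a polling wait that pumps messages between checks.
public static class PumpWait
{
    // Calls pump() repeatedly until isDone() returns true or the timeout expires.
    // In a WinForms form, pump would be Application.DoEvents, which processes
    // pending window messages so the WebBrowser can keep loading meanwhile.
    public static bool WaitUntil(Func<bool> isDone, Action pump, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed > timeout)
            {
                return false; // gave up: the condition never became true
            }
            pump();
        }
        return true;
    }
}
```

In the form, a call might look like `PumpWait.WaitUntil(() => webBrowser1.ReadyState == WebBrowserReadyState.Complete, Application.DoEvents, TimeSpan.FromSeconds(30))` right after `webBrowser1.Navigate(url)`. The timeout guards against pages that never finish loading. Keep in mind that DoEvents also lets other UI events (such as button clicks) run inside the loop, so re-entrancy needs care.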

 

First, build a simple browser into the program: a ComboBox that shows the URL of the current page, plus navigation buttons (Go and Back) that make the browser refresh its view. The implementation is as follows:

 

Simple browser navigation in the program:

private void btnGo_Click(object sender, EventArgs e)
{
    // Navigate to the address typed into the combo box
    string url = comboBox1.Text.Trim();
    webBrowser1.Navigate(url);
}

private void btnBack_Click(object sender, EventArgs e)
{
    // Return to the previous page; GoBack returns false if there is none
    webBrowser1.GoBack();
}

 

Navigation alone is not enough: after the browser view refreshes, the URL in the ComboBox should refresh too. So we handle the browser's Navigated event and update the ComboBox text there:

 

private void webBrowser1_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
    // Keep the address box in sync with the page actually displayed
    comboBox1.Text = webBrowser1.Url.ToString();
}

 

This is still not enough. With only the code above, clicking a link in the WebBrowser may open the new page in a separate Internet Explorer window. To keep it inside our own browser, we handle the NewWindow event:

 

WebBrowser NewWindow handler:

private void webBrowser1_NewWindow(object sender, CancelEventArgs e)
{
    // Suppress the external IE window
    e.Cancel = true;
    // Open the focused (clicked) element's link in this control instead
    if (webBrowser1.Document.ActiveElement != null)
    {
        webBrowser1.Navigate(webBrowser1.Document.ActiveElement.GetAttribute("