[Reprinted] C#: Using the WebBrowser Control with Multiple Threads

Last Update: 2018-12-05 Source: Internet

Author: User

Tags: blank page


I had planned to analyze the page in the DocumentCompleted event handler, but after half a day of trying, the handler was never called. Searching online turned up a solution, and I would like to thank its original author.

With the WebBrowser control, you call Navigate to load a page, then obtain wb.Document (an HtmlDocument) and analyze its elements and tags. In practice, though, a scraper rarely collects just a single page. Doing the collection on the main form works, but if you collect several list pages, each spanning N pages, and loop through them, the form stops responding while the WebBrowser waits for each response. At that point the natural idea is to use multiple threads. In C# multithreading, as most of us know, a thread runs in one of two apartment modes: STA (single-threaded apartment) or MTA (multithreaded apartment). The WebBrowser control, however, has an awkward characteristic: it can only be used on an STA thread, so the worker thread must be switched to STA before it starts, as in the following code:

Thread thread = new Thread(new ParameterizedThreadStart(BeginCatch));
thread.SetApartmentState(ApartmentState.STA); // WebBrowser only works on an STA thread
thread.Start(url);

private void BeginCatch(object obj)
{
    string url = obj.ToString();
    WebBrowser wb = new WebBrowser();
    wb.ScriptErrorsSuppressed = true;
    wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
    wb.Navigate(url);
    // BeginCatch returns here and the thread ends without running a message loop,
    // so wb_DocumentCompleted is never called
}

When you need to analyze the HtmlDocument generated by the WebBrowser, you must do so in the DocumentCompleted event, because only then has the WebBrowser finished loading the page. This is exactly the trap! When the WebBrowser runs on an STA worker thread like this, DocumentCompleted is never raised, so none of the follow-up processing runs. (The underlying cause is that this thread has no message loop: the WebBrowser needs one on its thread to load pages and raise its events, and BeginCatch simply returns after calling Navigate.) So what can we do? One might think of the wb.Document.Write(string) method, as follows:
private void BeginCatch(object obj)
{
    string url = obj.ToString();
    WebBrowser wb = new WebBrowser();
    wb.ScriptErrorsSuppressed = true;
    string htmlCode = GetHtmlSource(url);
    // Does not work: nothing has been navigated yet, so there is no document to write into
    wb.Document.Write(htmlCode);
    // Perform the analysis operation
}

// Obtain the web page source code with WebClient
private string GetHtmlSource(string url)
{
    string text = "";
    try
    {
        using (WebClient wc = new WebClient())
        {
            text = wc.DownloadString(url);
        }
    }
    catch (Exception)
    {
        // Ignore download errors and return an empty string
    }
    return text;
}

But then we find that there is never a document: wb.Document stays empty, so there is nothing to write into. I was quite frustrated at the time. The articles I found on cnblogs and MSDN all said the content could be assigned through DocumentText, yet many people online reported that this did not work for them either. Searching cnblogs again, I finally found a useful article with an example. Testing it showed that the WebBrowser must navigate somewhere before a document is created. With that, the multithreaded version finally works. The final code is as follows:
// Requires: using System; using System.Net; using System.Threading; using System.Windows.Forms;
private void ThreadWebBrowser(string url)
{
    Thread thread = new Thread(new ParameterizedThreadStart(BeginCatch));
    thread.SetApartmentState(ApartmentState.STA); // WebBrowser must run on an STA thread
    thread.Start(url);
}

private void BeginCatch(object obj)
{
    string url = obj.ToString();
    WebBrowser wb = new WebBrowser();
    wb.ScriptErrorsSuppressed = true;
    // Navigate to a blank page first so that a document exists
    wb.Navigate("about:blank");
    string htmlCode = GetHtmlSource(url);
    // Write the downloaded source into the blank document
    wb.Document.Write(htmlCode);
    // Perform the analysis ...... (omitted)
}

// Obtain the web page source code with WebClient
private string GetHtmlSource(string url)
{
    string text = "";
    try
    {
        using (WebClient wc = new WebClient())
        {
            text = wc.DownloadString(url);
        }
    }
    catch (Exception)
    {
        // Ignore download errors and return an empty string
    }
    return text;
}
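The workaround above sidesteps DocumentCompleted by writing downloaded HTML into a blank document. If you do want the event, one alternative (our own sketch, not the original author's code) is to give the STA thread a message loop with Application.Run(), since the WebBrowser needs one to load pages and raise its events. The class StaBrowser and its method GetBodyText are hypothetical names; the sketch assumes Windows Forms on Windows and returns the text of the page body once the top-level document has finished loading.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class StaBrowser
{
    // Hypothetical helper: loads url in a WebBrowser on a dedicated STA thread
    // and returns the text of the page body once loading has finished.
    public static string GetBodyText(string url)
    {
        string result = null;
        Exception error = null;

        var thread = new Thread(() =>
        {
            try
            {
                using (var wb = new WebBrowser())
                {
                    wb.ScriptErrorsSuppressed = true;
                    wb.DocumentCompleted += (sender, e) =>
                    {
                        // Frames raise their own DocumentCompleted; wait until
                        // the whole page reports Complete.
                        if (wb.ReadyState != WebBrowserReadyState.Complete) return;
                        result = wb.Document?.Body?.InnerText ?? "";
                        Application.ExitThread(); // end this thread's message loop
                    };
                    wb.Navigate(url);
                    // The message loop is what lets the WebBrowser load the page
                    // and raise DocumentCompleted on this thread.
                    Application.Run();
                }
            }
            catch (Exception ex)
            {
                error = ex;
            }
        });
        thread.SetApartmentState(ApartmentState.STA); // WebBrowser requires STA
        thread.Start();
        thread.Join(); // blocks until the message loop has exited

        if (error != null) throw error;
        return result;
    }
}
```

For example, `string text = StaBrowser.GetBodyText("http://www.example.com/");` blocks the caller until the page has loaded, so call it from a worker thread rather than the UI thread. A real crawler would also want a timeout, because Join waits forever if the page never completes.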

Of course, for processing each node and performing database operations inside the thread, a ThreadPool may give better results and performance. Note that thread-pool threads run in the MTA and cannot be switched to STA, so the WebBrowser itself still needs its own dedicated STA thread. Finally, a suggestion for improvement: when processing a web page that contains Chinese characters, the data returned by the GetHtmlSource(string url) method above may be garbled. I downloaded the data with WebClient and set the appropriate character set, and the garbling went away.
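The article does not show the corrected GetHtmlSource. A minimal sketch, assuming you already know the page's character set, is to set WebClient.Encoding before calling DownloadString. The class name HtmlSource and the extra encoding parameter are our own additions for illustration.

```csharp
using System;
using System.Net;
using System.Text;

static class HtmlSource
{
    // Hypothetical variant of GetHtmlSource that takes the page's character set explicitly.
    public static string GetHtmlSource(string url, Encoding encoding)
    {
        using (var wc = new WebClient())
        {
            // DownloadString uses this Encoding to decode the downloaded bytes into a string.
            wc.Encoding = encoding;
            return wc.DownloadString(url);
        }
    }
}
```

For a UTF-8 page you would pass Encoding.UTF8; for a GB2312 page on .NET Framework, Encoding.GetEncoding("gb2312"). On .NET Core and later, legacy code pages such as GB2312 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).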

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
