When writing scraping (data collection) software, parsing the raw HTML text of some websites directly is very troublesome.
When programming with WinForms there is, of course, a better way: analyze an HtmlDocument.
However, an HtmlDocument cannot be created directly.
It has to be produced by a WebBrowser control after the control calls Navigate on a page; it is then available as wb.Document.
After that, you can analyze the elements and tags of the HtmlDocument.
In fact, if only a single page is being collected, all of this can be done in the main form.
But when collecting list pages, for example, there may be N pages.
If they are processed in a loop, waiting for the WebBrowser to respond each time makes the form freeze (it appears to hang).
At this point, we naturally think of using multithreading.
As we should all know, C# threads have two apartment modes: STA and MTA.
However, the WebBrowser control has an awkward restriction: it only works on threads in STA mode.
For example, consider the following code:
Thread tread = new Thread(new ParameterizedThreadStart(BeginCatch));
tread.SetApartmentState(ApartmentState.STA);
tread.Start(url);

private void BeginCatch(object obj)
{
    string url = obj.ToString();
    WebBrowser wb = new WebBrowser();
    wb.ScriptErrorsSuppressed = true;
    wb.Navigate(url);
    wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
}
To analyze the HtmlDocument produced by the WebBrowser, you must do the work in the DocumentCompleted event handler.
Only at that point has the WebBrowser finished loading the page.
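
The handler wb_DocumentCompleted is not shown in this text. As a rough sketch of what such a handler might look like, assuming the goal is simply to list the links on the loaded page (the class name and the choice of "a" tags are illustrative assumptions, not part of the original code):

```csharp
using System;
using System.Windows.Forms;

static class HandlerSketch
{
    // Hypothetical handler body: prints the text and href of every link.
    public static void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser wb = (WebBrowser)sender;
        HtmlDocument doc = wb.Document;
        if (doc == null)
        {
            return; // nothing was loaded
        }

        // Walk the parsed DOM instead of parsing the HTML text by hand.
        foreach (HtmlElement link in doc.GetElementsByTagName("a"))
        {
            Console.WriteLine(link.InnerText + " -> " + link.GetAttribute("href"));
        }
    }
}
```

Note that on pages with frames the event can be raised more than once, so a real handler may need to check which document has finished.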
However, this is a trap!
The WebBrowser has a quirk: when it is used on a separate STA thread like this, the thread does not wait for DocumentCompleted to be raised.
In other words, the subsequent analysis never runs!
So what should we do in this case?
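
One commonly used option, offered here as an assumption rather than as the approach this text describes, is to give the STA thread its own message loop. The WebBrowser delivers DocumentCompleted through window messages, so a thread that returns right after Navigate never gets to see the event. A minimal sketch, in which the class name, the method shape, and the console output are all hypothetical:

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class StaBrowserSketch
{
    // Hypothetical variant of BeginCatch that keeps the thread alive
    // until the page has loaded.
    public static void BeginCatch(string url)
    {
        Thread worker = new Thread(() =>
        {
            using (WebBrowser wb = new WebBrowser())
            {
                wb.ScriptErrorsSuppressed = true;
                wb.DocumentCompleted += (sender, e) =>
                {
                    // Analyze wb.Document here, then end the message loop.
                    Console.WriteLine(wb.Document.Title);
                    Application.ExitThread();
                };
                wb.Navigate(url);

                // Pump messages on this thread so the event can be raised.
                Application.Run();
            }
        });
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
    }
}
```

Each call starts one thread and one WebBrowser, so for a long list of pages it is probably worth limiting how many run at the same time.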
Someone might think of using the wb.Document.Write(string) method, as follows:
private void BeginCatch(object obj)
{
    string url = obj.ToString();
    WebBrowser wb = new WebBrowser();
    wb.ScriptErrorsSuppressed = true;
    string htmlcode = GetHtmlSource(url);
    wb.Document.Write(htmlcode);
    // Perform the analysis operation
}
// Fetch the web page source code with WebClient
private string GetHtmlSource(string Url)
{
    string text1 = "";
    try
    {
        System.Net.WebClient wc = new WebClient();
        text1 = wc.DownloadString(Url);
    }
    catch (Exception exception1)
    {}
R