Typically, whether a document has finished loading is checked in the WebBrowser DocumentCompleted event handler:
if (_webBrowder.ReadyState == WebBrowserReadyState.Complete)
{
    // Fetch the page content and process it
}
Unfortunately, many web pages are quite complex. While debugging you can sometimes see that _webBrowder.ReadyState stays at WebBrowserReadyState.Interactive even though the data you care about has already loaded, or the page hangs and the data never loads at all. To keep data collection efficient, you have to handle the timeout case yourself: read the HTML manually to check whether the document already contains the data you want, and skip the page if the data does not show up within a time limit. I worked on this for a long time without solving it, until I came across the article http://www.cnblogs.com/wangchuang/p/3618883.html. By improving the class from that article I got the behavior I wanted. I'm posting my code here so that anyone who runs into a similar problem can use it as a reference:
using System;
using System.Threading;
using System.Windows.Forms;

// Assumption: TestEventArgs is not shown in the original post. This minimal version
// only carries the HTML that the usage code reads through e.Html.
public class TestEventArgs : EventArgs
{
    public string Html { get; private set; }
    public TestEventArgs(string html) { Html = html; }
}

/// <summary>
/// Crawls web page data through WebBrowser.
/// WebBrowserCrawler webBrowserCrawler = new WebBrowserCrawler();
/// Example: File.WriteAllText(Server.MapPath("Sample.txt"), webBrowserCrawler.GetReult("http://www.in2.cc/sample/waterfalllab.htm"));
/// </summary>
public class WebBrowserCrawler
{
    // Assumption: the default timeout value was lost from the original listing;
    // 60 seconds is only a placeholder.
    private const int DefaultMaxWaitSeconds = 60;

    // The WebBrowser control
    private WebBrowser _webBrowder;
    // The final result
    private string _result { get; set; }
    // The URL to crawl
    private string _path { get; set; }
    // Maximum time to keep waiting for the data before giving up (in seconds)
    private int _maxWaitSeconds { get; set; }

    public delegate bool MyDelegate(object sender, TestEventArgs e);

    /// <summary>
    /// Decides whether the stop-loading condition has been reached.
    /// </summary>
    public event MyDelegate IsStopEvent;

    /// <summary>
    /// Public entry point.
    /// </summary>
    /// <param name="url">URL path</param>
    /// <param name="maxWaitSeconds">Maximum number of seconds to wait</param>
    /// <returns>The body HTML, or null if the stop condition was not met in time</returns>
    public string GetReult(string url, int maxWaitSeconds = DefaultMaxWaitSeconds)
    {
        _path = url;
        _maxWaitSeconds = maxWaitSeconds <= 0 ? DefaultMaxWaitSeconds : maxWaitSeconds;
        _result = null; // do not return a stale result from an earlier call
        var mThread = new Thread(FatchDataToResult);
        // An apartment is a logical container inside a process for objects that share
        // the same thread access requirements.
        // The .NET Framework itself does not use apartments; managed objects are
        // responsible for using shared resources in a thread-safe way.
        // Because COM classes do use apartments, the common language runtime has to
        // create and initialize an apartment when it calls a COM object through COM interop.
        // A managed thread can create and enter a single-threaded apartment (STA), which
        // allows only one thread, or a multithreaded apartment (MTA), which holds one or more.
        // Setting the thread's apartment state to one of the ApartmentState enumeration
        // values controls which kind of apartment is created.
        // A given thread can initialize a COM apartment only once, so the apartment type
        // cannot be changed after the first call into unmanaged code.
        // From: http://msdn.microsoft.com/zh-tw/library/system.threading.apartmentstate
        mThread.SetApartmentState(ApartmentState.STA);
        mThread.Start();
        mThread.Join();
        return _result;
    }

    /// <summary>
    /// Uses _webBrowder to fetch the page content.
    /// Runs on the worker thread.
    /// </summary>
    private void FatchDataToResult()
    {
        _webBrowder = new WebBrowser();
        try
        {
            _webBrowder.ScriptErrorsSuppressed = true;
            _webBrowder.Navigate(_path);
            DateTime firstTime = DateTime.Now;
            // Application.DoEvents processes all Windows messages currently in the message queue.
            // Calling DoEvents lets the application handle other events. For example, if a form
            // adds data to a ListBox and calls DoEvents, the form is repainted when another
            // window is dragged over it. Without DoEvents, the form is not repainted until the
            // button's click event handler has finished.
            while ((DateTime.Now - firstTime).TotalSeconds <= _maxWaitSeconds)
            {
                if (_webBrowder.Document != null
                    && _webBrowder.Document.Body != null
                    && !string.IsNullOrEmpty(_webBrowder.Document.Body.OuterHtml)
                    && this.IsStopEvent != null)
                {
                    string html = _webBrowder.Document.Body.OuterHtml;
                    bool rs = this.IsStopEvent(null, new TestEventArgs(html));
                    if (rs)
                    {
                        this._result = html;
                        break;
                    }
                }
                Application.DoEvents();
            }
        }
        finally
        {
            _webBrowder.Dispose();
        }
    }
}
How to use:
WebBrowserCrawler obj = new WebBrowserCrawler();
obj.IsStopEvent += new WebBrowserCrawler.MyDelegate((sender, e) =>
{
    // Return true once the data I want is present in the current HTML
    return e.Html.Contains("AIDFDSFSDF");
});
string url = "http://www.xxx.cn/aaa/index.html?keyword=sdfded";
string html = obj.GetReult(url); // get the collected data
if (!string.IsNullOrEmpty(html))
{
    // Work with the data
}
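The core idea of the class (keep reading the page until a caller-supplied check passes or a time limit runs out) can also be looked at apart from WebBrowser. The following is only a rough sketch under stated assumptions: Poller, WaitFor, probe, isDone and pump are hypothetical names that do not appear in the code above, and Stopwatch is swapped in for the DateTime.Now subtraction so the elapsed time is not affected by system clock changes.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the class in this post.
public static class Poller
{
    // Calls probe repeatedly until isDone accepts its value or timeout elapses.
    // Returns the accepted value, or null if the time limit is reached first.
    public static string WaitFor(Func<string> probe, Func<string, bool> isDone,
                                 TimeSpan timeout, Action pump = null)
    {
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed <= timeout)
        {
            string current = probe();
            if (!string.IsNullOrEmpty(current) && isDone(current))
            {
                return current;
            }
            // With a WebBrowser, pump would be Application.DoEvents so the control
            // keeps loading; without a pump this sketch just sleeps briefly.
            if (pump != null) pump(); else Thread.Sleep(10);
        }
        return null;
    }
}

public static class Demo
{
    public static void Main()
    {
        int calls = 0;
        // Simulated page: the marker only appears on the third read.
        string found = Poller.WaitFor(
            () => ++calls < 3 ? "<body>loading</body>" : "<body>AIDFDSFSDF</body>",
            html => html.Contains("AIDFDSFSDF"),
            TimeSpan.FromSeconds(2));
        Console.WriteLine(found != null);

        // Simulated page whose marker never appears: the call gives up after the timeout.
        string missing = Poller.WaitFor(
            () => "<body>loading</body>",
            html => html.Contains("AIDFDSFSDF"),
            TimeSpan.FromMilliseconds(100));
        Console.WriteLine(missing == null);
    }
}
```

In the crawler, probe would correspond to reading _webBrowder.Document.Body.OuterHtml, isDone to the IsStopEvent handler, and pump to Application.DoEvents. Passing the check in as a delegate is what lets each page define its own "loaded" condition.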
Get Ajax pages via WebBrowser