Get Ajax pages via WebBrowser

Source: Internet
Author: User

Typically, you decide whether a page has finished loading in the WebBrowser control's DocumentCompleted event, by checking ReadyState:

    if (_webBrowder.ReadyState == WebBrowserReadyState.Complete)
    {
        // Fetch the page content and process it
    }
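For context, a check like the one above normally sits inside a DocumentCompleted handler. The following is a minimal sketch of that wiring, assuming a Windows Forms project. The class name DocumentCompletedSketch and the decision to print the HTML length are illustrative only and do not come from the original article. The URL is the sample one quoted later in the class comment.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical demo class: not part of the original article.
static class DocumentCompletedSketch
{
    [STAThread] // WebBrowser wraps a COM control and needs an STA thread
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (sender, e) =>
        {
            // The event can fire more than once (for example, once per frame),
            // so ignore calls that arrive before the control reports Complete.
            if (browser.ReadyState != WebBrowserReadyState.Complete)
                return;

            // Illustrative processing step: report how much HTML was received.
            Console.WriteLine(browser.DocumentText.Length);

            browser.Dispose();
            Application.ExitThread(); // end the message loop started below
        };

        browser.Navigate("http://www.in2.cc/sample/waterfalllab.htm");
        Application.Run(); // pump messages so the control can load the page
    }
}
```

On pages that fill in their content with Ajax after the initial load, this handler may fire before the data arrives, or ReadyState may never reach Complete, which is the situation the rest of this article deals with.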

Unfortunately, many web pages are quite complex. While debugging you can sometimes see that _webBrowder.ReadyState stays at WebBrowserReadyState.Interactive, even though the data you want has already been loaded, has not been loaded yet, or the page has simply hung. To collect data efficiently you therefore have to handle timeouts yourself: read the HTML manually, check whether the document already contains the data you want, and skip the page if it does not appear within a time limit.

I worked on this for a long time without solving it. I then came across the article http://www.cnblogs.com/wangchuang/p/3618883.html and, by improving the class it contains, got the effect I wanted. My code is posted here for anyone who runs into a similar problem.

    using System;
    using System.Threading;
    using System.Windows.Forms;

    // NOTE: TestEventArgs is not shown in the original listing; this minimal
    // version is an assumption based on how it is used (new TestEventArgs(html), e.Html).
    public class TestEventArgs : EventArgs
    {
        public string Html { get; private set; }
        public TestEventArgs(string html) { Html = html; }
    }

    /// <summary>
    /// Crawls web data through WebBrowser.
    /// WebBrowserCrawler webBrowserCrawler = new WebBrowserCrawler();
    /// Example: File.WriteAllText(Server.MapPath("sample.txt"), webBrowserCrawler.GetReult("http://www.in2.cc/sample/waterfalllab.htm"));
    /// </summary>
    public class WebBrowserCrawler
    {
        // The WebBrowser control
        private WebBrowser _webBrowder;
        // The final result
        private string _result { get; set; }
        // The URL to load
        private string _path { get; set; }
        // Maximum time to wait while crawling before giving up (in seconds)
        private int _maxWaitSeconds { get; set; }

        public delegate bool MyDelegate(object sender, TestEventArgs e);

        /// <summary>
        /// Decides whether the stop-loading condition has been reached
        /// </summary>
        public event MyDelegate IsStopEvent;

        /// <summary>
        /// The public entry point
        /// </summary>
        /// <param name="url">URL path</param>
        /// <param name="maxWaitSeconds">Maximum number of seconds to wait</param>
        /// <returns>The page HTML, or null if the stop condition was never met</returns>
        // NOTE: the default value was lost in the source text; 60 seconds is an assumed placeholder.
        public string GetReult(string url, int maxWaitSeconds = 60)
        {
            _path = url;
            _maxWaitSeconds = maxWaitSeconds <= 0 ? 60 : maxWaitSeconds;
            var mThread = new Thread(FatchDataToResult);
            // An apartment is a logical container within a process for objects that share the
            // same thread-access requirements. All objects in the same apartment can receive
            // calls from any thread in that apartment.
            // The .NET Framework does not use apartments; managed objects are responsible for
            // using shared resources in a thread-safe way themselves.
            // Because COM classes use apartments, the common language runtime must create and
            // initialize an apartment when it calls a COM object through COM interop.
            // A managed thread can create and enter a single-threaded apartment (STA), which
            // allows only one thread, or a multithreaded apartment (MTA), which holds one or more.
            // Setting the thread's apartment state to one of the ApartmentState enumeration
            // values controls which kind of apartment is created.
            // Because a given thread can initialize a COM apartment only once, the apartment
            // type cannot be changed after the first call into unmanaged code.
            // From: http://msdn.microsoft.com/zh-tw/library/system.threading.apartmentstate
            mThread.SetApartmentState(ApartmentState.STA);
            mThread.Start();
            mThread.Join();
            return _result;
        }

        /// <summary>
        /// Uses _webBrowder to fetch the page content.
        /// Intended to be called on the worker thread.
        /// </summary>
        private void FatchDataToResult()
        {
            _webBrowder = new WebBrowser();
            _webBrowder.ScriptErrorsSuppressed = true;
            _webBrowder.Navigate(_path);
            DateTime firstTime = DateTime.Now;
            // Application.DoEvents processes all Windows messages currently in the message queue.
            // If you call DoEvents in your code, your application can handle other events. For
            // example, if a form adds data to a ListBox and calls DoEvents, the form repaints
            // when another window is dragged over it.
            // Without DoEvents, the form is not repainted until the button's click event
            // handler has finished executing.
            while ((DateTime.Now - firstTime).TotalSeconds <= _maxWaitSeconds)
            {
                if (_webBrowder.Document != null
                    && _webBrowder.Document.Body != null
                    && !string.IsNullOrEmpty(_webBrowder.Document.Body.OuterHtml)
                    && this.IsStopEvent != null)
                {
                    string html = _webBrowder.Document.Body.OuterHtml;
                    // Ask the caller whether the HTML already contains the wanted data
                    bool rs = this.IsStopEvent(null, new TestEventArgs(html));
                    if (rs)
                    {
                        this._result = html;
                        break;
                    }
                }
                Application.DoEvents();
            }
            _webBrowder.Dispose();
        }
    }

How to use:

    WebBrowserCrawler obj = new WebBrowserCrawler();
    obj.IsStopEvent += new WebBrowserCrawler.MyDelegate((sender, e) =>
    {
        // Return true once the data I want is present in the current HTML
        return e.Html.Contains("AIDFDSFSDF");
    });
    string url = "http://www.xxx.cn/aaa/index.html?keyword=sdfded";
    string html = obj.GetReult(url); // get the collected data
    if (!string.IsNullOrEmpty(html))
    {
        // Process the data
    }
