Capture the current webpage content from Firefox

Source: Internet
Author: User

Firefox uses the Gecko engine, which is different from the IE engine. With IE you can obtain IHTMLDocument2 directly from an HWND; with Gecko you cannot. Fortunately, Mozilla added MSAA support to Gecko, so a document interface can still be obtained indirectly from the HWND (in fact it is ISimpleDOMDocument, which, like IHTMLDocument2, derives from IUnknown).

There is an article on the Internet, "Research on Webpage Content Acquisition and Analysis Based on IE and Gecko Kernels". Unfortunately, the method it describes only works for early versions of Firefox. After a long search I found no Chinese documentation covering newer versions; many articles target Firefox 3.x and are not very reliable. In the end, I had to study the documentation on the Mozilla official website.

Articles on the Mozilla official website:

Supported MSAA interfaces: describes how Gecko maps onto the MSAA interfaces it supports.

MSAA implementation features: describes how to obtain information about a webpage document, but it does not cover everything we need. What we need is to reach the webpage document in the current tab starting from an HWND.

Finding the window and loaded document: explains that Firefox is becoming a single-window application, so the only way to get from the top-level UI window to the loaded webpage document (ISimpleDOMDocument) is the accessible relation NAVRELATION_EMBEDS. This matches what Spy++ shows for Firefox 9: the main interface has a single window, whose class is MozillaWindowClass. The article was last revised on 4 Mar 2008, so the method has been available since earlier versions, although it is rarely mentioned in other articles.

MSAA relations: defines the value of NAVRELATION_EMBEDS mentioned above.
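Since everything starts from the single top-level window of class MozillaWindowClass, a minimal sketch of locating that window may help. The helper names IsGeckoWindowClass, IsGeckoWindow, and FindGeckoWindow are hypothetical and do not come from the Mozilla articles; the Windows-specific part is guarded so that the class-name check can also be compiled on other platforms.

```cpp
#include <cwchar>

#ifdef _WIN32
#include <windows.h>
#endif

// Class of Firefox's top-level window, as observed with Spy++ in the text.
static const wchar_t kGeckoWindowClass[] = L"MozillaWindowClass";

// Hypothetical helper: true when className is exactly the Gecko window class.
bool IsGeckoWindowClass(const wchar_t* className)
{
    return className != nullptr
        && std::wcscmp(className, kGeckoWindowClass) == 0;
}

#ifdef _WIN32
// Hypothetical helper: checks a handle obtained elsewhere
// (for example from GetForegroundWindow).
bool IsGeckoWindow(HWND hwnd)
{
    wchar_t className[256] = {0};
    if (hwnd == nullptr || ::GetClassNameW(hwnd, className, 256) == 0)
    {
        return false;
    }
    return IsGeckoWindowClass(className);
}

// Hypothetical helper: returns a top-level window of the Gecko class,
// or nullptr when no such window exists.
HWND FindGeckoWindow()
{
    return ::FindWindowW(kGeckoWindowClass, nullptr);
}
#endif
```

When several Firefox windows are open, FindWindowW returns only one of them, so a caller that cares about a particular window would more likely start from a known handle and validate it with IsGeckoWindow.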

With reference to the articles above and some other material, the webpage document can be obtained as follows:

1. Download the ISimpleDOMNode.idl, ISimpleDOMText.idl, and ISimpleDOMDocument.idl files from the Mozilla Developer Center. Download links are provided under "MSAA implementation features".

2. Use Microsoft's IDL compiler, midl (located in VC98\Bin under the VC6 installation directory), to generate the header files for the Windows platform. The commands are as follows:
midl ISimpleDOMNode.idl
midl ISimpleDOMText.idl
midl ISimpleDOMDocument.idl

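This produces nine files, of which only six are needed: ISimpleDOMNode.h, ISimpleDOMNode_i.c, ISimpleDOMText.h, ISimpleDOMText_i.c, ISimpleDOMDocument.h, and ISimpleDOMDocument_i.c. The _i.c files define the interface IIDs, so compile them into the project along with the headers.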

3. Assuming you have already obtained the Firefox window handle, the code for obtaining the webpage URL is as follows:

 

#include <windows.h>
#include <tchar.h>
#include <objbase.h>
#include <atlbase.h>
#include <oleacc.h>
#include <comutil.h>

#include "ISimpleDOMNode.h"
#include "ISimpleDOMText.h"
#include "ISimpleDOMDocument.h"

////////////////////////////////////////////////////////////////////////////////
//

// NAVRELATION_EMBEDS must be defined by yourself. Its value is given in "MSAA relations".
#ifndef NAVRELATION_EMBEDS
#define NAVRELATION_EMBEDS 0x1009
#endif

////////////////////////////////////////////////////////////////////////////////
//
BOOL GetURL_Gecko(HMODULE hOleAccDLL, HWND hwndBrowser, LPTSTR pszURL, size_t stBytes)
{
    if (NULL == hOleAccDLL || NULL == hwndBrowser
        || NULL == pszURL || stBytes < sizeof(TCHAR))
    {
        return FALSE;
    }

    // Only the Gecko top-level window class is handled here.
    TCHAR szClassName[_MAX_PATH] = {0};
    ::GetClassName(hwndBrowser, szClassName, _countof(szClassName));
    if (_tcscmp(szClassName, TEXT("MozillaWindowClass")) != 0)
    {
        return FALSE;
    }

    LPFNACCESSIBLEOBJECTFROMWINDOW pfAccessibleObjectFromWindow
        = (LPFNACCESSIBLEOBJECTFROMWINDOW)::GetProcAddress(
            hOleAccDLL, "AccessibleObjectFromWindow");
    if (NULL == pfAccessibleObjectFromWindow)
    {
        return FALSE;
    }

    // CComPtr releases each interface automatically on every return path.
    CComPtr<IAccessible> pAccBrowser;
    HRESULT hr = pfAccessibleObjectFromWindow(
        hwndBrowser, (DWORD)OBJID_CLIENT, IID_IAccessible, (void**)&pAccBrowser);
    if (FAILED(hr) || !pAccBrowser)
    {
        return FALSE;
    }

    // Navigate from the top-level window to the embedded document.
    VARIANT vtStart;
    ::VariantInit(&vtStart);
    vtStart.vt = VT_I4;
    vtStart.lVal = CHILDID_SELF;

    CComVariant vtResult; // cleared automatically
    hr = pAccBrowser->accNavigate(NAVRELATION_EMBEDS, vtStart, &vtResult);
    if (FAILED(hr) || VT_DISPATCH != vtResult.vt || NULL == vtResult.pdispVal)
    {
        return FALSE;
    }

    CComPtr<IAccessible> pAccDoc;
    hr = vtResult.pdispVal->QueryInterface(IID_IAccessible, (void**)&pAccDoc);
    if (FAILED(hr) || !pAccDoc)
    {
        return FALSE;
    }

    CComPtr<IServiceProvider> pServProv;
    hr = pAccDoc->QueryInterface(IID_IServiceProvider, (void**)&pServProv);
    if (FAILED(hr) || !pServProv)
    {
        return FALSE;
    }

    // Service GUID used to request ISimpleDOMNode, as given in Mozilla's MSAA documentation.
    const GUID refguid = {0x0c539790, 0x12e4, 0x11cf,
        {0xb6, 0x61, 0x00, 0xaa, 0x00, 0x4c, 0xd6, 0xd8}};

    CComPtr<ISimpleDOMNode> pNode;
    hr = pServProv->QueryService(refguid, IID_ISimpleDOMNode, (void**)&pNode);
    if (FAILED(hr) || !pNode)
    {
        return FALSE;
    }

    CComPtr<ISimpleDOMDocument> pDoc;
    hr = pNode->QueryInterface(IID_ISimpleDOMDocument, (void**)&pDoc);
    if (FAILED(hr) || !pDoc)
    {
        return FALSE;
    }

    CComBSTR bstrURL; // frees the BSTR automatically
    hr = pDoc->get_URL(&bstrURL);
    if (FAILED(hr) || !bstrURL)
    {
        return FALSE;
    }

    // stBytes is a size in bytes: convert it to characters and always terminate.
    const size_t cchMax = stBytes / sizeof(TCHAR);
    _tcsncpy(pszURL, _bstr_t(bstrURL.m_str), cchMax - 1);
    pszURL[cchMax - 1] = TEXT('\0');

    return TRUE;
}

////////////////////////////////////////////////////////////////////////////////
//
BOOL GetWebPageURL(HWND hwndBrowser, LPTSTR pszURL, size_t stBytes)
{
    if (!::IsWindow(hwndBrowser))
    {
        return FALSE;
    }

    // Only call CoUninitialize if CoInitialize succeeded.
    if (FAILED(::CoInitialize(NULL)))
    {
        return FALSE;
    }

    // Load oleacc.dll dynamically to check whether MSAA is installed
    HMODULE hOleAccDLL = ::LoadLibrary(TEXT("oleacc.dll"));
    if (NULL == hOleAccDLL)
    {
        ::CoUninitialize();
        return FALSE;
    }

    BOOL bSucceeded = ::GetURL_Gecko(hOleAccDLL, hwndBrowser, pszURL, stBytes);

    ::FreeLibrary(hOleAccDLL);
    ::CoUninitialize();

    return bSucceeded;
}
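One detail of the listing deserves a closer look: the destination buffer size is passed as stBytes, a byte count, while string-copy routines expect a character count. The following is a minimal, portable sketch of that final copy step, assuming a wide-character (Unicode) build. CopyUrlToBuffer is a hypothetical helper, not part of the original code, and wchar_t stands in for TCHAR.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: copies src into dst, whose capacity is given in BYTES.
// The result is always null-terminated. Returns false when the arguments are
// invalid or when the URL had to be truncated to fit.
bool CopyUrlToBuffer(wchar_t* dst, std::size_t dstBytes, const wchar_t* src)
{
    if (dst == nullptr || src == nullptr || dstBytes < sizeof(wchar_t))
    {
        return false;
    }

    // Capacity in characters, including room for the terminator.
    const std::size_t capacity = dstBytes / sizeof(wchar_t);
    const std::size_t length = std::wcslen(src);
    const std::size_t toCopy = (length < capacity - 1) ? length : capacity - 1;

    std::wmemcpy(dst, src, toCopy);
    dst[toCopy] = L'\0';

    return toCopy == length;
}
```

A caller would typically declare a fixed array and pass sizeof(buffer) as the byte count, which is the same convention the stBytes parameter name suggests.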

 
