---- 1. Introduction
---- Being able to operate on objects inside the IE browser is very practical. Through a DLL bound to IE, we can record the order in which IE browses web pages and analyze users' usage behavior and patterns. We can filter and translate page content, and automatically fill in form fields that users often have to complete. All example code here is written in VC; the principle is to access IE by interacting with the interfaces of the IE object. What we actually use is COM technology. COM is a language-independent, binary object interaction model, so everything described below can also be implemented in other languages such as VB, Delphi, and C++ Builder.
---- 2. Traversing IE Instances
---- First, let's look at how to find out how many IE instances are running on the system.
---- In the Windows architecture, an application can interact with running instances of other applications through the operating system's running object table. However, IE's current implementation does not register itself in the running object table, so another method is required. The ShellWindows collection represents the set of open windows that belong to the shell, and IE is a shell application.
---- The following shows how to traverse the current IE instances in VC. IShellWindows is an interface of the system shell. We can define the following interface variable:
SHDocVw::IShellWindowsPtr m_spSHWinds;
Then create an instance through this variable:
m_spSHWinds.CreateInstance(__uuidof(SHDocVw::ShellWindows));
The GetCount method of the IShellWindows interface returns the number of current instances:
long nCount = m_spSHWinds->GetCount();
The Item method of the IShellWindows interface returns each instance object, where i is the index of the instance:
IDispatchPtr spDisp;
_variant_t va(i, VT_I4);
spDisp = m_spSHWinds->Item(va);
We can then determine whether the instance object is an IE browser object with the following statements:
SHDocVw::IWebBrowser2Ptr spBrowser(spDisp);
assert(spBrowser != NULL);
---- After obtaining the IE browser object, we can call the GetDocument method of the IWebBrowser2 interface to obtain a pointer to the current document object: MSHTML::IHTMLDocument2Ptr spDoc(spBrowser->GetDocument());
---- We can then use this interface to operate on the document object, for example to get the document title through GetTitle.
---- When browsing the web, we usually have many IE instances open at the same time. If the pages are worth keeping, we may want to save them to the hard disk, which normally means saving each instance by hand. Using the principle above, we can obtain the interface of each IE instance and its document object, so a simple program can save all currently open web pages in one batch. The method above traverses the current IE instances. If we also want to receive the events generated by each IE instance, we need the DLL mechanism described next.
---- 3. Implementing a DLL Bound to IE
---- This section describes how to build a DLL that is bound to every running IE instance. The startup process of IE works as follows: when an IE instance starts, it looks for CLSIDs in the registry under the following key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects
---- When CLSIDs exist under this key, IE creates an instance of each object listed there by calling CoCreateInstance(). Note that the CLSID of each object must be represented as a subkey rather than as a named value; for example, {DD41D66E-CE4F-11D2-8DA9-00A0249EABF4} is a valid subkey. We use a DLL rather than an EXE because a DLL runs in the same process space as the IE instance. Every DLL of this kind must implement the IObjectWithSite interface, and in particular its SetSite method. Through this method, our DLL receives an IUnknown pointer to the IE COM object. With this pointer, we can call QueryInterface on the COM object to reach any of the interfaces it exposes, which is the basic mechanism of COM. The only interface we need here is IWebBrowser2.
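---- The subkey form described above can be checked with a small amount of portable code. The following is only a sketch: it validates that a string has the registry form of a CLSID and builds the path of the corresponding subkey. It does not touch the registry, and the helper names isClsidSubkeyName and bhoSubkeyPath are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Registry key (under HKEY_LOCAL_MACHINE) that IE scans at startup.
const std::string kBhoKey =
    "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Browser Helper Objects";

// Returns true if s has the registry form of a CLSID:
// {8 hex-4 hex-4 hex-4 hex-12 hex}, 38 characters in total.
bool isClsidSubkeyName(const std::string& s) {
    if (s.size() != 38 || s.front() != '{' || s.back() != '}') return false;
    for (std::size_t i = 1; i < 37; ++i) {
        const bool dashPos = (i == 9 || i == 14 || i == 19 || i == 24);
        if (dashPos) {
            if (s[i] != '-') return false;
        } else if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: builds the subkey path for one helper object.
std::string bhoSubkeyPath(const std::string& clsid) {
    assert(isClsidSubkeyName(clsid));
    return kBhoKey + "\\" + clsid;
}
```

---- A registration routine could run such a check before creating the subkey, so that a malformed CLSID is rejected before IE ever sees it.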
---- What we create is in fact a COM object; the DLL is simply the form in which that COM object is packaged. Creating and implementing the COM object requires the following:
---- 1. The SetSite method of the IObjectWithSite interface must be implemented. The IE instance uses this method to pass an interface pointer to our COM object. Assume we have the following interface pointer variable:
---- CComQIPtr<IWebBrowser2, &IID_IWebBrowser2> m_myWebBrowser2;
---- In the SetSite method, we can assign the passed interface pointer to m_myWebBrowser2.
---- 2. After obtaining the interface pointing to the IE COM object, we need to connect our DLL to the events of the IE instance. To achieve this, we need two interfaces:
---- (1) IConnectionPointContainer. This interface is used to look up, by IID, the specific connection point to which the DLL will be attached. For example, we can define the following:
CComQIPtr<IConnectionPointContainer,
&IID_IConnectionPointContainer>
spCPContainer(m_myWebBrowser2);
---- Next, we need all the events in IE to be delivered to our DLL. This is done with IConnectionPoint.
---- (2) IConnectionPoint. Through this interface, a client can start or terminate an advisory loop on the connectable object. IConnectionPoint has two main methods: Advise and Unadvise. For our application, Advise establishes a channel between every IE event and the DLL, and Unadvise terminates the notification relationship previously established with Advise. For example, we can define the IConnectionPoint interface pointer as follows: CComPtr<IConnectionPoint> spConnectionPoint;
---- We can then use the following call to find the connection point for all the events of the IE instance:
hr = spCPContainer->FindConnectionPoint(
DIID_DWebBrowserEvents2, &spConnectionPoint);
---- Next, we use the Advise method of the IConnectionPoint interface so that our DLL is notified whenever a new event occurs in IE. You can use the following statement:
hr = spConnectionPoint->Advise(
(IDispatch*)this, &m_dwIDCode);
---- Once the events of the IE instance are connected to our DLL, we can handle all IE events in the Invoke() method of the IDispatch interface.
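---- The Advise/Unadvise relationship can be modeled in a few lines of portable C++. The following is a simplified sketch, not the real COM interfaces: EventSink stands in for our DLL's IDispatch implementation, ConnectionPointModel stands in for IConnectionPoint, and the value returned by advise plays the role of the cookie stored in m_dwIDCode. All of these names are assumptions made for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for an event sink such as our DLL's IDispatch implementation.
struct EventSink {
    std::vector<std::string> received;
    void onEvent(const std::string& name) { received.push_back(name); }
};

// Stand-in for IConnectionPoint: hands out one cookie per registered sink.
class ConnectionPointModel {
public:
    // Like Advise: registers the sink and returns a nonzero cookie.
    unsigned long advise(EventSink* sink) {
        assert(sink != nullptr);
        const unsigned long cookie = nextCookie_++;
        sinks_[cookie] = sink;
        return cookie;
    }
    // Like Unadvise: ends the relationship identified by the cookie.
    // Returns false if the cookie is unknown.
    bool unadvise(unsigned long cookie) { return sinks_.erase(cookie) == 1; }
    // What the event source does when an event occurs.
    void fire(const std::string& name) {
        for (auto& entry : sinks_) entry.second->onEvent(name);
    }
private:
    unsigned long nextCookie_ = 1;
    std::map<unsigned long, EventSink*> sinks_;
};
```

---- The point of the model is that the cookie is the only handle to the relationship, which is why the real code must keep m_dwIDCode so that it can call Unadvise later, typically when IE calls SetSite with a NULL pointer as it shuts down.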
---- 3. The Invoke() method of the IDispatch interface. IDispatch is an interface derived from IUnknown. Any service offered through a COM interface can also be offered through IDispatch. IDispatch::Invoke works much as a vtbl does behind the scenes: it implements a set of functions accessed by index, and we can customize Invoke to provide different services dynamically. The Invoke method is declared as follows:
STDMETHOD(Invoke)(DISPID dispidMember, REFIID
riid, LCID lcid, WORD wFlags,
DISPPARAMS* pDispParams, VARIANT* pvarResult,
EXCEPINFO* pExcepInfo, UINT* puArgErr);
---- Here DISPID is a long integer that identifies a function. Within a specific implementation of IDispatch, each DISPID is unique, and each implementation of IDispatch has its own IID. The dispidMember parameter identifies which event of the IE instance is being delivered, for example DISPID_BEFORENAVIGATE2 or DISPID_NAVIGATECOMPLETE2. Another important parameter of this method is the DISPPARAMS structure, which is defined as follows:
typedef struct tagDISPPARAMS
{
VARIANTARG* rgvarg;
// VARIANTARG is the same as VARIANT, as the IDL definitions show,
// so rgvarg is actually an array of arguments
DISPID* rgdispidNamedArgs; // DISPIDs of the named arguments
unsigned int cArgs; // number of elements in the array
unsigned int cNamedArgs; // number of named arguments
} DISPPARAMS;
---- Note that every argument has type VARIANTARG, so the set of types that can be passed between IE and our DLL is limited: only types that fit in the VARIANTARG structure can travel through the dispatch interface. For example, for the event DISPID_NAVIGATECOMPLETE2, the first element of rgvarg (rgvarg[0]) holds the URL visited by IE, and its type is VT_BYREF | VT_VARIANT. DISPPARAMS stores arguments in reverse order of declaration, which is why the URL, declared last, appears first in the array. DISPIDs such as DISPID_NAVIGATECOMPLETE2 are already defined in VC and can be used directly. As described above, the Invoke method receives all the events of the IE instance. We can write the data to a file for later analysis, or display it in a list box in real time.
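---- The dispatching logic inside Invoke can be sketched without the Windows headers. The following is a portable model under stated assumptions: ParamsModel stands in for DISPPARAMS with strings in place of VARIANTARG values, EventLogger stands in for our DLL's Invoke implementation, and the two constants repeat the DISPID values that exdispid.h defines for DWebBrowserEvents2. The class and member names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// DISPID values as defined for DWebBrowserEvents2 in exdispid.h.
const long kDispidBeforeNavigate2 = 250;
const long kDispidNavigateComplete2 = 252;

// Stand-in for DISPPARAMS: arguments are stored in reverse order,
// so the last declared parameter comes first in the array.
struct ParamsModel {
    std::vector<std::string> rgvarg;
};

// Stand-in for our Invoke(): records the URL of completed navigations.
class EventLogger {
public:
    void invoke(long dispidMember, const ParamsModel& params) {
        switch (dispidMember) {
        case kDispidBeforeNavigate2:
            ++navigationsStarted_;  // only counted in this sketch
            break;
        case kDispidNavigateComplete2:
            // NavigateComplete2(pDisp, URL): URL is declared last,
            // so it is found in rgvarg[0].
            assert(params.rgvarg.size() == 2);
            visited_.push_back(params.rgvarg[0]);
            break;
        default:
            break;  // events we are not interested in are ignored
        }
    }
    const std::vector<std::string>& visited() const { return visited_; }
    int navigationsStarted() const { return navigationsStarted_; }
private:
    std::vector<std::string> visited_;
    int navigationsStarted_ = 0;
};
```

---- In the real DLL the same switch would sit inside Invoke and read the URL from the VARIANT that pDispParams->rgvarg[0] refers to; the list of visited URLs is what would be written to a file or shown in a list box.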
---- 4. Microsoft's HTML Document Object Model and Application Analysis
---- Let's look at how to obtain the interface of the web page document. This interface is IHTMLDocument2, and it is obtained by calling the get_Document method of the IE COM object. Use the following statements:
hr = m_spWebBrowser2->get_Document(&spDisp);
CComQIPtr<IHTMLDocument2,
&IID_IHTMLDocument2> spHTML;
spHTML = spDisp;
---- In this way, we obtain the interface of the web page object and can then analyze the page. For example, the get_URL method of IHTMLDocument2 returns the URL of the page, and the get_forms method returns all the form objects on the page.
---- The W3C has developed a Document Object Model (DOM) standard. This standard applies not only to HTML but also to XML. The W3C defines only the interfaces of web page objects; different vendors can use different languages and methods for the actual implementation. Web page objects as defined by the W3C are dynamic: users can operate dynamically on every object contained in the page, whether it is an input box, an image, or a sound. According to the official W3C documents, objects can also be added to and removed from the page dynamically. In practice, few vendors have implemented everything the DOM defines.
---- Microsoft's definition of web page objects basically follows this standard. The current interfaces do not support adding or deleting elements dynamically, but the basic elements of a page can be modified. For example, IHTMLElementCollection represents a collection of basic elements in a page, IHTMLElement represents a single basic element, and IHTMLOptionElement represents one option of a selection element. Basic elements provide the setAttribute and getAttribute methods, which set an element's state dynamically and retrieve its name and value.
---- A common application is to check whether a web page contains forms that need to be filled in. If the forms of this site have been filled in before and the data was saved, we can automatically place that data into the matching form fields for the URL. We can also collect the form data items that pages commonly ask for, assign values to them once, and then fill those values in automatically whenever similar items appear. A form is itself an object, and the elements inside a form, such as input, option, and select, are all objects as well.
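---- The auto-fill idea can be sketched independently of the MSHTML interfaces. The following is a minimal model under stated assumptions: FieldModel stands in for a form input element, SavedForms is a hypothetical storage layout that maps a URL to saved field values, and autoFill is a hypothetical helper. Leaving fields that already have a value untouched is a design choice of this sketch, not something the text prescribes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for an input element of a form (name/value pair).
struct FieldModel {
    std::string name;
    std::string value;
};

// Saved form data: URL -> (field name -> value). Hypothetical layout.
using SavedForms = std::map<std::string, std::map<std::string, std::string>>;

// Fills every empty field whose name has a saved value for this URL.
// Returns the number of fields that were filled.
int autoFill(const SavedForms& saved, const std::string& url,
             std::vector<FieldModel>& fields) {
    const auto page = saved.find(url);
    if (page == saved.end()) return 0;
    int filled = 0;
    for (FieldModel& field : fields) {
        const auto entry = page->second.find(field.name);
        if (entry != page->second.end() && field.value.empty()) {
            field.value = entry->second;
            ++filled;
        }
    }
    return filled;
}
```

---- In the real DLL, the fields would come from the elements of each form returned by get_forms, and the values would be written back through the element interfaces rather than into a plain struct.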
---- Another conceivable application is automatic translation of the text on a web page. Because we can modify the attributes of any object on the page, we can automatically replace text written in a foreign language with text in our native language. A real implementation of course depends on breakthroughs in natural language understanding, but the interfaces and object model of IE let us control all of IE flexibly, from event objects to web page objects.
---- 5. Summary
---- We have shown how to enumerate all running IE instances, described in detail how a DLL bound to the IE instance is implemented, and analyzed the web page object model. Several related applications, implementation methods, and technical problems were also introduced. IE is a componentized browser built on COM. It is powerful and leaves application developers a great deal of room to build on. It is also large and slow. Nevertheless, its architecture represents Microsoft's advanced and innovative technology, which gives it strong vitality.