Load and save HTML content with streams, store HTML form information in an INI file, and parse HTML code with MSHTML

Source: Internet
Author: User

Use streams to load and save HTML content

Part of this article is translated from the MSDN article "Loading HTML content from a Stream".

With Microsoft Visual C++ and the WebBrowser control, you can use the IPersist* interfaces and their associated methods to load and save HTML content through streams.

This article describes the steps required to load HTML content, which are divided into the following parts:

Go to about:blank
DHTML object model availability
Use QueryInterface to obtain the IPersist* interfaces
Use the IPersist* interfaces to load and save HTML content
Load and save HTML element data
Known issues
Reference
Related Topics
Go to about:blank
The IWebBrowser2::Navigate2 method of the IWebBrowser2 interface navigates the browser to a URL. In the following sample code, IWebBrowser2::Navigate2 is used to navigate to the about:blank page. Navigating to this empty page ensures that MSHTML is loaded and that the HTML elements in the dynamic HTML (DHTML) object model are available.

This example shows how to navigate the browser to an empty page. The m_pBrowser variable contains the IWebBrowser2 interface pointer obtained from the WebBrowser control.

CComVariant vURL(L"about:blank"), vEmpty;   // CComVariant requires atlbase.h
m_pBrowser->Navigate2(&vURL, &vEmpty, &vEmpty, &vEmpty, &vEmpty);
DHTML object model availability
The DHTML object model is used to access and manipulate the content of an HTML page and is not available until the page has loaded. Your application handles the DWebBrowserEvents2::DocumentComplete event of the WebBrowser control to determine when a page has loaded. This event may fire once for each frame on the page and fires again when the top-level document has loaded. To determine whether a DWebBrowserEvents2::DocumentComplete event is for the top-level frame, compare the IDispatch interface pointer passed by the event with that of the WebBrowser control.
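The comparison relies on a COM identity rule: querying any interface of an object for IUnknown always yields the same pointer, whereas two different interface pointers to the same object may differ in value. The portable sketch below imitates that rule without using COM at all. The types `IUnknownLike`, `IDispatchLike`, `IBrowserLike`, and `Browser`, and the helper `SameObject`, are hypothetical stand-ins invented for illustration, not Windows APIs.

```cpp
#include <cassert>

// Hypothetical stand-ins for COM interfaces; not part of any Windows header.
struct IUnknownLike {
    virtual ~IUnknownLike() = default;
    // Plays the role of QueryInterface(IID_IUnknown, ...): returns the one
    // canonical identity pointer of the whole object.
    virtual const void* Identity() const = 0;
};
struct IDispatchLike : IUnknownLike { virtual int Invoke() = 0; };
struct IBrowserLike : IUnknownLike { virtual int Navigate() = 0; };

// One object that exposes two interfaces, as the WebBrowser control does.
struct Browser : IDispatchLike, IBrowserLike {
    const void* Identity() const override { return this; }
    int Invoke() override { return 1; }
    int Navigate() override { return 2; }
};

// Mirrors the pUnkBrowser == pUnkDisp test in a DocumentComplete handler.
bool SameObject(const IUnknownLike* a, const IUnknownLike* b) {
    assert(a != nullptr && b != nullptr);
    return a->Identity() == b->Identity();
}
```

With a `Browser top`, an `IDispatchLike*` and an `IBrowserLike*` that both point at `top` compare as the same object through `SameObject`, even though the two interface pointers themselves need not hold the same address. This is why the handler compares IUnknown pointers rather than the raw IDispatch pointer.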

The following sample handler for the WebBrowser DWebBrowserEvents2::DocumentComplete event demonstrates how to determine whether the event is for the top-level frame; if it is, the HTML page has finished loading. The example also demonstrates how to create a stream from a block of memory, in this case a string containing the HTML content to be displayed.

void myObject::DocumentComplete(LPDISPATCH pDisp, VARIANT* URL)
{
    HRESULT hr;
    IUnknown* pUnkBrowser = NULL;
    IUnknown* pUnkDisp = NULL;
    IStream* pStream = NULL;
    HGLOBAL hHTMLText;
    static TCHAR szHTMLText[] = _T("<html><h1>Stream Test</h1><p>This HTML content is being loaded from a stream.</html>");

    // Is this DocumentComplete event for the top-level frame window?
    // Check COM identity: compare the IUnknown interface pointers.
    hr = m_pBrowser->QueryInterface(IID_IUnknown, (void**)&pUnkBrowser);
    if (SUCCEEDED(hr))
    {
        hr = pDisp->QueryInterface(IID_IUnknown, (void**)&pUnkDisp);
        if (SUCCEEDED(hr))
        {
            if (pUnkBrowser == pUnkDisp)
            {   // This is the DocumentComplete event for the top-level frame
                // window: the page has finished loading.
                // Create a stream containing the HTML content.
                // Alternatively, the stream may be passed in rather than created here.

                // Length of szHTMLText in TCHARs, excluding the terminating null.
                size_t cchLength = _tcslen(szHTMLText);
                // GlobalAlloc takes a size in bytes, not in characters.
                hHTMLText = GlobalAlloc(GPTR, (cchLength + 1) * sizeof(TCHAR));

                if (hHTMLText)
                {
                    StringCchCopy((TCHAR*)hHTMLText, cchLength + 1, szHTMLText);
                    // TODO: add error handling code here.
                    hr = CreateStreamOnHGlobal(hHTMLText, TRUE, &pStream);
                    if (SUCCEEDED(hr))
                    {
                        // Call the helper function to load the browser from the stream.
                        LoadWebBrowserFromStream(m_pBrowser, pStream);
                        // fDeleteOnRelease is TRUE, so the final Release frees hHTMLText.
                        pStream->Release();
                    }
                    else
                    {
                        // No stream took ownership of the memory; free it here.
                        GlobalFree(hHTMLText);
                    }
                }
            }
            pUnkDisp->Release();
        }
        pUnkBrowser->Release();
    }
}
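When sizing the buffer for the HTML text, note that GlobalAlloc takes a size in bytes while the StringCch* functions count characters, so the character count has to be converted and must include the terminating null. Here is a minimal portable sketch of that arithmetic, assuming wide-character text. `BufferBytes` is a hypothetical helper name, not a Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: number of bytes needed to hold a copy of a
// null-terminated wide string, including its terminator.
std::size_t BufferBytes(const wchar_t* text) {
    assert(text != nullptr);
    const std::size_t cch = std::wcslen(text);   // length in characters
    return (cch + 1) * sizeof(wchar_t);          // +1 for the terminator
}
```

In a TCHAR-based build the same calculation would use the matching length function and `sizeof(TCHAR)`.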
Use QueryInterface to obtain the IPersist* interfaces
The IWebBrowser2::get_Document property of the WebBrowser control returns the document object that represents the DHTML object model of the top-level frame. MSHTML supports loading and saving HTML through streams by way of the IPersistStreamInit and IPersistFile interfaces, which are implemented by the document object and by other HTML element objects such as frame and iframe. To obtain one of these interface pointers, call QueryInterface on the object's IDispatch interface with the matching interface identifier, such as IID_IPersistStreamInit, as shown in the following code example.

HRESULT LoadWebBrowserFromStream(IWebBrowser2* pWebBrowser, IStream* pStream)
{
    HRESULT hr;
    IDispatch* pHtmlDoc = NULL;
    IPersistStreamInit* pPersistStreamInit = NULL;

    // Retrieve the document object.
    hr = pWebBrowser->get_Document(&pHtmlDoc);
    if (SUCCEEDED(hr) && pHtmlDoc != NULL)
    {
        // Query for the IPersistStreamInit interface.
        hr = pHtmlDoc->QueryInterface(IID_IPersistStreamInit, (void**)&pPersistStreamInit);
        if (SUCCEEDED(hr))
        {
            // Initialize the document.
            hr = pPersistStreamInit->InitNew();
            if (SUCCEEDED(hr))
            {
                // Load the contents of the stream.
                hr = pPersistStreamInit->Load(pStream);
            }
            pPersistStreamInit->Release();
        }
        pHtmlDoc->Release();
    }
    return hr;
}
Use the IPersist* interfaces to load and save HTML content
The IPersistStreamInit interface provides the InitNew and Load methods for initializing a document and loading HTML content from a stream, and the Save method for saving it. InitNew initializes the document to a known state, Load loads HTML content from the stream, and Save saves HTML content to the stream.

The IPersistFile interface provides Load and Save methods for loading HTML content from, and saving it to, a disk file.

In the previous sample code, the HTML document is initialized and the HTML content is then loaded from the stream.

Note: Starting with Microsoft Internet Explorer 5, the Load method of the IPersist* interfaces can be called more than once. In earlier versions, each MSHTML instance supports only one Load call.
Load and save HTML element data
If an HTML element supports IPersistStorage, IPersistStreamInit, or IPersistMemory, you can load and save its information with similar code.

To load and save the information of an ActiveX control on a web page, see my article "Access the properties of ActiveX control in the document through the HTML Document Object Model" (CSDN document center) for how to obtain the control's interface, and then query whether the ActiveX control supports the IPersist* interfaces.

Note: Controls written in VB 6.0 may not support these interfaces. In that case, use IPersistPropertyBag or a property set to load and save information. See Microsoft Knowledge Base article Q272490, BUG: Visual Basic component error 0x800A02E0 "unable to save uninitialized classes".

Known issues
Microsoft Knowledge Base articles

In Microsoft Internet Explorer (Programming) 5.5, frame objects do not support the IPersistStream, IPersistFile, and IPersistMemory interfaces.

Q323569 BUG: IPersistStreamInit::Load() displays HTML content as text

Q264868 BUG: Internet Explorer does not detect changes in content type from text/html to text/xml

Reference
The following books provide information about the Component Object Model (COM).

Inside OLE, 2nd Edition, by Kraig Brockschmidt (Microsoft Press)
Understanding ActiveX and OLE, by David Chappell (Microsoft Press)
Inside COM, by Dale Rogerson (Microsoft Press)
Microsoft Knowledge Base articles

Q223337 INFO: Use the Internet Explorer XML parser to load/save XML data
Q196340 HOWTO: Obtain the WebBrowser object model of an HTML frame
Related Topics

Microsoft Visual Studio
The Component Object Model Specification

====================================
Store HTML form information in an INI file

BOOL SaveFormData(CHtmlView* pView, CString DataFileName, CString SectionName)
{
    IHTMLDocument2* pDoc = NULL;
    IHTMLElementCollection* pAllElem = NULL;
    IDispatch* pElem = NULL;   // item() hands back an IDispatch pointer
    IHTMLTextAreaElement* pTextArea = NULL;
    IHTMLSelectElement* pSelect = NULL;
    IHTMLInputElement* pInput = NULL;
    BOOL Result = TRUE;

    // GetHtmlDocument returns an IDispatch; query it for IHTMLDocument2.
    LPDISPATCH pDispDoc = pView->GetHtmlDocument();
    if (pDispDoc != NULL)
    {
        pDispDoc->QueryInterface(IID_IHTMLDocument2, (void**)&pDoc);
        pDispDoc->Release();
    }
    if (pDoc != NULL)
    {
        pDoc->get_all(&pAllElem);
        if (pAllElem != NULL)
        {
            long EleCount = 0;
            pAllElem->get_length(&EleCount);
            VARIANT vEleName;
            BSTR bValue;
            // Visit every element of the document by index.
            for (long i = 0; i < EleCount; i++)
            {
                vEleName.vt = VT_I4;
                vEleName.lVal = i;
                pElem = NULL;
                if (pAllElem->item(vEleName, vEleName, &pElem) == S_OK)
                {
                    if (pElem != NULL)
                    {
                        // <input> elements: text, hidden, and checkbox.
                        pElem->QueryInterface(&pInput);
                        if (pInput != NULL)
                        {
                            BSTR bType;
                            pInput->get_type(&bType);
                            CString Type(bType);
                            SysFreeString(bType);
                            if (Type == _T("text") || Type == _T("hidden"))
                            {
                                pInput->get_name(&bValue);
                                CString name(bValue);
                                SysFreeString(bValue);
                                pInput->get_value(&bValue);
                                CString value(bValue);
                                SysFreeString(bValue);
                                WritePrivateProfileString(SectionName, name, value, DataFileName);
                            } // if (Type == "text" ...
                            else if (Type == _T("checkbox"))
                            {
                                pInput->get_name(&bValue);
                                CString name(bValue);
                                SysFreeString(bValue);
                                VARIANT_BOOL Check;
                                pInput->get_checked(&Check);
                                if (Check)
                                {
                                    pInput->get_value(&bValue);
                                    CString value(bValue);
                                    SysFreeString(bValue);
                                    WritePrivateProfileString(SectionName, name, value, DataFileName);
                                }
                                else
                                    // An unchecked box is stored as "0".
                                    WritePrivateProfileString(SectionName, name, _T("0"), DataFileName);
                            }
                            pInput->Release();
                        } // if (pInput != NULL)
                        // <textarea> elements.
                        pElem->QueryInterface(&pTextArea);
                        if (pTextArea != NULL)
                        {
                            pTextArea->get_name(&bValue);
                            CString name(bValue);
                            SysFreeString(bValue);
                            pTextArea->get_value(&bValue);
                            CString value(bValue);
                            SysFreeString(bValue);
                            WritePrivateProfileString(SectionName, name, value, DataFileName);
                            pTextArea->Release();
                        } // if (pTextArea != NULL)
                        // <select> elements.
                        pElem->QueryInterface(&pSelect);
                        if (pSelect != NULL)
                        {
                            pSelect->get_name(&bValue);
                            CString name(bValue);
                            SysFreeString(bValue);
                            pSelect->get_value(&bValue);
                            CString value(bValue);
                            SysFreeString(bValue);
                            WritePrivateProfileString(SectionName, name, value, DataFileName);
                            pSelect->Release();
                        } // if (pSelect != NULL)
                        pElem->Release();
                    } // if (pElem != NULL)
                } // pAllElem->item(
            } // for
            pAllElem->Release();
        } // if (pAllElem != NULL)
        pDoc->Release();
    } // if (pDoc != NULL)
    return Result;
}
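WritePrivateProfileString stores each value as a `name=value` line under a `[section]` header. The portable sketch below renders the same layout in memory to show roughly what the saved file might look like. `FormField` and `RenderIniSection` are hypothetical names invented for illustration, and writing `0` for an unchecked checkbox follows the convention of SaveFormData above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory form field; not an MSHTML type.
struct FormField {
    std::string name;
    std::string value;
    bool isCheckbox;
    bool checked;
};

// Renders the fields as one INI section: a "[section]" header followed
// by one "name=value" line per field.
std::string RenderIniSection(const std::string& section,
                             const std::vector<FormField>& fields) {
    assert(!section.empty());
    std::string out = "[" + section + "]\n";
    for (const FormField& f : fields) {
        // An unchecked checkbox is stored as "0", as in SaveFormData.
        const std::string v = (f.isCheckbox && !f.checked) ? "0" : f.value;
        out += f.name + "=" + v + "\n";
    }
    return out;
}
```

For a section named `Login` with a text field `user` holding `alice` and an unchecked checkbox `remember`, the sketch yields a `[Login]` header followed by `user=alice` and `remember=0`. The real API writes to the file directly and uses the platform's own line endings.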
 

======================================
Use MSHTML to parse HTML code

Author: Asher Kobin

Environment: Windows 2000/Windows ME/IE 5.0 +

I have a lot of experience using MSHTML in programs, and a question that comes up often is how to use MSHTML to parse HTML code and access it through the DOM.

Here is an example. It uses the IMarkupServices interface provided by MSHTML. No IOleClientSite or any other embedding is required, so it is about as simple as it can be to get what you need.

In future articles, I will focus on other ways of using MSHTML in programs, such as using MSHTML as an editor.

This code uses only plain COM calls, so it can easily be adapted to ATL, MFC, VB, and other languages. Please do not ask me for examples in other languages. To build it, you need the IE SDK.

/******************************************************************
* ParseHTML.cpp
*
* ParseHTML: Lightweight UI-less HTML parser using MSHTML
*
* Note: This is for accessing the DOM only. No image download,
*       script execution, etc...
*
* 8 June 2001 - Asher Kobin (asherk@pobox.com)
*
* THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
* OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
* LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
* FITNESS FOR A PARTICULAR PURPOSE.
*
*******************************************************************/

#include <windows.h>
#include <mshtml.h>

OLECHAR szHTML[] = OLESTR("<HTML><BODY>Hello World!</BODY></HTML>");

int __stdcall WinMain(HINSTANCE hInst,
                      HINSTANCE hPrev,
                      LPSTR lpCmdLine,
                      int nShowCmd)
{
    IHTMLDocument2* pDoc = NULL;

    CoInitialize(NULL);

    // Create an MSHTML document object without any hosting UI.
    CoCreateInstance(CLSID_HTMLDocument,
                     NULL,
                     CLSCTX_INPROC_SERVER,
                     IID_IHTMLDocument2,
                     (LPVOID*)&pDoc);

    if (pDoc)
    {
        IPersistStreamInit* pPersist = NULL;

        pDoc->QueryInterface(IID_IPersistStreamInit,
                             (LPVOID*)&pPersist);

        if (pPersist)
        {
            IMarkupServices* pMS = NULL;

            // Put the document into a known, initialized state.
            pPersist->InitNew();
            pPersist->Release();

            pDoc->QueryInterface(IID_IMarkupServices,
                                 (LPVOID*)&pMS);

            if (pMS)
            {
                IMarkupContainer* pMC = NULL;
                IMarkupPointer* pMkStart = NULL;
                IMarkupPointer* pMkFinish = NULL;

                pMS->CreateMarkupPointer(&pMkStart);
                pMS->CreateMarkupPointer(&pMkFinish);

                // Parse the string into a new markup container.
                pMS->ParseString(szHTML,
                                 0,
                                 &pMC,
                                 pMkStart,
                                 pMkFinish);

                if (pMC)
                {
                    IHTMLDocument2* pNewDoc = NULL;

                    // The requested IID must match the pointer type.
                    pMC->QueryInterface(IID_IHTMLDocument2,
                                        (LPVOID*)&pNewDoc);

                    if (pNewDoc)
                    {
                        // Do anything with pNewDoc, in this case
                        // get the body innerText.

                        IHTMLElement* pBody = NULL;
                        pNewDoc->get_body(&pBody);

                        if (pBody)
                        {
                            BSTR strText;

                            pBody->get_innerText(&strText);
                            pBody->Release();

                            SysFreeString(strText);
                        }

                        pNewDoc->Release();
                    }

                    pMC->Release();
                }

                if (pMkStart)
                    pMkStart->Release();

                if (pMkFinish)
                    pMkFinish->Release();

                pMS->Release();
            }
        }

        pDoc->Release();
    }

    CoUninitialize();

    return TRUE;
}
