HTML Code Filtering Technology

Source: Internet
Author: User

 

Reference: MSDN, Pluggable Protocols Overview

Reference example (provided by MSDN):

http://support.microsoft.com/default.aspx?scid=kb;en-us;q260840#appliesto

Another example is written in Delphi: http://www.guicode.com/scr/mimefilter.zip

To implement HTML code filtering, you must register one or more MIME filters (pluggable MIME filters).

A MIME filter is a COM object that must implement the IInternetProtocolSink and IInternetProtocol interfaces. A MIME filter can be registered temporarily or permanently. If several temporary MIME filters are registered at the same time, the object registered last is called first.

How do you register a MIME filter? To register a permanent MIME filter, add a subkey under the HKEY_CLASSES_ROOT\PROTOCOLS\Filter key in the registry. The subkey name is the MIME type being registered.

The subkey must have a string value named CLSID whose data is the CLSID of the COM object you provide. The default value of the key can be a short description of your object. If you use ATL, you can add the following to the object's RGS file:

    HKCR
    {
        NoRemove PROTOCOLS
        {
            NoRemove Filter
            {
                ForceRemove 'text/html' = s 'XMLMimeFilter MIME Filter Sample'
                {
                    val CLSID = s '{53b95211-7d77-11d2-9f80-00366b366c96}'
                }
            }
        }
    }

The code above comes from the example mentioned at the beginning of the article. Both the description string 'XMLMimeFilter MIME Filter Sample' and the CLSID value must be replaced with your own.
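The registry layout described above can also be written out as a .reg file. The following is a minimal sketch, not part of the MSDN sample: makeFilterRegFile is a hypothetical helper that only builds the file text, it assumes the REGEDIT4 text format, and it does not escape quotes or backslashes inside its arguments.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the MSDN sample): builds the text of a .reg
// file that creates HKEY_CLASSES_ROOT\PROTOCOLS\Filter\<mimeType> with a
// description as the default value and a string value named CLSID.
// Assumption: the arguments contain no quotes or backslashes.
std::string makeFilterRegFile(const std::string& mimeType,
                              const std::string& clsid,
                              const std::string& description)
{
    assert(!mimeType.empty() && !clsid.empty());
    std::string out = "REGEDIT4\n\n";
    out += "[HKEY_CLASSES_ROOT\\PROTOCOLS\\Filter\\" + mimeType + "]\n";
    out += "@=\"" + description + "\"\n";      // default value: description
    out += "\"CLSID\"=\"" + clsid + "\"\n";    // string value named CLSID
    return out;
}
```

Calling makeFilterRegFile("text/html", "{your-clsid}", "My filter") returns text that could be saved to a file and imported; the CLSID shown here is a placeholder.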

To register a temporary filter, use the IInternetSession interface.

The following code registers a temporary filter:

    CComPtr<IInternetSession> m_spSession;
    CComPtr<IClassFactory> m_spClassFactoryMime;

    // Get the class factory of the filter object.
    HRESULT hr = ::CoGetClassObject(CLSID_MimeFilter, CLSCTX_SERVER,
                                    NULL, IID_IClassFactory,
                                    (void**)&m_spClassFactoryMime);
    if (hr == S_OK)
    {
        // Get the current Internet session and register the filter for text/html.
        if (::CoInternetGetSession(0, &m_spSession, 0) == S_OK)
        {
            m_spSession->RegisterMimeFilter(m_spClassFactoryMime,
                                            CLSID_MimeFilter, L"text/html");
        }
    }

CLSID_MimeFilter here is the CLSID of your object.

There are many MIME types. For details, see the MSDN appendix "MIME Type Detection in Internet Explorer 4.0", although the actual types far outnumber those listed there.

To see the MIME types registered on your computer, look at the HKEY_CLASSES_ROOT\MIME\Database\Content Type key in the registry. You can also call the FindMimeFromData function to obtain the MIME type of an object. The following code obtains the MIME type of a JS file:

    LPWSTR pwzMimeOut = NULL;
    FindMimeFromData(NULL, L"time.js", NULL, 0, NULL, 0, &pwzMimeOut, 0);

The resulting MIME type is application/x-javascript.

To filter HTML pages, register the text/html type. You can also call RegisterMimeFilter several times to register filters for multiple MIME types.

After registering a temporary or permanent MIME filter, the next step is to implement the MIME filter object.

Before implementing it, let's look at the sequence of calls between the MIME filter and the web transaction handler (urlmon.dll, called the Web processor below). Note that urlmon.dll also implements the IInternetProtocol and IInternetProtocolSink interfaces:

1. The Web processor calls the IInternetProtocolRoot::Start method of the MIME filter (IInternetProtocol derives from IInternetProtocolRoot);

2. The Web processor successively calls the IInternetProtocolSink::ReportProgress and IInternetProtocolSink::ReportData methods of the MIME filter;

3. The MIME filter calls the IInternetProtocol::Read method of the Web processor;

4. The MIME filter calls the IInternetProtocolSink::ReportData method of the Web processor;

5. The Web processor calls the IInternetProtocol::Read method of the MIME filter.
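The five steps above can be sketched as a small simulation. This is only a model of who calls whom: MimeFilter and WebProcessor below are simplified stand-ins invented for illustration, not the real COM interfaces, and each method merely records its step in a log.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> CallLog;

struct WebProcessor;  // plays the role of urlmon.dll

// Stand-in for the MIME filter object (not the real COM interfaces).
struct MimeFilter {
    CallLog* log = nullptr;
    WebProcessor* web = nullptr;
    void Start(WebProcessor* w, CallLog* l);
    void ReportProgress();
    void ReportData();
    void Read();
};

struct WebProcessor {
    CallLog* log = nullptr;
    MimeFilter* filter = nullptr;
    void Read() { log->push_back("3: filter calls Web processor Read"); }
    void ReportData() {
        log->push_back("4: filter calls Web processor ReportData");
        filter->Read();  // the notification makes the Web processor pull data
    }
};

void MimeFilter::Start(WebProcessor* w, CallLog* l) {
    assert(w != nullptr && l != nullptr);
    web = w;
    log = l;
    log->push_back("1: Web processor calls filter Start");
}
void MimeFilter::ReportProgress() {
    log->push_back("2: Web processor calls filter ReportProgress");
}
void MimeFilter::ReportData() {
    log->push_back("2: Web processor calls filter ReportData");
    web->Read();        // step 3: fetch the raw data
    web->ReportData();  // step 4: announce the filtered data
}
void MimeFilter::Read() { log->push_back("5: Web processor calls filter Read"); }

// Drives the calls that the Web processor would make and returns the log.
CallLog runSequence() {
    CallLog log;
    WebProcessor web;
    MimeFilter filter;
    web.log = &log;
    web.filter = &filter;
    filter.Start(&web, &log);
    filter.ReportProgress();
    filter.ReportData();
    return log;
}
```

runSequence() returns the log entries in call order, which makes it easy to see that the filter's own Read is only reached after the filter has reported data back to the Web processor.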

Therefore, implementing a MIME filter comes down to a few important methods:

1. The IInternetProtocolRoot::Start method:

    HRESULT Start(
        [in] LPCWSTR szUrl,
        [in] IInternetProtocolSink *pOIProtSink,
        [in] IInternetBindInfo *pOIBindInfo,
        [in] DWORD grfPI,
        [in] DWORD dwReserved
    );

For a MIME filter object, szUrl carries the MIME type (for a namespace handler object, this parameter is the URL to be downloaded or parsed). If you need the URL, you can obtain it through pOIBindInfo. The following is an example:

    LPOLESTR pwzUrl;
    ULONG uElFetched;
    pOIBindInfo->GetBindString(BINDSTRING_URL, &pwzUrl, 1, &uElFetched);

pOIProtSink is the IInternetProtocolSink interface provided by urlmon.dll. It must be saved because it is called during later processing.

grfPI is a set of flags and must contain the PI_FILTER_MODE flag, which indicates that the object runs in filter mode.

dwReserved is a pointer to a PROTOCOLFILTERDATA structure. The pProtocol member of this structure is the IInternetProtocol interface provided by urlmon.dll; it should also be saved because it is called during later processing. This interface can also be obtained by calling QueryInterface on the pOIProtSink parameter. Likewise, pProtocolSink in the PROTOCOLFILTERDATA structure and pOIProtSink point to the same interface.

In the Start method, all we need to do is save the IInternetProtocolSink and IInternetProtocol interfaces provided by urlmon.dll.
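The bookkeeping that Start performs can be sketched without the Windows SDK. The types below (ProtocolSink, Protocol, FilterData) are stand-ins invented for illustration, and kFilterMode is assumed to mirror PI_FILTER_MODE. A real filter would use the urlmon.h types and hold COM references (for example through CComPtr) instead of raw pointers.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types so the sketch compiles anywhere; they are NOT the real
// urlmon.h declarations.
struct ProtocolSink {};  // stands in for IInternetProtocolSink
struct Protocol {};      // stands in for IInternetProtocol
struct FilterData {      // stands in for PROTOCOLFILTERDATA
    Protocol* pProtocol;
    ProtocolSink* pProtocolSink;
};

const std::uint32_t kFilterMode = 0x2;  // assumed to mirror PI_FILTER_MODE

class FilterSketch {
public:
    // Returns true when the call describes filter mode and the two
    // interfaces were saved; false otherwise.
    bool Start(ProtocolSink* sink, std::uint32_t grfPI, const FilterData* data) {
        assert(sink != nullptr && data != nullptr);
        if ((grfPI & kFilterMode) == 0)
            return false;              // not invoked as a MIME filter
        m_sink = sink;                 // needed later to report data
        m_incoming = data->pProtocol;  // needed later to read the raw data
        return true;
    }
    ProtocolSink* sink() const { return m_sink; }
    Protocol* incoming() const { return m_incoming; }

private:
    ProtocolSink* m_sink = nullptr;
    Protocol* m_incoming = nullptr;
};
```

The sketch rejects calls without the filter-mode flag and otherwise only stores the two pointers, which matches the description above that Start has nothing else to do.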

2. The IInternetProtocolSink::ReportProgress method:

    HRESULT ReportProgress(
        [in] ULONG ulStatusCode,
        [in] LPCWSTR szStatusText
    );

For a MIME filter, ulStatusCode is usually BINDSTATUS_CACHEFILENAMEAVAILABLE. In that case szStatusText is the name of the temporary cache file, but some web pages are not written to the cache, so szStatusText may be an empty string.

3. The IInternetProtocolSink::ReportData method:

    HRESULT ReportData(
        [in] DWORD grfBSCF,
        [in] ULONG ulProgress,
        [in] ULONG ulProgressMax
    );

IE calls the MIME filter's ReportData method during the download or after it completes. ulProgressMax indicates the total size of the file and ulProgress the amount downloaded so far. In theory, ulProgress equals ulProgressMax once the whole file has been downloaded (in practice, when the page is small, the whole file may already be downloaded even though ulProgress does not equal ulProgressMax). Another parameter also reflects the state of the download: grfBSCF. The Web processor may call ReportData several times.
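Since the progress counters are not fully reliable, one option is to look at the grfBSCF flags instead. The following is a minimal sketch under that assumption: the constants are local copies of the BSCF values from urlmon.h so the code is portable, and isFinalNotification is a hypothetical helper, not something the original sample defines.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the BSCF flag values declared in urlmon.h.
const std::uint32_t kBscfFirst = 0x1;           // BSCF_FIRSTDATANOTIFICATION
const std::uint32_t kBscfIntermediate = 0x2;    // BSCF_INTERMEDIATEDATANOTIFICATION
const std::uint32_t kBscfLast = 0x4;            // BSCF_LASTDATANOTIFICATION
const std::uint32_t kBscfFullyAvailable = 0x8;  // BSCF_DATAFULLYAVAILABLE

// Hypothetical helper: true when the flags say no more data will follow.
bool isFinalNotification(std::uint32_t grfBSCF) {
    return (grfBSCF & (kBscfLast | kBscfFullyAvailable)) != 0;
}

// Usage check: a first notification is not final, a combined one is.
void checkFinalNotification() {
    assert(!isFinalNotification(kBscfFirst));
    assert(isFinalNotification(kBscfIntermediate | kBscfLast));
}
```

A filter that buffers the whole page could wait for such a final notification before it starts processing the cached content.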

ReportData is a suitable place to filter or modify the page content. Here you can call Read to save the page content into your own buffer or stream and process it as needed (paying attention to the character encoding).

Finally, do not forget to call the Web processor's IInternetProtocolSink::ReportData method to report the download state to it. After receiving this notification, the Web processor calls the MIME filter's IInternetProtocol::Read, at which point you can hand over the modified data.

The following code example shows how to call the Web processor's Read inside ReportData to save the data in advance:

    // m_spIncomingProt is the Web processor's IInternetProtocol interface saved in Start.
    // m_spStm is the IStream pointer used to cache the data.

    BYTE buffer[SIZE_BUFFER];
    DWORD cbRead;
    do
    {
        cbRead = 0;
        hr = m_spIncomingProt->Read(buffer, SIZE_BUFFER, &cbRead);
        if (cbRead > 0)   // location A
        {
            if (m_spStm->Write(buffer, cbRead, NULL) == S_OK)
            {
                m_cbTotal += cbRead;
            }
        }
    } while (hr == S_OK);

Generally, Read returns only S_OK or S_FALSE when data is obtained successfully. S_OK means more data is available, whereas S_FALSE means all the data has been read, so the loop condition is hr == S_OK. Why, then, is the check in the loop body if (cbRead > 0) rather than if (hr == S_OK || hr == S_FALSE)? Because in some cases Read may return a value other than these two even though some data was still read successfully; the size of that data is given by cbRead. If that part of the data is dropped, the web page cannot be parsed correctly. Could this cause useless data from a failed Read to be saved to the cache? So far, at least, it has not.
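The effect of testing the byte count instead of the result code can be reproduced with a small portable model. FakeSource and the ReadResult values are invented for illustration (standing in for the upstream IInternetProtocol and for S_OK, S_FALSE, and any other HRESULT); the source deliberately delivers its last chunk together with an unexpected result code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Stand-ins for S_OK, S_FALSE and "any other HRESULT".
enum ReadResult { kOk, kDone, kOther };

// Hypothetical upstream source: hands out data in chunks and returns an
// unexpected code together with the final chunk.
class FakeSource {
public:
    explicit FakeSource(std::string data) : m_data(std::move(data)) {}
    ReadResult Read(char* buf, std::size_t cb, std::size_t* pcbRead) {
        assert(pcbRead != nullptr);
        std::size_t n = std::min(cb, m_data.size() - m_pos);
        std::memcpy(buf, m_data.data() + m_pos, n);
        m_pos += n;
        *pcbRead = n;
        return (m_pos < m_data.size()) ? kOk : kOther;
    }
private:
    std::string m_data;
    std::size_t m_pos = 0;
};

// Same loop shape as the ReportData code: keep any bytes that arrived,
// and continue only while the source reports that more data is available.
std::string drain(FakeSource& src) {
    std::string cache;
    char buffer[4];
    ReadResult r;
    do {
        std::size_t cbRead = 0;
        r = src.Read(buffer, sizeof buffer, &cbRead);
        if (cbRead > 0)  // keep the bytes whatever the result code is
            cache.append(buffer, cbRead);
    } while (r == kOk);
    return cache;
}
```

Had the loop tested the result code instead of cbRead, the final chunk of the page would have been dropped in this model.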

4. The IInternetProtocol::Read method

The Web processor calls this method to obtain the data that the browser will parse. In the ReportData method above we already cached all the data in a stream, so here we only need to return the data in the stream to the Web processor.

The following code shows a simple implementation of Read:

    if (m_spStm->Read(pv, cb, pcbRead) == S_OK)
    {
        if (*pcbRead == cb)
        {
            return S_OK;
        }
        else
        {
            return S_FALSE;
        }
    }

Note that S_FALSE must be returned once all the data has been read; otherwise Read may be called in an endless loop.
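The termination rule can be illustrated with a portable model of the filter's output side. CachedOutput and ReadResult are stand-ins invented for illustration (for the cached stream and for S_OK and S_FALSE), and countReads plays the role of the caller that keeps reading while it receives the "more data" result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

enum ReadResult { kOk, kDone };  // stand-ins for S_OK and S_FALSE

// Hypothetical model of the filter's Read: serves the cached data and
// signals the end as soon as a request cannot be filled completely.
class CachedOutput {
public:
    explicit CachedOutput(std::string data) : m_data(std::move(data)) {}
    ReadResult Read(char* pv, std::size_t cb, std::size_t* pcbRead) {
        assert(pcbRead != nullptr);
        std::size_t n = std::min(cb, m_data.size() - m_pos);
        std::memcpy(pv, m_data.data() + m_pos, n);
        m_pos += n;
        *pcbRead = n;
        return (n == cb) ? kOk : kDone;  // a short read means the cache is empty
    }
private:
    std::string m_data;
    std::size_t m_pos = 0;
};

// Caller's side: reads until the "done" result and counts the calls made.
std::size_t countReads(CachedOutput& out, std::string* received) {
    char buf[4];
    std::size_t calls = 0;
    ReadResult r;
    do {
        std::size_t got = 0;
        r = out.Read(buf, sizeof buf, &got);
        received->append(buf, got);
        ++calls;
    } while (r == kOk);
    return calls;
}
```

If Read kept returning the "more data" result after the cache was exhausted, the loop in countReads would never end, which is the cyclic calling the note above warns about.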

Once these methods are handled, the job is basically done. The other methods are very simple to handle; you can refer to the example mentioned above.

 
