Parse HTML Safely

Source: Internet
Author: User


jQuery.parseHTML

Given a piece of HTML code, how do you convert it to a DOM tree for processing?

If you use jQuery, its $.parseHTML() method converts an HTML string into an array of DOM nodes.

    var markup = '<p>' +
            '  ' +
            '<script type="text/javascript">' +
                'document.onclick = function () { console.log("click"); };' +
                'window.alert("Hello");' +
                'document.write("Hello");' +
            '</script>' +
        '</p>',
        domArray = $.parseHTML(markup);
    window.console.log(domArray);


Judging from the jQuery source code, the way it parses HTML is essentially equivalent to the following code:

    /**
     * @param {String} markup
     * @param {Document} [context]
     * @return {Array}
     */
    var parseHTMLWithDiv = function (markup, context) {
        context = context || document;
        var wrapper = context.createElement('div'),
            domArray = [],
            index,
            len;
        wrapper.innerHTML = markup;
        len = wrapper.childNodes.length;
        // Copy the live NodeList into a plain array
        for (index = 0; index < len; index++) {
            domArray.push(wrapper.childNodes[index]);
        }
        return domArray;
    };


You can also parse the HTML by placing it in a hidden iframe.

    /**
     * @param {String} markup
     * @param {Document} [context]
     * @return {Array}
     */
    var parseHTMLWithIframe = function (markup, context) {
        context = context || document;
        var iframe = context.createElement('iframe'),
            body,
            index,
            len,
            domArray = [];
        iframe.src = '';
        iframe.style.display = 'none';
        context.body.appendChild(iframe);
        body = iframe.contentDocument.body;
        body.innerHTML = markup;
        len = body.childNodes.length;
        for (index = 0; index < len; index++) {
            domArray.push(body.childNodes[index]);
        }
        context.body.removeChild(iframe);
        return domArray;
    };


If you look closely at the parsing process, you will notice the following:

    1. Scripts in the HTML code are not executed
      • This keeps parsing safe, although inline event handlers (for example an onerror attribute on an <img>) can still run when a div is used
    2. The browser automatically requests each image's src, so images get preloaded
      • When you use a div, the images are preloaded
      • When you use an iframe, the image requests are issued but then canceled
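
Both observations can be checked on an actual result. The sketch below walks the array returned by parseHTMLWithDiv or parseHTMLWithIframe and collects the <script> elements, which are present in the tree but were never run, and the <img> src values, which the browser will already have requested when a div is used. The helper name collectScriptsAndImages is hypothetical (it is not part of jQuery or the DOM); it relies only on the standard nodeType, nodeName and childNodes properties and Element.getAttribute.

```javascript
/**
 * Hypothetical helper: report the <script> elements and <img> src values
 * found in the nodes produced by parseHTMLWithDiv / parseHTMLWithIframe.
 *
 * @param {Node[]} nodes  top-level nodes of the parsed snippet
 * @return {{scripts: Element[], imageSources: string[]}}
 */
function collectScriptsAndImages(nodes) {
  var result = { scripts: [], imageSources: [] };

  function visit(node) {
    // Only element nodes (nodeType 1) have tag names and attributes.
    if (node.nodeType !== 1) {
      return;
    }
    // nodeName is upper-case in HTML documents; normalize anyway because
    // XHTML documents keep the source case.
    var name = node.nodeName.toUpperCase();
    if (name === 'SCRIPT') {
      // In the tree, but inserted via innerHTML, so never executed.
      result.scripts.push(node);
    } else if (name === 'IMG' && node.getAttribute('src')) {
      // With the div wrapper, the browser has already requested this URL.
      result.imageSources.push(node.getAttribute('src'));
    }
    for (var i = 0; i < node.childNodes.length; i++) {
      visit(node.childNodes[i]);
    }
  }

  for (var i = 0; i < nodes.length; i++) {
    visit(nodes[i]);
  }
  return result;
}
```

In a browser console, collectScriptsAndImages(parseHTMLWithDiv(markup)).imageSources lists the URLs to look for in the network panel.
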

DOMParser

If the parsed HTML does not need to be inserted into the page, the image requests that the browser issues automatically during parsing waste resources such as network bandwidth, which is not ideal.

How can you keep the browser from automatically requesting image src URLs?

jQuery has a method similar to $.parseHTML() for parsing XML, called $.parseXML(). Here is its source:

    // Cross-browser xml parsing
    jQuery.parseXML = function (data) {
        var xml, tmp;
        if (!data || typeof data !== "string") {
            return null;
        }

        // Support: IE9
        try {
            tmp = new DOMParser();
            xml = tmp.parseFromString(data, "text/xml");
        } catch (e) {
            xml = undefined;
        }

        if (!xml || xml.getElementsByTagName("parsererror").length) {
            jQuery.error("Invalid XML: " + data);
        }
        return xml;
    };


jQuery uses DOMParser to parse the XML document. DOMParser can parse HTML documents as well as XML documents, as its interface definition shows:

    enum SupportedType {
      "text/html",
      "text/xml",
      "application/xml",
      "application/xhtml+xml",
      "image/svg+xml"
    };

    [Constructor]
    interface DOMParser {
      Document parseFromString(DOMString str, SupportedType type);
    };
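
Since "text/html" is one of the supported types, an HTML snippet can also be handed to DOMParser directly instead of going through a div or an iframe. A minimal sketch, assuming a browser whose DOMParser accepts "text/html"; the function name parseHTMLWithDOMParser and the optional win parameter are hypothetical, chosen so that the result has the same shape (an array of nodes) as parseHTMLWithDiv.

```javascript
/**
 * Hypothetical helper: parse an HTML snippet with DOMParser ("text/html")
 * and return the <body> child nodes as an array.
 *
 * @param {string} markup  HTML snippet to parse
 * @param {Window} [win]   window whose DOMParser to use; defaults to the
 *                         global window (assumes a browser environment)
 * @return {Node[]}
 */
function parseHTMLWithDOMParser(markup, win) {
  win = win || window;
  var parser = new win.DOMParser();
  // The resulting document is not rendered, so scripts do not run and
  // images are not requested.
  var doc = parser.parseFromString(markup, 'text/html');
  // childNodes is a live NodeList; copy it into a plain array.
  return Array.prototype.slice.call(doc.body.childNodes);
}
```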


When DOMParser parses an HTML document, the browser does not automatically request image src URLs. In browsers that do not support DOMParser, there is an alternative: DOMImplementation.createHTMLDocument.

    /**
     * There are two ways to parse an HTML snippet:
     * 1. Parse the HTML in a virtual Document / DOMParser object.
     * 2. Create a 'div' element as a wrapper and set the HTML as its innerHTML.
     *
     * The 1st prevents the images in the HTML from loading and is safer.
     *
     * Note: this function does not support IE8 and earlier.
     *
     * @param {String} markup The HTML string, set as the innerHTML of <body/>
     * @param {Document} [context]
     * @return {Document|null} If the returned value is null, fall back to the 2nd way.
     */
    function parseHTML(markup, context) {
        var doc, parser, win;
        context = context || document;
        if (context.implementation && context.implementation.createHTMLDocument) {
            doc = context.implementation.createHTMLDocument('');
            doc.body.innerHTML = markup;
            return doc;
        }
        win = context.defaultView || window;
        if (win.DOMParser) {
            parser = new win.DOMParser();
            try {
                doc = parser.parseFromString('', 'text/html');
            } catch (ex) {
                // Do nothing: this DOMParser does not accept "text/html"
            }
            if (doc) {
                doc.body.innerHTML = markup;
                return doc;
            }
        }
        return null;
    }
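
The @return note on parseHTML says to fall back to the div approach when no inert document can be created. A sketch that combines both steps into one function returning an array of top-level nodes; the name parseHTMLNodes is hypothetical, and unlike parseHTML it only looks for DOMParser on context.defaultView rather than falling back to the global window.

```javascript
/**
 * Hypothetical helper: return the top-level nodes of an HTML snippet,
 * preferring an inert document and falling back to a div wrapper.
 *
 * @param {string} markup
 * @param {Document} [context]  defaults to the global document (browser only)
 * @return {Node[]}
 */
function parseHTMLNodes(markup, context) {
  context = context || document;
  var impl = context.implementation,
      win = context.defaultView,
      doc = null,
      holder;

  if (impl && impl.createHTMLDocument) {
    // 1a. A detached document: no images are fetched, no scripts run.
    doc = impl.createHTMLDocument('');
  } else if (win && win.DOMParser) {
    // 1b. DOMParser with "text/html" also yields an inert document.
    try {
      doc = new win.DOMParser().parseFromString('', 'text/html');
    } catch (e) {
      doc = null; // older DOMParser implementations may reject "text/html"
    }
  }

  if (doc) {
    holder = doc.body;
  } else {
    // 2. Last resort: a div in the current document. Scripts still do not
    //    run, but images are requested.
    holder = context.createElement('div');
  }
  holder.innerHTML = markup;
  return Array.prototype.slice.call(holder.childNodes);
}
```

This keeps the trade-off described in this article: the inert documents avoid image requests, and the div is used only when neither is available, in which case images will still be fetched.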

