Parsing HTML and rendering using canvas

Source: Internet
Author: User

While learning HTML5, I used canvas to parse and render HTML text. The supported tags are <p>, <i>, <b>, <u>, <ul> and <li>, and malformed HTML is handled by following how Chrome parses it. The code is on my GitHub (https://github.com/myhonor2013/gadgets, in the Html-render-with-canvas directory).

The core of the program is a loop that parses the HTML text from left to right. Each iteration starts at a valid HTML '<', such as '<p', '</', '<f' or '</x', whereas '< p', '< /' and '< /x' are not valid. In short, a '<' must be followed immediately by a non-whitespace character; this rule follows what Chrome does when parsing such input. It also means that each iteration must end by locating the next such valid position. At the end of each iteration the render function is called to draw the collected text on the canvas. Throughout the loop, the index into the HTML string must be checked against its length; if it runs past the end, the pending text is rendered and the loop ends.
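The valid-'<' rule above can be written as a small helper. The following is a minimal sketch, not the author's code: the names `isValidLt`, `nextValidLt` and `splitAtValidLt` are hypothetical, and "non-whitespace" is tested with JavaScript's `\s` class. It splits a string into the runs the main loop would visit: the plain text before the first valid '<', then one chunk per valid '<'.

```javascript
// A '<' starts a tag only when the next character exists and is not whitespace.
function isValidLt(text, i) {
  return text[i] === "<" && i + 1 < text.length && !/\s/.test(text[i + 1]);
}

// Returns the index of the first valid '<' at or after `from`, or -1 if none.
function nextValidLt(text, from) {
  for (let i = text.indexOf("<", from); i !== -1; i = text.indexOf("<", i + 1)) {
    if (isValidLt(text, i)) return i;
  }
  return -1;
}

// Splits text into the runs visited by the main loop: leading plain text,
// then one chunk starting at each valid '<'.
function splitAtValidLt(text) {
  const chunks = [];
  let start = 0;
  let next = nextValidLt(text, 0);
  while (next !== -1) {
    if (next > start) chunks.push(text.slice(start, next));
    start = next;
    next = nextValidLt(text, next + 1);
  }
  if (start < text.length) chunks.push(text.slice(start));
  return chunks;
}
```

For example, `splitAtValidLt("a < b <p>x</p>")` keeps `"a < b "` together as plain text, because the first '<' is followed by a space.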

First, preprocessing

Preprocessing simply replaces each run of whitespace characters (carriage returns, newlines, tabs, spaces) in the HTML text with a single space and trims both ends:

    var text = data.text.replace(/[\r\n\t]/g, whitespace).replace(/\s+/g, whitespace).trim();

It then searches from the beginning of the text for the first valid tag position and renders the text before it. Every subsequent iteration starts at a valid '<', which falls into one of two cases: a valid opening tag or a valid closing tag.
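A minimal sketch of the preprocessing step together with the two-way classification, assuming the article's `whitespace` variable is a single space. `WHITESPACE`, `preprocess` and `classifyTag` are hypothetical names; the two regular expressions are the ones given in the following sections, and the tag name is read (after optional whitespace following '</') up to the first whitespace or '>'.

```javascript
// Assumed value of the article's `whitespace` variable.
const WHITESPACE = " ";

// Collapse CR/LF/tab and whitespace runs to one space, then trim.
function preprocess(raw) {
  return raw.replace(/[\r\n\t]/g, WHITESPACE).replace(/\s+/g, WHITESPACE).trim();
}

// Classifies a chunk that starts at a valid '<' as an opening or closing tag
// and extracts its lower-cased name; returns null if neither regex matches.
function classifyTag(chunk) {
  const isClose = /^<\/.*/.test(chunk);
  const isOpen = !isClose && /^<[^\/]+.*/.test(chunk);
  if (!isOpen && !isClose) return null;
  // Skip '<' and an optional '/', then any spaces, then read the name.
  const m = chunk.match(/^<\/?\s*([^\s>]*)/);
  return { kind: isClose ? "close" : "open", name: m[1].toLowerCase() };
}
```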

Second, processing valid opening tags

A valid opening tag is a '<' that is not followed by '/'; as a regular expression, ^<[^\/]+.*. The code scans forward for the '>' that closes the tag, collecting the tag name into tagname. Based on the tag name it then sets the formatting of the text that follows, namely the isbold, isitalic, isicon (for <li>), isunderline, nowrap and uloffset properties, and derives from isbold and isitalic the font value used when drawing on the canvas. The font, together with isicon, isunderline, nowrap and uloffset, is what the canvas renderer actually needs. If the tag is a supported one, its name is also pushed onto tagnames and the font onto fontsarr; later iterations use these two stacks to determine the text format within the tag's scope.

    // Collect the tag name up to the first space or '>'.
    while (text[index] != whitespace && text[index] != rightsyn) {
        tagname.push(text[index++]);
        if (index == len) break;
    }
    if (index == len) return;
    // Skip attributes up to the closing '>' (index must advance or the loop never ends).
    while (index < len && text[index] != rightsyn) {
        index++;
    }
    var tag = tagname.join("").toLowerCase();
    tagname = [];
    if (tag == tagb) {
        isbold = true;
    } else if (tag == tagi) {
        isitalic = true;
    } else if (tag == tagli) {
        isicon = true;
    } else if (tag == tagu) {
        isunderline = true;
    }
    // Block-level tags start a new line; everything else continues inline.
    if (tag == tagp || tag == tagli || tag == tagul) {
        nowrap = false;
    } else {
        nowrap = true;
    }
    if (tag == tagul) {
        uloffset += ULOFFSET; // ULOFFSET: per-level indent step (name assumed; identifier case was lost in the source)
    }
    // Derive the canvas font from the bold/italic state.
    if (isitalic && isbold) {
        font = italicbold;
    } else if (!isitalic && isbold) {
        font = bold;
    } else if (isitalic && !isbold) {
        font = italic;
    } else {
        font = normal;
    }
    // Record supported tags so a later closing tag can restore the format.
    if (validtags.contains(tag)) {
        tagnames.push(tag);
        fontsarr.push(font);
    }
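The branches above amount to a mapping from tag name to formatting state. The sketch below restates them with hypothetical names (`newState`, `openTag`, `fontFor`) and assumes that the article's `italicbold`, `bold`, `italic` and `normal` constants are CSS font shorthand strings with explicit style and weight slots, such as `"italic bold 16px sans-serif"`; the size, family and 20px list indent are placeholders, not values from the source.

```javascript
const SIZE_FAMILY = "16px sans-serif"; // placeholder size and family
const VALID_TAGS = ["p", "i", "b", "u", "ul", "li"];
const UL_INDENT = 20; // placeholder per-level indent for <ul>

// CSS font shorthand with explicit style and weight slots.
function fontFor(isBold, isItalic) {
  return `${isItalic ? "italic" : "normal"} ${isBold ? "bold" : "normal"} ${SIZE_FAMILY}`;
}

function newState() {
  return { isbold: false, isitalic: false, isicon: false, isunderline: false,
           nowrap: true, uloffset: 0, font: fontFor(false, false),
           tagnames: [], fontsarr: [] };
}

// Applies an opening tag to the state, mirroring the branches above.
function openTag(state, tag) {
  if (tag === "b") state.isbold = true;
  else if (tag === "i") state.isitalic = true;
  else if (tag === "li") state.isicon = true;
  else if (tag === "u") state.isunderline = true;
  // Block-level tags start a new line; everything else continues inline.
  state.nowrap = !(tag === "p" || tag === "li" || tag === "ul");
  if (tag === "ul") state.uloffset += UL_INDENT;
  state.font = fontFor(state.isbold, state.isitalic);
  if (VALID_TAGS.includes(tag)) {
    state.tagnames.push(tag);
    state.fontsarr.push(state.font);
  }
  return state;
}
```

Keeping explicit `normal` style and weight slots in every font string is what allows bold or italic to be switched off later with a plain string replace.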

The text after the tag, up to the next valid '<', is the text in scope for this iteration. It is collected into texttodraw and rendered on the canvas before the iteration ends; texttodraw is then cleared and isicon is reset to false.

Third, processing valid closing tags

A valid closing tag is a '<' followed by '/'; as a regular expression, ^<\/.*. Again, the code scans forward for the matching '>'. If the closing tag's name equals the last element of tagnames (which, recall, holds the names recorded while processing valid opening tags), the last elements of tagnames and fontsarr are popped. If the tag is ul, uloffset is reduced by one indent level, never going below zero. Then, if the tag name no longer appears anywhere in tagnames, the font and flags are reset according to the tag's meaning; this check accounts for the same tag being nested several levels deep.

    if (text[index] == "/") {
        var arr = [];
        // Collect the tag name up to '>' (or an unexpected '<').
        while (++index < len && text[index] != rightsyn && text[index] != leftsyn) {
            arr.push(text[index]);
        }
        if (index == len) return;
        // A '<' before any '>' means the closing tag is malformed; stop processing it.
        if (text[index] == leftsyn) break;
        var tag = arr.join("").trim().toLowerCase();
        // Only a closing tag that matches the innermost open tag is honoured.
        if (tag == tagnames[tagnames.length - 1]) {
            font = fontsarr.pop();
            tagnames.pop();
            if (tag == tagul) {
                uloffset -= ULOFFSET; // same assumed per-level indent constant as for opening tags
                uloffset = (uloffset > 0) ? uloffset : 0;
            }
            // Clear a style only if no outer tag of the same name is still open.
            if (!tagnames.contains(tag)) {
                if (tag == tagi) {
                    font = font.replace("italic", "normal");
                    isitalic = false;
                } else if (tag == tagb) {
                    font = font.replace("bold", "normal");
                    isbold = false;
                } else if (tag == tagu) {
                    isunderline = false;
                }
            }
        }
    }
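To see how the two stacks handle nesting, here is a standalone sketch of the closing-tag rule using a hypothetical `closeTag` function on a plain state object. It assumes font strings with explicit style and weight slots (for example `"normal bold 16px sans-serif"`) and a placeholder 20px indent per <ul> level. In `<b><b>x</b>y</b>`, the first `</b>` leaves "y" bold because another `b` is still on the stack, and a mismatched closing tag is ignored.

```javascript
const UL_INDENT = 20; // placeholder per-level indent for <ul>

// Applies a closing tag: pop both stacks only when the tag matches the
// innermost open tag, then clear a style only if no outer copy remains open.
function closeTag(state, tag) {
  if (tag !== state.tagnames[state.tagnames.length - 1]) return state;
  state.font = state.fontsarr.pop();
  state.tagnames.pop();
  if (tag === "ul") state.uloffset = Math.max(state.uloffset - UL_INDENT, 0);
  if (!state.tagnames.includes(tag)) {
    if (tag === "i") {
      state.font = state.font.replace("italic", "normal");
      state.isitalic = false;
    } else if (tag === "b") {
      state.font = state.font.replace("bold", "normal");
      state.isbold = false;
    } else if (tag === "u") {
      state.isunderline = false;
    }
  }
  return state;
}
```

Popping the font saved at the opening tag and then stripping only that tag's style keeps any outer style (for example bold around an inner `<i>`) intact.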

What follows is again the text in scope for this iteration; it is collected and rendered using the property values determined so far, just as in opening-tag processing, so the details are not repeated here.

Fourth, canvas rendering

Two global variables, xoffset and yoffset, record where the previous rendering ended. At the start of rendering they are adjusted according to uloffset, nowrap and related properties. Then, if isicon is set, the solid bullet for the <li> tag is drawn in front of the text. Next the text itself is rendered: after the font is set, characters are taken one at a time and measureText is used to check whether the line is full; if it is, the part that fits is drawn and the rest wraps to the next line. When underlining is on, the underline is drawn together with each piece of text. This repeats until all characters are drawn. The complete rendering function is as follows:

    var drawtext = function (data) {
        data = data.trim();
        var len = data.length;
        if (len == 0) {
            return;
        }
        // Block-level text starts on a new line, indented by the current list offset.
        if (!nowrap && xoffset > margin) {
            xoffset = margin + uloffset;
            yoffset += lineheight;
        }

        // <li>: draw a filled bullet in front of the text.
        if (isicon) {
            ctx.beginPath();
            ctx.arc(margin + uloffset + margin, yoffset - margin, margin, 0, Math.PI * 2, true);
            ctx.closePath();
            ctx.fill();
            xoffset += 30;
        }

        var index = 0;
        var renderindex = 0;
        ctx.font = font;
        while (index < len) {
            // Extend the substring one character at a time while it still fits.
            while (canvaswidth - xoffset > ctx.measureText(data.substring(renderindex, ++index)).width) {
                if (index === len) {
                    break;
                }
            }

            // The remaining text has been consumed: draw it and stop.
            if (index == len) {
                ctx.fillText(data.substring(renderindex, index), xoffset, yoffset);
                if (isunderline) {
                    // strokeStyle and lineWidth belong to the 2D context, not the canvas element.
                    ctx.strokeStyle = "red";
                    ctx.lineWidth = 5;
                    ctx.beginPath();
                    ctx.moveTo(xoffset, yoffset);
                    ctx.lineTo(xoffset + ctx.measureText(data.substring(renderindex, index)).width, yoffset);
                    ctx.stroke();
                }
                xoffset += ctx.measureText(data.substring(renderindex, index)).width;
                break;
            }
            // The last character overflowed: draw up to the previous one, then wrap.
            ctx.fillText(data.substring(renderindex, --index), xoffset, yoffset);
            if (isunderline) {
                ctx.strokeStyle = "red";
                ctx.lineWidth = 5;
                ctx.beginPath();
                ctx.moveTo(xoffset, yoffset);
                ctx.lineTo(canvaswidth, yoffset);
                ctx.stroke();
            }

            renderindex = index;
            xoffset = margin;
            yoffset += lineheight;
        }
    };
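The wrapping logic in `drawtext` can be separated from the canvas, which makes it testable on its own. The sketch below uses a hypothetical `layoutLines` function that takes a `measure` callback standing in for `(s) => ctx.measureText(s).width` and returns the runs to draw with their positions. It uses the same fit test as above (`x + width < canvasWidth`), but always takes at least one character at the start of a line, so a glyph wider than the canvas cannot stall the loop.

```javascript
// Greedy character-by-character line wrapping, independent of any canvas.
// Returns the runs to draw plus the final pen position.
function layoutLines(data, measure, { canvasWidth, margin, lineHeight, x, y }) {
  const runs = [];
  let start = 0;
  while (start < data.length) {
    let end = start;
    // Extend the run while it still fits before the right edge.
    while (end < data.length && x + measure(data.substring(start, end + 1)) < canvasWidth) {
      end++;
    }
    // At the start of a line, always take at least one character.
    if (end === start && x <= margin) end = start + 1;
    if (end > start) {
      const text = data.substring(start, end);
      runs.push({ text, x, y });
      x += measure(text);
      start = end;
    }
    // Anything left over continues on the next line.
    if (start < data.length) {
      x = margin;
      y += lineHeight;
    }
  }
  return { runs, x, y };
}
```

In a browser the result could be drawn with `r.runs.forEach((run) => ctx.fillText(run.text, run.x, run.y))`, and an underline for each run could go from `run.x` to `run.x` plus the run's measured width.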

Conclusion

Recursion is best avoided when parsing HTML in JS, since it can easily lead to stack overflows and performance problems. The contains method called on arrays in the code above is added to Array.prototype; it tests, ignoring case, whether the array contains a given string:

    Array.prototype.contains = function (item) {
        // Group the alternation so ^ and $ anchor every element, not just the first and last.
        return new RegExp("^(?:" + this.join("|") + ")$", "i").test(item.toString());
    };
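A regular expression built as `"^" + items.join("|") + "$"` needs care: alternation has the lowest precedence, so without a group the `^` attaches only to the first item and the `$` only to the last, and `"pre"` would match a list starting with `"p"`. The sketch below, with hypothetical names `containsIgnoreCase` and `regexContains`, shows the grouped regex form alongside a simpler `some`-based check that avoids modifying `Array.prototype`. Both assume the items are plain tag names without regex metacharacters.

```javascript
// Case-insensitive membership test without extending Array.prototype.
function containsIgnoreCase(list, item) {
  const needle = String(item).toLowerCase();
  return list.some((entry) => String(entry).toLowerCase() === needle);
}

// Regex form: the non-capturing group makes ^ and $ apply to every alternative.
function regexContains(list, item) {
  return new RegExp("^(?:" + list.join("|") + ")$", "i").test(String(item));
}
```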
