How a browser works: rendering engine and HTML parsing

Rendering Engine

The rendering engine is responsible for, well, rendering: displaying the requested content on the browser screen.

By default, the rendering engine can display HTML and XML documents and images. It can display other types of documents through a plug-in (a browser extension); for example, a PDF viewer plug-in displays PDF files. We will discuss plug-ins and extensions in a separate chapter. In this section, we focus on the main use case of the rendering engine: displaying HTML and images formatted with CSS.

Various rendering engines

The browsers we refer to (Firefox, Chrome, and Safari) are built on two rendering engines: Firefox uses Gecko, Mozilla's own rendering engine; Safari and Chrome both use WebKit.

WebKit is an open-source rendering engine that started as an engine for the Linux platform and was modified by Apple to support Mac and Windows. For more information, see the WebKit project site.

Main Process

The rendering engine starts by getting the contents of the requested document from the network layer, usually in chunks of up to 8 KB. After that, the basic flow of the rendering engine is as follows:

Figure 2: basic flow of the rendering engine (parsing HTML to build the DOM tree, rendering tree construction, layout of the rendering tree, and painting the rendering tree).

The rendering engine parses the HTML document and converts the tags into DOM nodes in a tree called the content tree. It also parses the style data, both in style elements and in external files. The style data, together with the visual instructions in the HTML, is used to create another tree: the rendering tree.

The rendering tree contains rectangles with visual attributes such as color and size. The rectangles are in the order in which they will be displayed.

After the rendering tree is built, it goes through a layout process, in which each node is given the exact coordinates where it should appear on the screen. The next stage is painting: the rendering tree is traversed and each node is painted using the UI backend layer.

It is important to understand that this is a gradual process. For a better user experience, the rendering engine tries to display content on the screen as soon as possible. It does not wait until all the HTML is parsed before it starts building and laying out the rendering tree. Parts of the content are parsed and displayed while the rest of the content is still being processed.

Main Process examples

Figure 3: WebKit main flow

Figure 4: main flow of Mozilla's Gecko rendering engine (3.6)

As Figures 3 and 4 show, WebKit and Gecko use slightly different terminology, but the flow is basically the same.

In Gecko, the tree of visually formatted elements is called the "frame tree", and each element is a frame. WebKit uses the term "rendering tree", which consists of "rendering objects". WebKit uses the term "layout" for the placing of elements, while Gecko calls it "reflow". "Attachment" is WebKit's term for connecting DOM nodes and visual information to create the rendering tree. A minor non-semantic difference is that Gecko has an extra layer between the HTML and the DOM tree, called the "content sink", which is a factory for making DOM elements. We will discuss each part of the flow.



Since parsing is a very significant process within the rendering engine, we will go into it a little more deeply. Let's begin with a short introduction to parsing.

Parsing a document means translating it into a structure that code can use. The result of parsing is usually a tree of nodes that represents the structure of the document. It is called a parse tree or a syntax tree.

For example, parsing the expression "2 + 3 - 1" could return the following tree:

Figure 5: tree node for a mathematical expression

Parsing is based on the syntax rules that the document obeys, that is, the language or format it was written in. Every format that can be parsed must have a deterministic grammar consisting of a vocabulary and syntax rules. This is called a context-free grammar. Human languages are not such languages and therefore cannot be parsed with conventional parsing techniques.

Parser-lexer combination

Parsing can be separated into two sub-processes: lexical analysis and syntax analysis.

Lexical analysis is the process of breaking the input into tokens. Tokens are the vocabulary of the language: the collection of its valid building blocks.

Syntax analysis is the application of the language's syntax rules.

Parsers usually divide the work between two components: the lexer (sometimes called the tokenizer), which breaks the input into valid tokens, and the parser, which constructs the parse tree by analyzing the document structure according to the syntax rules. The lexer knows how to strip irrelevant characters such as white space and line breaks.

Figure 6: from source document to parse tree (document, lexical analysis, syntax analysis, parse tree).

The parsing process is iterative. The parser usually asks the lexer for a new token and tries to match the token against the syntax rules. If a rule is matched, a node corresponding to the token is added to the parse tree, and the parser asks for the next token. If no rule matches, the parser stores the token internally and keeps asking for tokens until a rule matching all the internally stored tokens is found. If no rule is found, the parser raises an exception. This means the document is not valid and contains syntax errors.


In many cases, the parse tree is not the final product. Parsing is often used in translation: transforming the input document into another format. Compilation is an example: a compiler that turns source code into machine code first parses it into a parse tree and then translates the tree into a machine code document.

Figure 7: compilation flow (source code, parsing, parse tree, translation, machine code).

Parsing example

In Figure 5 we built a parse tree from a mathematical expression. Let's try to define a simple mathematical language and see the parsing process.

Vocabulary: our language can include integers, plus signs, and minus signs.

Syntax:

  1. The building blocks of the language syntax are expressions, terms, and operations.
  2. Our language can include any number of expressions.
  3. An expression is defined as a term followed by an operation followed by another term.
  4. An operation is a plus token or a minus token.
  5. A term is an integer token or an expression.

Let's analyze the input "2 + 3 - 1".

The first substring that matches a rule is "2": according to rule #5 it is a term. The second match is "2 + 3", which matches the third rule: a term followed by an operation followed by another term. The next match is only hit at the end of the input. "2 + 3 - 1" is an expression, because we already know that "2 + 3" is a term, so we again have a term followed by an operation followed by another term, which matches the third rule. "2 + +" does not match any rule and is therefore invalid input.

Formal definitions for vocabulary and syntax

Vocabulary is usually expressed by regular expressions.

For example, our language is defined as:

INTEGER : 0|[1-9][0-9]*
PLUS : +
MINUS : -

As you can see, integers are defined by a regular expression.
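As a rough illustration of lexical analysis for this vocabulary, the following C++ sketch splits an input string into INTEGER, PLUS, and MINUS tokens and skips white space. The names Token, TokenKind, and tokenize are invented for this example, and the integer rule is hand-coded rather than generated from the regular expression.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token types for the toy language (not taken from any engine).
enum class TokenKind { Integer, Plus, Minus };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits the input into tokens, skipping white space.
// INTEGER follows 0|[1-9][0-9]*, so "0" is always a token on its own and
// is never merged with the digits that follow it.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (c == '+') {
            tokens.push_back({TokenKind::Plus, "+"});
            ++i;
        } else if (c == '-') {
            tokens.push_back({TokenKind::Minus, "-"});
            ++i;
        } else if (std::isdigit(c)) {
            std::size_t start = i++;
            if (c != '0') {
                while (i < input.size() &&
                       std::isdigit(static_cast<unsigned char>(input[i]))) {
                    ++i;
                }
            }
            tokens.push_back({TokenKind::Integer, input.substr(start, i - start)});
        } else {
            throw std::invalid_argument("unexpected character in input");
        }
    }
    return tokens;
}
```

Calling tokenize("2 + 3-1") yields five tokens (2, +, 3, -, 1); any character outside the vocabulary makes the function throw.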

Syntax is usually defined in a format called BNF. Our language is defined as:

expression := term operation term
operation := PLUS | MINUS
term := INTEGER | expression

We said that a language can be parsed by regular parsers if its grammar is a context-free grammar. An intuitive definition of a context-free grammar is a grammar that can be entirely expressed in BNF.

Types of parsers

There are two basic types of parsers: top-down parsers and bottom-up parsers. Intuitively, a top-down parser looks at the high-level structure of the syntax and tries to match one of its rules. A bottom-up parser starts with the input and gradually transforms it into syntax rules, starting from the low-level rules until the high-level rules are met.

Let's see how the two types of parsers parse our example:

A top-down parser starts from the highest-level rule: it identifies "2 + 3" as an expression and then identifies "2 + 3 - 1" as an expression (the process of identifying the expression also matches the other rules, but the starting point is the highest-level rule).
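A top-down parser for the toy language could look roughly like the C++ sketch below. It is only an illustration under two assumptions: the left-recursive rule "term := INTEGER | expression" is replaced by a loop (expression := term (operation term)*), which is a common rewrite for hand-written top-down parsers, and integers are read as plain digit runs. The names Node, Parser, and toString are invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical parse tree node: an integer leaf or "left operation right".
struct Node {
    std::string value;            // integer text for a leaf, "+" or "-" otherwise
    std::unique_ptr<Node> left;   // null for a leaf
    std::unique_ptr<Node> right;  // null for a leaf
};

class Parser {
public:
    explicit Parser(std::string input) : input_(std::move(input)) {}

    // Highest-level rule: expression := term (operation term)*
    std::unique_ptr<Node> parseExpression() {
        std::unique_ptr<Node> left = parseTerm();
        skipSpaces();
        while (pos_ < input_.size()) {
            char op = input_[pos_];
            if (op != '+' && op != '-') throw std::runtime_error("expected + or -");
            ++pos_;
            auto node = std::make_unique<Node>();
            node->value = std::string(1, op);
            node->left = std::move(left);  // what was parsed so far becomes a term
            node->right = parseTerm();
            left = std::move(node);
            skipSpaces();
        }
        return left;
    }

private:
    // Lowest-level rule: term := INTEGER (digits only in this sketch)
    std::unique_ptr<Node> parseTerm() {
        skipSpaces();
        std::size_t start = pos_;
        while (pos_ < input_.size() &&
               std::isdigit(static_cast<unsigned char>(input_[pos_]))) {
            ++pos_;
        }
        if (start == pos_) throw std::runtime_error("expected an integer");
        auto leaf = std::make_unique<Node>();
        leaf->value = input_.substr(start, pos_ - start);
        return leaf;
    }

    void skipSpaces() {
        while (pos_ < input_.size() &&
               std::isspace(static_cast<unsigned char>(input_[pos_]))) {
            ++pos_;
        }
    }

    std::string input_;
    std::size_t pos_ = 0;
};

// Renders the tree with explicit parentheses so the structure is visible.
std::string toString(const Node& node) {
    if (!node.left) return node.value;
    return "(" + toString(*node.left) + " " + node.value + " " +
           toString(*node.right) + ")";
}
```

For the input "2 + 3-1" the resulting tree renders as "((2 + 3) - 1)", which mirrors the text: "2 + 3" is recognized first and then serves as the left term of the whole expression. The input "2 + +" makes parseExpression throw.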

A bottom-up parser scans the input until a rule is matched and then replaces the matching input with the rule. This goes on until the end of the input. The partly matched expression is placed on the parser's stack.

Stack                  Input
                       2 + 3 - 1
term                   + 3 - 1
term operation         3 - 1
expression             - 1
expression operation   1
expression

This type of bottom-up parser is called a shift-reduce parser, because the input is shifted to the right (imagine a pointer that starts at the beginning of the input and moves to the right) and is gradually reduced to syntax rules.
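The stack behaviour can be reproduced with a small C++ sketch. It is a toy under stated assumptions, not a general shift-reduce parser: the reduction of a single token to "term" or "operation" is folded into the shift step, and the only other reduction is "term operation term" (or "expression operation term") to "expression". The function name shiftReduceTrace is invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Joins stack symbols with spaces, e.g. {"term", "operation"} -> "term operation".
static std::string join(const std::vector<std::string>& symbols) {
    std::string out;
    for (const std::string& s : symbols) {
        if (!out.empty()) out += ' ';
        out += s;
    }
    return out;
}

// Runs the toy shift-reduce parse and returns the stack contents after each
// shift-and-reduce step. Throws if the input is not a complete expression.
std::vector<std::string> shiftReduceTrace(const std::string& input) {
    std::vector<std::string> stack;
    std::vector<std::string> trace;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        // Shift: move one token from the input onto the stack, already
        // reduced to its grammar symbol.
        if (std::isdigit(c)) {
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
            stack.push_back("term");
        } else if (c == '+' || c == '-') {
            ++i;
            stack.push_back("operation");
        } else {
            throw std::runtime_error("unexpected character");
        }
        // Reduce: "term operation term" or "expression operation term"
        // on top of the stack becomes a single "expression".
        std::size_t n = stack.size();
        if (n >= 3 && stack[n - 1] == "term" && stack[n - 2] == "operation" &&
            (stack[n - 3] == "term" || stack[n - 3] == "expression")) {
            stack.resize(n - 3);
            stack.push_back("expression");
        }
        trace.push_back(join(stack));
    }
    if (stack.size() != 1 || stack[0] != "expression") {
        throw std::runtime_error("input is not a complete expression");
    }
    return trace;
}
```

For "2 + 3-1" the returned trace is "term", "term operation", "expression", "expression operation", "expression", matching the stack column above.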

Generating parsers automatically

There are tools that can generate a parser for you; they are called parser generators. You feed them the grammar of your language (its vocabulary and syntax rules) and they generate a working parser. Creating a parser requires a deep understanding of parsing, and it is not easy to write an optimized parser by hand, so parser generators can be very useful.

WebKit uses two well-known parser generators: Flex for creating a lexer and Bison for creating a parser (you might run into them under the names Lex and Yacc). Flex input is a file containing regular expression definitions of the tokens. Bison input is the language syntax rules in BNF format.

HTML Parser

The job of the HTML parser is to parse the HTML markup into a parse tree.

HTML grammar definition

The vocabulary and syntax of HTML are defined in specifications created by the W3C. At the time of writing, the current version is HTML4, and work on HTML5 is in progress.

Not a context-free grammar

As we saw in the introduction to parsing, grammar syntax can be defined formally using formats like BNF. Unfortunately, none of the conventional parser topics apply to HTML (I did not bring them up just for fun; they are used to parse CSS and JavaScript). HTML cannot easily be defined by the context-free grammar that parsers need. There is a formal format for defining HTML, the DTD (Document Type Definition), but it is not a context-free grammar.

HTML is rather close to XML, and there are lots of XML parsers available. There is even an XML variation of HTML, XHTML, so what is the big difference? The difference is that the HTML approach is more "forgiving": it lets you omit certain start or end tags. On the whole it is a "soft" syntax, as opposed to XML's stiff and demanding syntax. This seemingly small detail makes a world of difference. On one hand, it is the main reason HTML is so popular: it forgives your mistakes and makes life easy for the web author. On the other hand, it makes it difficult to write a formal grammar. So HTML parsing is not simple: HTML cannot be parsed by conventional parsers, because its grammar is not context-free, and it cannot be parsed by XML parsers.


The HTML definition is in DTD format. This format is used to define languages of the SGML family. It contains definitions for all allowed elements, their attributes, and their hierarchy. As we saw earlier, the HTML DTD does not form a context-free grammar.

There are a few variations of the DTD. The strict mode conforms solely to the specifications, while the other modes also support markup used by browsers in the past, for backward compatibility with older content.


The output tree (the parse tree) is a tree of DOM element and attribute nodes. DOM is short for Document Object Model. It is the object representation of the HTML document and the interface of HTML elements to the outside world, such as JavaScript.

The DOM has an almost one-to-one relation to the markup. The example markup would be translated to the following DOM tree:

Figure 8: DOM tree of the example markup


Like HTML, the DOM is specified by the W3C. It is a generic specification for manipulating documents, with a specific module that describes HTML-specific elements.

When we say the tree contains DOM nodes, we mean the tree is built of elements that implement one of the DOM interfaces. Browsers use concrete implementations that have additional attributes used by the browser internally.

The parsing algorithm

As we saw in the previous sections, HTML cannot be parsed using regular top-down or bottom-up parsers.

The reasons are:

  1. The forgiving nature of the language.
  2. Browsers have traditional error tolerance to support well-known cases of invalid HTML.
  3. The parsing process is reentrant. For other languages, the source does not change during parsing, but in HTML a script tag containing "document.write" can add extra tokens, so the parsing process actually modifies the input.

Unable to use regular parsing techniques, browsers create custom parsers for parsing HTML.

The parsing algorithm is described in detail by the HTML5 specification. It consists of two stages: tokenization and tree construction.

Tokenization is the lexical analysis: it parses the input into tokens. HTML tokens include start tags, end tags, attribute names, and attribute values.

The tokenizer recognizes a token, gives it to the tree constructor, and consumes the next character to recognize the next token, and so on until the end of the input.

Figure 6: HTML parsing process (derived from the HTML5 Specification)


The tokenization algorithm

The algorithm's output is an HTML token. The algorithm is expressed as a state machine. Each state consumes one or more characters of the input stream and updates the next state according to those characters. The decision is influenced by the current tokenization state and by the tree construction state. This means that the same consumed character can yield different results for the next state, depending on the current state. The algorithm is too complex to describe fully, so let's look at a simple example that helps us understand the principle.

Basic example: tokenizing a small document in which an "html" tag wraps a "body" tag that contains the text "Hello World".

The initial state is the "data state". When the "<" character is encountered, the state changes to the "tag open state". Consuming an "a-z" character creates a "start tag token", and the state changes to the "tag name state". We stay in this state until the ">" character is consumed. Each character is appended to the new token's name. In our case the resulting token is an "html" token.

When the ">" character is reached, the current token is emitted and the state changes back to the "data state". The "<body>" tag is treated in the same way. So far the "html" and "body" tags have been emitted, and we are back in the "data state". Consuming the "H" character of "Hello World" creates and emits a character token, and this goes on until the "<" of "</body>" is reached. A character token is emitted for each character of "Hello World".

We are now back in the "tag open state". Consuming the next input, "/", creates an "end tag token" and moves us to the "tag name state". Again we stay in this state until ">" is reached. Then the new tag token is emitted and we go back to the "data state". The closing "html" tag is treated in the same way.

Figure 9: tokenizing the example input
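The three states used in this walkthrough can be sketched as a small C++ state machine. This is a deliberately reduced illustration, not the HTML5 tokenizer: it knows only the data, tag open, and tag name states, follows the simplified transitions described above, and ignores attributes, comments, and character references. The names HtmlToken, HtmlTokenKind, State, and tokenizeHtml are invented for the example.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum class HtmlTokenKind { StartTag, EndTag, Character };

struct HtmlToken {
    HtmlTokenKind kind;
    std::string data;  // tag name, or the single character
};

// Only the three states named in the walkthrough are modelled.
enum class State { Data, TagOpen, TagName };

std::vector<HtmlToken> tokenizeHtml(const std::string& input) {
    std::vector<HtmlToken> tokens;
    State state = State::Data;
    HtmlToken current{HtmlTokenKind::StartTag, ""};
    for (char c : input) {
        switch (state) {
        case State::Data:
            if (c == '<') {
                state = State::TagOpen;
            } else {
                // Every other character is emitted as its own character token.
                tokens.push_back({HtmlTokenKind::Character, std::string(1, c)});
            }
            break;
        case State::TagOpen:
            if (c == '/') {
                current = {HtmlTokenKind::EndTag, ""};
                state = State::TagName;
            } else if (std::isalpha(static_cast<unsigned char>(c))) {
                char lower = static_cast<char>(
                    std::tolower(static_cast<unsigned char>(c)));
                current = {HtmlTokenKind::StartTag, std::string(1, lower)};
                state = State::TagName;
            } else {
                // Not a tag after all: treat "<" and this character as text.
                tokens.push_back({HtmlTokenKind::Character, "<"});
                tokens.push_back({HtmlTokenKind::Character, std::string(1, c)});
                state = State::Data;
            }
            break;
        case State::TagName:
            if (c == '>') {
                tokens.push_back(current);  // the tag token is complete: emit it
                state = State::Data;
            } else {
                current.data += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(c)));
            }
            break;
        }
    }
    return tokens;
}
```

Assuming the example input is "<html><body>Hello world</body></html>", the sketch emits 15 tokens: two start tags, eleven character tokens, and two end tags.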


Tree construction algorithm

When the parser is created, the Document object is created as well. During the tree construction stage, the DOM tree with the Document at its root is modified and elements are added to it. Each node emitted by the tokenizer is processed by the tree constructor. For each token, the specification defines which DOM element is relevant to it and is created for that token. Besides being added to the DOM tree, the element is also added to a stack of open elements. This stack is used to correct nesting mismatches and unclosed tags. The algorithm is also described as a state machine, whose states are called "insertion modes".

Let's follow the tree construction process for the example input:

The input to the tree construction stage is the sequence of tokens from the tokenization stage. The first mode is the "initial mode". Receiving the "html" token causes a move to the "before html" mode and a reprocessing of the token in that mode. This creates an HTMLHtmlElement and appends it to the root Document object.

The state then changes to "before head". When the "body" token is received, an HTMLHeadElement is created implicitly, even though we have no "head" token, and it is added to the tree.

We now move to the "in head" mode and then to "after head". The "body" token is reprocessed, an HTMLBodyElement is created and inserted, and the mode changes to "in body".

The character tokens of the "Hello World" string are now received. The first one causes the creation and insertion of a "Text" node, and the other characters are appended to that node.

Receiving the "body" end token causes a move to the "after body" mode. Receiving the "html" end tag then moves us to the "after after body" mode. When all tokens have been processed, parsing ends.

Figure 10: tree construction of the example HTML
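The insertion modes in this walkthrough can be sketched in C++ as follows. This is a much reduced model, not the HTML5 algorithm: it implements only the modes named above, does not model content inside "head", and handles just enough of the "in body" mode for plain text and simple elements. The names DomNode, Tok, Mode, buildTree, and serialize are invented for the example.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified DOM node.
struct DomNode {
    std::string name;  // element name, "#document", or "#text"
    std::string text;  // character data, used by "#text" nodes only
    std::vector<std::unique_ptr<DomNode>> children;
};

enum class TokKind { StartTag, EndTag, Character };
struct Tok {
    TokKind kind;
    std::string data;  // tag name, or the character data
};

enum class Mode { Initial, BeforeHtml, BeforeHead, InHead, AfterHead,
                  InBody, AfterBody, AfterAfterBody };

std::unique_ptr<DomNode> buildTree(const std::vector<Tok>& tokens) {
    auto document = std::make_unique<DomNode>();
    document->name = "#document";
    std::vector<DomNode*> open = {document.get()};  // stack of open elements
    Mode mode = Mode::Initial;

    // Appends a new node to the current open element and pushes it.
    auto insert = [&open](const std::string& name) {
        auto node = std::make_unique<DomNode>();
        node->name = name;
        DomNode* parent = open.back();
        open.push_back(node.get());
        parent->children.push_back(std::move(node));
    };
    auto is = [](const Tok& t, TokKind kind, const char* name) {
        return t.kind == kind && t.data == name;
    };

    for (const Tok& tok : tokens) {
        bool consumed = false;
        while (!consumed) {  // loop again to reprocess the token in a new mode
            switch (mode) {
            case Mode::Initial:
                mode = Mode::BeforeHtml;
                break;
            case Mode::BeforeHtml:  // "html" is created, explicitly or implied
                insert("html");
                mode = Mode::BeforeHead;
                consumed = is(tok, TokKind::StartTag, "html");
                break;
            case Mode::BeforeHead:  // "head" is created even without a head tag
                insert("head");
                mode = Mode::InHead;
                consumed = is(tok, TokKind::StartTag, "head");
                break;
            case Mode::InHead:  // head content is not modelled: close it
                open.pop_back();
                mode = Mode::AfterHead;
                consumed = is(tok, TokKind::EndTag, "head");
                break;
            case Mode::AfterHead:
                insert("body");
                mode = Mode::InBody;
                consumed = is(tok, TokKind::StartTag, "body");
                break;
            case Mode::InBody:
                consumed = true;
                if (tok.kind == TokKind::Character) {
                    auto& kids = open.back()->children;
                    if (kids.empty() || kids.back()->name != "#text") {
                        insert("#text");
                        open.pop_back();  // text nodes never stay open
                    }
                    kids.back()->text += tok.data;
                } else if (is(tok, TokKind::EndTag, "body")) {
                    mode = Mode::AfterBody;
                } else if (tok.kind == TokKind::StartTag) {
                    insert(tok.data);
                } else if (open.back()->name == tok.data) {
                    open.pop_back();
                }
                break;
            case Mode::AfterBody:
            case Mode::AfterAfterBody:
                consumed = true;
                if (is(tok, TokKind::EndTag, "html")) mode = Mode::AfterAfterBody;
                break;
            }
        }
    }
    return document;
}

// Serializes the tree as name(child,child,...), with text nodes in quotes.
std::string serialize(const DomNode& node) {
    if (node.name == "#text") return "\"" + node.text + "\"";
    std::string out = node.name;
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        out += (i == 0 ? "(" : ",");
        out += serialize(*node.children[i]);
    }
    if (!node.children.empty()) out += ")";
    return out;
}
```

Feeding it the tokens html start, body start, the characters of "Hello World", body end, and html end produces a document whose html element has an implied head and a body containing a single text node, as in the walkthrough.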

When parsing is finished, the browser marks the document as interactive and starts parsing scripts that are in "deferred" mode, that is, scripts that should be executed after the document is parsed. The document state is then set to "complete" and a "load" event is fired.

The HTML5 specification contains the full algorithms for tokenization and tree construction.

Browser error tolerance

You never get an "Invalid Syntax" error on an HTML page. Browsers fix any invalid content and go on. Take a look at the following example:

I must have violated about a million rules ("mytag" is not a standard tag, the "p" and "div" elements are wrongly nested, and more), but the browser still displays it correctly and does not complain. So a lot of the parser code is there to fix the mistakes of HTML authors.

Error handling is quite consistent across browsers but, amazingly enough, it is not part of the current HTML specification. Like bookmarking and the back and forward buttons, it is just something that developed in browsers over the years. Some invalid HTML constructs are repeated on many sites, and browsers try to fix them in a way that conforms with other browsers.

The HTML5 specification does define some of these requirements. WebKit summarizes this nicely in the comment at the beginning of its HTML parser class:

The parser parses tokenized input into the document, building up the document tree. If the document is well-formed, parsing it is straightforward.
Unfortunately, we have to handle many HTML documents that are not well-formed, so the parser has to be tolerant about errors.
We have to take care of at least the following error conditions:
1. The element being added is explicitly forbidden inside some outer tag. In this case we should close all tags up to the one that forbids the element, and add it afterwards.
2. We are not allowed to add the element directly. It could be that the person writing the document forgot some tag in between (or that the tag in between is optional). This could be the case with the following tags: HTML HEAD BODY TBODY TR TD LI (did I forget any?).
3. We want to add a block element inside an inline element. Close all inline elements up to the next higher block element.
4. If this does not help, close elements until we are allowed to add the element, or ignore the tag.
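The common thread in these rules, closing open elements until the new content fits, can be illustrated with a stack of open elements. The C++ sketch below is a generic illustration and not WebKit's or the HTML5 specification's actual recovery logic: it only balances tag sequences by closing intervening elements, ignoring stray end tags, and closing whatever is still open at the end. The names Tag and repairNesting are invented for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal tag token: a start tag or an end tag with a name.
struct Tag {
    bool isEnd;
    std::string name;
};

// Returns a balanced tag sequence for a possibly mis-nested input.
std::vector<std::string> repairNesting(const std::vector<Tag>& tags) {
    std::vector<std::string> open;  // stack of open elements
    std::vector<std::string> out;   // repaired markup, one tag per entry
    for (const Tag& tag : tags) {
        if (!tag.isEnd) {
            open.push_back(tag.name);
            out.push_back("<" + tag.name + ">");
            continue;
        }
        // Find the nearest open element with the same name; i is its
        // position counted from 1, or 0 if there is none.
        std::size_t i = open.size();
        while (i > 0 && open[i - 1] != tag.name) --i;
        if (i == 0) continue;  // stray end tag: ignore it
        // Close every element opened after the match, then the match itself.
        while (open.size() >= i) {
            out.push_back("</" + open.back() + ">");
            open.pop_back();
        }
    }
    // End of input: close whatever is still open.
    while (!open.empty()) {
        out.push_back("</" + open.back() + ">");
        open.pop_back();
    }
    return out;
}
```

For the mis-nested sequence div start, p start, div end, p end, the sketch closes "p" before "div" and then ignores the stray "p" end tag, giving <div>, <p>, </p>, </div>.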

Let's look at some examples of WebKit error tolerance:

</br> instead of <br>

Some sites use </br> instead of <br>. To be compatible with IE and Firefox, WebKit treats it like <br>. The code:

// In compatibility mode, treat a closing </br> as an opening <br>.
if (t->isCloseTag(brTag) && m_document->inCompatMode()) {
    reportError(MalformedBRError);
    t->beginTag = true;
}

Note that the error handling is internal: it is not presented to the user.

A stray table

A stray table is a table inside another table's content but not inside a table cell, for example:

<table>
    <table>
        <tr><td>inner table</td></tr>
    </table>
    <tr><td>outer table</td></tr>
</table>

WebKit changes the hierarchy into two sibling tables:

<table>
    <tr><td>outer table</td></tr>
</table>
<table>
    <tr><td>inner table</td></tr>
</table>

The code:

// A table found in stray table content closes the enclosing table first.
if (m_inStrayTableContent && localName == tableTag)
    popBlock(tableTag);

WebKit uses a stack for the current element contents: it pops the inner table out of the outer table's stack, so the two tables become siblings.

Nested form elements

If a form is placed inside another form, the second form is ignored. The code:

// Only create a form element when no form is currently open.
if (!m_currentFormElement) {
    m_currentFormElement = new HTMLFormElement(formTag, m_document);
}

A too-deep tag hierarchy

The comment speaks for itself: it describes a site that reaches a nesting depth of about 1500 tags, all from a bunch of <b> elements. WebKit allows at most 20 nested tags of the same type before ignoring them altogether.

bool HTMLParser::allowNestedRedundantTag(const AtomicString& tagName)
{
    // Count consecutive stack entries with the same tag name, up to the limit.
    unsigned i = 0;
    for (HTMLStackElem* curr = m_blockStack;
         i < cMaxRedundantTagDepth && curr && curr->tagName == tagName;
         curr = curr->next, i++) { }
    return i != cMaxRedundantTagDepth;
}

Misplaced html or body end tags

Again, the comment speaks for itself:

Support for really broken HTML. We never close the body tag, since some silly web pages close it before the actual end of the document. Let's rely on the end() call to close things.

// Ignore html and body end tags; end() closes them later.
if (t->tagName == htmlTag || t->tagName == bodyTag)
    return;

So web authors beware: unless you want to appear as an example in a WebKit error tolerance code snippet, write well-formed HTML.
