The major web browsers load web pages in basically the same way. This process is known as parsing and is described by the HTML5 specification. A high-level understanding of this process is critical to writing web pages that load efficiently.
Parsing overview
As chunks of the HTML source become available from the network (or cache, filesystem, etc.), they are streamed to the HTML parser. Next, in a process known as tokenization, the parser iterates through the source, generating a token for (most notably) each start tag, end tag, and character outside of a tag.
For example, the input source <b>hello</b> yields 7 tokens:
start-tag {name: b}
character {data: h}
character {data: e}
character {data: l}
character {data: l}
character {data: o}
end-tag {name: b}
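To make the tokenization step concrete, here is a minimal sketch in JavaScript. It is not how any browser implements its tokenizer: the `tokenize` function and the token object shape are assumptions made for illustration, and it handles only bare start tags, end tags, and text (no attributes, comments, or entities).

```javascript
// Simplified tokenizer sketch: bare start tags, end tags, and text only.
// The function name and token shape are hypothetical, chosen for illustration.
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    if (source[i] === "<") {
      const close = source.indexOf(">", i);
      if (close === -1) break; // incomplete tag: a real parser waits for more input
      const inner = source.slice(i + 1, close);
      if (inner.startsWith("/")) {
        tokens.push({ type: "end-tag", name: inner.slice(1).toLowerCase() });
      } else {
        tokens.push({ type: "start-tag", name: inner.toLowerCase() });
      }
      i = close + 1;
    } else {
      // Each character outside a tag becomes its own token.
      tokens.push({ type: "character", data: source[i] });
      i += 1;
    }
  }
  return tokens;
}

const tokens = tokenize("<b>hello</b>");
console.log(tokens.length); // 7
```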
After each token is generated, it is serially passed to the next major subsystem: the tree builder. The tree builder dynamically modifies the document's DOM tree to reflect the new token.
The 7 input tokens above yield the following DOM tree (the html, head, and body elements are implied by the parser):
html
  head
  body
    b
      #text "hello"
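The tree builder's handling of those tokens can be sketched roughly as follows. This is a simplified model, not the algorithm from the specification: the node shape (`name`, `children`, `text`) and the `buildTree` function are hypothetical, and the implied html, head, and body elements are created up front rather than through the specification's insertion modes.

```javascript
// Hypothetical node shape for illustration; this is not the real DOM API.
function makeNode(name) {
  return { name, children: [] };
}

function buildTree(tokens) {
  // The parser implies html, head, and body when the source omits them.
  const html = makeNode("html");
  const head = makeNode("head");
  const body = makeNode("body");
  html.children.push(head, body);

  const openElements = [body]; // stack of currently open elements
  for (const token of tokens) {
    const current = openElements[openElements.length - 1];
    if (token.type === "start-tag") {
      const element = makeNode(token.name);
      current.children.push(element); // the tree is modified as each token arrives
      openElements.push(element);
    } else if (token.type === "end-tag") {
      if (openElements.length > 1 && current.name === token.name) {
        openElements.pop();
      }
    } else if (token.type === "character") {
      // Adjacent character tokens are merged into a single text node.
      const last = current.children[current.children.length - 1];
      if (last && last.name === "#text") {
        last.text += token.data;
      } else {
        const textNode = makeNode("#text");
        textNode.text = token.data;
        current.children.push(textNode);
      }
    }
  }
  return html;
}

const inputTokens = [
  { type: "start-tag", name: "b" },
  { type: "character", data: "h" },
  { type: "character", data: "e" },
  { type: "character", data: "l" },
  { type: "character", data: "l" },
  { type: "character", data: "o" },
  { type: "end-tag", name: "b" },
];
const tree = buildTree(inputTokens);
console.log(JSON.stringify(tree, null, 2));
```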
Fetching subresources
A frequent operation performed by the tree builder is creating a new HTML element and inserting it into the document. It is at the point of insertion that HTML elements which load subresources begin fetching the subresource.
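The link between insertion and fetching can be illustrated with a small model. Everything here is an assumption made for the sketch: the `insertElements` function, the element shape, and the table of which attribute names a subresource (real browsers apply more conditions, for example a link element must be a stylesheet link).

```javascript
// Hypothetical table: element name -> attribute that names its subresource.
const SUBRESOURCE_ATTRIBUTE = new Map([
  ["img", "src"],
  ["script", "src"],
  ["link", "href"], // simplification: assumes the link is a stylesheet
]);

// Inserts elements one by one and starts a fetch at the moment of insertion.
function insertElements(elements, startFetch) {
  const inserted = [];
  for (const element of elements) {
    inserted.push(element); // stands in for insertion into the DOM
    const attribute = SUBRESOURCE_ATTRIBUTE.get(element.name);
    const url = attribute ? (element.attributes || {})[attribute] : undefined;
    if (url) startFetch(url); // the fetch begins here, not earlier
  }
  return inserted;
}

const fetched = [];
insertElements(
  [
    { name: "link", attributes: { rel: "stylesheet", href: "style.css" } },
    { name: "p", attributes: {} },
    { name: "img", attributes: { src: "logo.png" } },
  ],
  (url) => fetched.push(url)
);
console.log(fetched); // ["style.css", "logo.png"]
```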
Running scripts
This parsing algorithm seems to translate the HTML source into a DOM tree as efficiently as possible. That is, except for one wrinkle: scripts. When the tree builder encounters an end-tag token for a script, it must serially execute the script before parsing can continue (unless the associated script start-tag has the defer or async attribute).
There are significant preconditions which must be fulfilled before a script can execute:
- If the script is external, its source must be fully downloaded.
- For any script, all stylesheets in the document must be fully downloaded.
This means the parser must often wait idly while scripts and stylesheets are downloaded.
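The two preconditions above amount to a simple readiness check. The sketch below is only a model of that rule: the `canExecute` function and the `external` and `downloaded` fields are hypothetical names, not part of any browser API.

```javascript
// Returns true when a script satisfies both preconditions described above.
// `script` and `stylesheets` are hypothetical plain objects for illustration.
function canExecute(script, stylesheets) {
  // Precondition 1: an external script's source must be fully downloaded.
  const sourceReady = !script.external || script.downloaded;
  // Precondition 2: every stylesheet in the document must be fully downloaded.
  const stylesReady = stylesheets.every((sheet) => sheet.downloaded);
  return sourceReady && stylesReady;
}

const pendingSheet = [{ url: "style.css", downloaded: false }];
const loadedSheet = [{ url: "style.css", downloaded: true }];

// Even an inline script has to wait for a pending stylesheet.
console.log(canExecute({ external: false }, pendingSheet)); // false
console.log(canExecute({ external: false }, loadedSheet)); // true
console.log(canExecute({ external: true, downloaded: false }, loadedSheet)); // false
```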
Why must parsing halt?
Well, a script may document.write something which affects further parsing, or it could query something about the DOM which would yield incorrect results if parsing had continued (for instance, the number of image elements in the DOM).
Why wait for stylesheets?
A script may expect to access the CSSOM directly, or it could query an attribute of a DOM node which depends on the stylesheet (for example, how wide a certain <table> is).
Is it inefficient to block parsing?
Yes. Subresource download times often have a large constant factor limited by round-trip time. This means it is faster to download two resources in parallel than to download the same two in serial. More obviously, the browser is also free to do CPU work while waiting on the network. For these reasons, it is critical to the efficient loading of a web page that subresource fetches are initiated as soon as possible. When parsing is blocked, the tree builder is not able to insert subsequent elements into the DOM, and thus subsequent subresource downloads are not initiated, even if the HTML source which includes them is already available to the parser.
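A back-of-the-envelope model shows why the constant round-trip cost favors parallel downloads. The numbers and function names below are invented for illustration, and the model is idealized: it assumes each download costs one fixed round trip plus its transfer time, and that parallel transfers do not compete for bandwidth.

```javascript
// Idealized cost of one download: fixed round-trip latency plus transfer time.
function downloadTime(resource, roundTripMs) {
  return roundTripMs + resource.transferMs;
}

// Serial: each download starts only after the previous one finishes.
function serialTime(resources, roundTripMs) {
  return resources.reduce((total, r) => total + downloadTime(r, roundTripMs), 0);
}

// Parallel: all downloads start together, so the slowest one dominates.
function parallelTime(resources, roundTripMs) {
  return Math.max(0, ...resources.map((r) => downloadTime(r, roundTripMs)));
}

// Hypothetical example values: 100 ms round trip, small transfers.
const resources = [{ transferMs: 20 }, { transferMs: 30 }];
console.log(serialTime(resources, 100)); // 250
console.log(parallelTime(resources, 100)); // 130
```

With these assumed values, the round trip is paid twice in the serial case but effectively once in the parallel case.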
Mitigating blocking
As I've blogged previously, when the parser becomes blocked, WebKit will run a lightweight parser known as the preload scanner. It mitigates the blocking problem by scanning ahead and fetching certain subresources that may be required. Other browsers employ similar techniques.
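The idea of scanning ahead can be sketched in a few lines. This is not WebKit's preload scanner: the `preloadScan` function is hypothetical, and it uses a regular expression where a real scanner would use a tokenizer, so it only recognizes quoted src attributes on img and script tags and quoted href attributes on link tags.

```javascript
// Scans not-yet-parsed source for URLs worth fetching early.
// Regex-based simplification for illustration only.
function preloadScan(remainingSource) {
  const pattern =
    /<(?:img|script)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']|<link\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  const urls = [];
  for (const match of remainingSource.matchAll(pattern)) {
    urls.push(match[1] || match[2]);
  }
  return urls;
}

// Source that sits behind a blocking script and has not reached the tree builder yet.
const remaining =
  '<img src="b.png"><p>text</p><link rel="stylesheet" href="c.css"><script src="d.js"></script>';
console.log(preloadScan(remaining)); // ["b.png", "c.css", "d.js"]
```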
It is important to note that even with preload scanning, parsing is still blocked. Nodes cannot be added to the DOM tree. Although I haven't covered how a DOM tree becomes a render tree, layout, or painting, it should be obvious that before a node is in the DOM tree it cannot be painted to the screen.
Finishing parsing
After the entire source has been parsed, first all deferred scripts will be executed (waiting for their source and all pending stylesheets to download). Their completion triggers the DOMContentLoaded event to be fired. Next, the parser will wait for any pending async scripts to finish loading and executing. Finally, once all subresources have finished downloading, the window's load event will be fired and parsing is complete.
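One way to observe these two milestones is to listen for both events and record the order in which they fire. The helper name `recordLoadMilestones` is an assumption for this sketch. In a browser you would pass `document` and `window`; the stand-in EventTarget objects below exist only so the sketch also runs outside a browser, and they dispatch the events by hand in the order described above.

```javascript
// Records the order in which the two load milestones fire.
// In a browser, call recordLoadMilestones(document, window).
function recordLoadMilestones(doc, win) {
  const milestones = [];
  doc.addEventListener("DOMContentLoaded", () => milestones.push("DOMContentLoaded"));
  win.addEventListener("load", () => milestones.push("load"));
  return milestones;
}

// Stand-ins so the sketch runs outside a browser (for example, recent Node.js).
const fakeDocument = new EventTarget();
const fakeWindow = new EventTarget();
const milestones = recordLoadMilestones(fakeDocument, fakeWindow);

// Simulate the order described above: deferred scripts finish first, then all subresources.
fakeDocument.dispatchEvent(new Event("DOMContentLoaded"));
fakeWindow.dispatchEvent(new Event("load"));
console.log(milestones); // ["DOMContentLoaded", "load"]
```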
Takeaway
With this understanding, it becomes clear how important it is to carefully consider where and how stylesheets and scripts are included in the document. Those decisions can have a significant impact on the efficiency of the page load.