For a browser, rendering web pages quickly is the most important job. To achieve this, Chromium does a great deal of optimization work. These optimizations are effective and represent today's most advanced web rendering techniques. It is worth mentioning that these techniques apply not only to web page rendering but also to native system UI rendering; on Android, for example, we can see similarities between the two. In this article, we briefly introduce Chromium's web page rendering mechanism and lay out a study plan.
Lao Luo's Sina Weibo: Http://weibo.com/shengyangluo. Follows are welcome!
Chromium's web page rendering mechanism can be summed up in one phrase: vertical layering, horizontal tiling. Layering is done by WebKit, which abstracts a web page into a series of trees. These trees are made up of layers, which is why we speak of layering. As described in the earlier series of articles "Chromium page loading process brief introduction and learning plan", WebKit creates four trees for a web page in turn: the DOM Tree, the Render Object Tree, the Render Layer Tree, and the Graphics Layer Tree. Rendering is related to the last three. Layering a web page has two benefits: it reduces unnecessary drawing operations, and it allows animations to be rendered with hardware acceleration.
The first benefit comes from WebKit splitting the rendering of a page frame into two steps: drawing and compositing. Drawing converts drawing operations into an image; compositing blends all the images together and displays the result on the screen. Note that a screen is always refreshed on its next frame, regardless of whether the content of any given area has changed, so the system always has to supply a full screen of content. Consider a web page that is displayed full screen and abstracted into two layers, where only one layer's content changes in the next frame. In that case only the changed layer needs to be redrawn; it is then composited with the image already drawn for the other layer to produce the full screen. This avoids unnecessary drawing at the extra cost of one compositing operation. Compositing, however, is lightweight compared with drawing, especially with hardware-accelerated rendering, where it amounts to pasting textures whose content is already prepared. The second benefit comes from placing an animation on its own layer and applying a transformation to that layer to produce the animation. Transformations such as translation, rotation, scaling, and alpha fades are easy to implement with hardware acceleration.
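To make the first benefit concrete, here is a minimal sketch of "redraw only what changed, then composite everything". The `Layer` class, its `dirty` flag, and `render_frame` are hypothetical names invented for this illustration; they are not Chromium or WebKit APIs, and strings stand in for drawn images.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    dirty: bool = True      # content changed since the last draw
    image: str = ""         # stands in for the drawn bitmap or texture
    draw_count: int = 0     # how many times this layer was (re)drawn

    def draw(self, frame: int) -> None:
        # Drawing is the expensive step: it regenerates the layer's image.
        self.image = f"{self.name}@frame{frame}"
        self.draw_count += 1
        self.dirty = False


def render_frame(layers, frame):
    """Redraw only the dirty layers, then composite every layer's image."""
    for layer in layers:
        if layer.dirty:
            layer.draw(frame)
    # Compositing is the cheap step: it just combines the existing images.
    return [layer.image for layer in layers]


background = Layer("background")
banner = Layer("banner")
layers = [background, banner]

first = render_frame(layers, 1)    # both layers are drawn
banner.dirty = True                # only the banner changes
second = render_frame(layers, 2)   # only the banner is redrawn
```

In the second frame the background image is reused as-is, so the full screen is still produced but only one of the two layers pays the drawing cost.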
Tiling is a finer-grained concept than layering, and it applies to each individual layer. In general, a web page's content is much larger than the screen, so the user regularly scrolls or zooms to browse it. This is especially true on mobile devices. If content is drawn only once it becomes visible, the user perceives lag. On the other hand, if all of the page's content, visible or not, is drawn up front, the page appears to load slowly and a lot of resources are consumed. Neither option is appropriate. The best approach is to display the currently visible content as quickly as possible and, when spare capacity is available, to pre-draw the content most likely to become visible next. This means that different areas of a layer are assigned different drawing priorities. Each area is a tile, and each layer consists of several tiles. Tiles in the currently visible area have the highest priority and are drawn first.
Sometimes even drawing only the highest-priority tiles takes a long time. One key factor is texture upload. (Here we discuss only hardware-accelerated rendering.) Compared with ordinary GPU commands, texture upload is slow. To address this, Chromium first draws the page content at a reduced scale, for example 0.5, which cuts the texture size by three quarters. While the 0.5-scale content is displayed, Chromium continues to draw the content at normal scale. Once the normal-scale content has been drawn, it replaces the low-resolution content on screen. Although the user sees low-resolution content at first, this is better than seeing nothing at all.
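The "three quarters" figure follows directly from the arithmetic: halving both width and height leaves one quarter of the pixels. The short calculation below checks this. The function name, the example dimensions, and the 4 bytes per pixel (RGBA) are assumptions made for illustration only.

```python
def texture_bytes(width_px, height_px, scale, bytes_per_pixel=4):
    """Size of a texture holding content drawn at the given scale.

    bytes_per_pixel=4 assumes an RGBA format; this is an illustrative choice.
    """
    scaled_w = int(width_px * scale)
    scaled_h = int(height_px * scale)
    return scaled_w * scaled_h * bytes_per_pixel


full = texture_bytes(1024, 2048, 1.0)   # content drawn at normal scale
half = texture_bytes(1024, 2048, 0.5)   # content drawn at 0.5 scale
saving = 1 - half / full                # fraction of texture data avoided
```

With these numbers `saving` is 0.75, so the low-resolution pass has only a quarter as much data to upload.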
That covers the concepts of layering and tiling. Conceptually they are not hard to understand, but their implementation is quite complex and is intertwined with other optimizations. For example, two threads cooperate to render each page, which takes advantage of the multiple cores of modern CPUs. It also complicates the rendering process, because thread synchronization becomes an issue. This style of rendering is called threaded rendering. Looking back at how the Android application UI is rendered, it too evolved from single-threaded to multi-threaded rendering. Before Android 5.0, only one thread in an application process was responsible for UI rendering: the Main thread, also known as the UI thread. Since 5.0, an additional thread, called the Render thread, has worked alongside the UI thread to render the application UI. For details, see the earlier series of articles "Android application UI hardware accelerated rendering technology brief introduction and study plan".
To better support threaded rendering, the CC (Chromium Compositor) module, which is responsible for rendering web pages in Chromium, creates three trees that correspond to the Graphics Layer Tree created by WebKit, as shown in Figure 1:
Figure 1 Layer Tree, Pending Layer Tree, and Active Layer Tree in the CC module
The Layer Tree is maintained by the Render thread (also known as the Main thread) in the Render process, while the Pending Layer Tree and the Active Layer Tree are maintained by the Compositor thread (also known as the Impl thread) in the Render process. When needed, the Layer Tree is synchronized to the Pending Layer Tree; this is a UI-related synchronization between the Render thread and the Compositor thread. The operation is performed by the Compositor thread, and the Render thread waits while it executes. When synchronization is complete, the Compositor thread divides the layers in the Pending Layer Tree into tiles and rasterizes the tiles, that is, turns their drawing commands into images. When the Pending Layer Tree has finished rasterization, it becomes the Active Layer Tree. The Active Layer Tree is the tree that Chromium is currently displaying to the user.
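The life cycle of the three trees can be sketched as a small model. This is only a toy under stated assumptions: `CompositorTrees` and its methods are hypothetical names, a plain dictionary stands in for a layer tree, and no real threads are involved.

```python
import copy


class CompositorTrees:
    """Toy model of the two trees owned by the compositor thread."""

    def __init__(self):
        self.pending = None              # Pending Layer Tree: being rasterized
        self.active = None               # Active Layer Tree: shown to the user
        self.pending_rasterized = False

    def commit(self, layer_tree):
        # Synchronize the main-thread Layer Tree into the Pending Layer Tree.
        self.pending = copy.deepcopy(layer_tree)
        self.pending_rasterized = False

    def rasterize_pending(self):
        # Stand-in for dividing layers into tiles and rasterizing the tiles.
        if self.pending is not None:
            self.pending_rasterized = True

    def activate(self):
        # The pending tree becomes active only after rasterization finishes.
        if self.pending is None or not self.pending_rasterized:
            return False
        self.active, self.pending = self.pending, None
        return True


layer_tree = {"root": ["header", "content"]}   # owned by the render thread
trees = CompositorTrees()
trees.commit(layer_tree)
activated_too_early = trees.activate()   # not rasterized yet, so refused
trees.rasterize_pending()
activated = trees.activate()             # pending tree becomes the active tree
layer_tree["root"].append("footer")      # render thread moves on to the next frame
```

Because the commit copies the tree, the render thread can keep changing its own Layer Tree for the next frame without disturbing what the compositor side is rasterizing or displaying.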
Figure 2 shows more visually how the Render thread and the Compositor thread in the Render process collaborate:
Figure 2 Collaboration between the Render thread and the Compositor thread
When the Render thread finishes drawing frame N of the Layer Tree, it synchronizes the result to the Pending Layer Tree on the Compositor thread. When the synchronization is complete, the Render thread goes on to draw frame N+1 of the Layer Tree. Meanwhile, the Compositor thread is busy rasterizing frame N and performing related operations. The drawing of frame N+1 therefore runs in parallel with the rasterization of frame N, which makes full use of multi-core CPUs and improves the rendering efficiency of the web page.
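A rough back-of-the-envelope model shows why this overlap helps. The sketch below compares doing everything in sequence with the two-stage pipeline described above. It is a simplification under stated assumptions: every frame takes the same draw and raster time, the synchronization itself costs nothing, and the function names and millisecond figures are made up for the example.

```python
def sequential_time(draw_ms, raster_ms, frames):
    """One thread draws and then rasterizes each frame in turn."""
    return frames * (draw_ms + raster_ms)


def pipelined_time(draw_ms, raster_ms, frames):
    """Render thread draws frame N+1 while the compositor rasterizes frame N."""
    main_free = 0   # time at which the render thread can start drawing
    impl_free = 0   # time at which the compositor thread becomes idle
    for _ in range(frames):
        draw_done = main_free + draw_ms
        # The frame is handed over once it is drawn and the compositor is idle.
        commit_at = max(draw_done, impl_free)
        impl_free = commit_at + raster_ms
        # After the hand-over the render thread starts the next frame.
        main_free = commit_at
    return impl_free


one_thread = sequential_time(6, 10, 3)
two_threads = pipelined_time(6, 10, 3)
```

For three frames with 6 ms of drawing and 10 ms of rasterization each, the sequential version takes 48 ms and the pipelined version 36 ms, because most of the drawing time is hidden behind rasterization.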
After the Compositor thread has finished rasterization, it has a series of textures, which are eventually composited in the GPU process. After compositing, the user can see the content of the web page. The Render process uses the Command Buffer to ask the GPU process to execute GPU commands and to pass information such as texture resources, as described in the earlier series of articles "Chromium hardware accelerated rendering mechanism basics and learning plan".
As we can see from Figure 3, a single rendering of a web page actually involves three core threads: the Render thread and the Compositor thread in the Render process, plus the GPU thread in the GPU process. These three threads can execute in parallel, making further use of multi-core CPUs.
To manage the Layer Tree, the Render thread creates a LayerTreeHost object. Similarly, to manage the Pending Layer Tree and the Active Layer Tree, the Compositor thread creates a LayerTreeHostImpl object. Communication between the LayerTreeHost object and the LayerTreeHostImpl object represents the collaboration between the Render thread and the Compositor thread.
The LayerTreeHost object and the LayerTreeHostImpl object do not communicate directly, but through a Proxy object, as shown in Figure 3:
Figure 3 LayerTreeHost object and LayerTreeHostImpl object communicating via a Proxy object
The LayerTreeHost object and the LayerTreeHostImpl object communicate through a Proxy object so that both threaded and single-threaded rendering can be supported. For single-threaded rendering, the Proxy object used is a SingleThreadProxy object, as shown in Figure 4:
Figure 4 Single-Thread rendering
In single-threaded rendering, both the LayerTreeHost object and the LayerTreeHostImpl object actually run on the Render thread, so the SingleThreadProxy implementation is simple: its two member variables layer_tree_host_ and layer_tree_host_impl_ refer to the LayerTreeHost object and the LayerTreeHostImpl object, respectively. When the two objects need to communicate with each other, they call each other directly through these member variables.
For threaded rendering, the Proxy object used is a ThreadedProxy object, as shown in Figure 5:
Figure 5 Threaded rendering
ThreadedProxy has a member variable impl_task_runner_, which points to a SingleThreadTaskRunner object. This SingleThreadTaskRunner object describes the message loop of the Compositor thread. When the LayerTreeHost object needs to communicate with the LayerTreeHostImpl object, it posts a message to the Compositor thread through this SingleThreadTaskRunner object, requesting that the LayerTreeHostImpl object perform an operation on the Compositor thread. However, ThreadedProxy does not forward the request to the LayerTreeHostImpl object immediately; it decides whether to do so based on the current state of the LayerTreeHostImpl object. The benefit is that the CC module can process rendering work smoothly. For example, if the LayerTreeHostImpl object is still rasterizing the previous frame when the LayerTreeHost object requests that the next frame be drawn, the request is postponed until rasterization of the previous frame is complete. This avoids state confusion.
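The difference between the two kinds of proxy can be sketched in a few lines. This is an illustrative model, not Chromium's code: `HostImpl`, `DirectProxy`, and `QueuedProxy` are hypothetical stand-ins for the LayerTreeHostImpl object and the two Proxy classes, and a `deque` stands in for the Compositor thread's message loop instead of a real thread.

```python
from collections import deque


class HostImpl:
    """Stand-in for the compositor-side object."""

    def __init__(self):
        self.busy_rasterizing = False
        self.log = []

    def commit(self, frame):
        self.log.append(f"commit {frame}")


class DirectProxy:
    """Single-threaded case: both sides share a thread, so just call."""

    def __init__(self, impl):
        self.impl = impl

    def request_commit(self, frame):
        self.impl.commit(frame)


class QueuedProxy:
    """Threaded case: post tasks to the other thread, deferring when it is busy."""

    def __init__(self, impl):
        self.impl = impl
        self.impl_tasks = deque()   # models the compositor thread's message loop
        self.deferred = deque()     # requests held back until rasterization ends

    def request_commit(self, frame):
        if self.impl.busy_rasterizing:
            self.deferred.append(frame)
        else:
            self.impl_tasks.append(lambda: self.impl.commit(frame))

    def raster_finished(self):
        self.impl.busy_rasterizing = False
        while self.deferred:
            self.request_commit(self.deferred.popleft())

    def run_impl_tasks(self):
        # In a real browser this loop would run on the compositor thread.
        while self.impl_tasks:
            self.impl_tasks.popleft()()


direct_impl = HostImpl()
DirectProxy(direct_impl).request_commit(1)      # executed immediately

queued_impl = HostImpl()
proxy = QueuedProxy(queued_impl)
queued_impl.busy_rasterizing = True
proxy.request_commit(2)                         # held back: impl side is busy
proxy.run_impl_tasks()
log_while_busy = list(queued_impl.log)
proxy.raster_finished()                         # deferred request is now posted
proxy.run_impl_tasks()
```

The caller uses the same `request_commit` call in both cases; only the proxy decides whether the work happens right away, is posted to another message loop, or is postponed.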
ThreadedProxy uses a scheduler (Scheduler) to decide when the LayerTreeHostImpl object should do what. The scheduler in turn records the state transitions of the LayerTreeHostImpl object in a state machine, which provides the basis for its scheduling decisions. Typically, while a web page is being browsed, the scheduler dispatches the following actions:
1. If the drawing surface has not yet been created, the scheduler issues an ACTION_BEGIN_OUTPUT_SURFACE_CREATION action, at which point the LayerTreeHostImpl object creates a drawing surface for the web page. For details on a page's drawing surface and how it is created, see the earlier article "Chromium hardware accelerated rendering of the OpenGL context drawing surface creation process analysis".
2. When the Layer Tree changes and needs to be redrawn, the scheduler issues an ACTION_SEND_BEGIN_MAIN_FRAME action, at which point the LayerTreeHost object redraws the web page. Two points are worth noting. First, the drawing described here merely records drawing commands in a command buffer. Second, the drawing is performed on the Render thread.
3. After the LayerTreeHost object has redrawn the Layer Tree, the Render thread enters a wait state. The scheduler then issues an ACTION_COMMIT action, which tells the LayerTreeHostImpl object to synchronize the content of the Layer Tree to the Pending Layer Tree. This synchronization is performed on the Compositor thread. When it is complete, the Render thread is woken up, and the Compositor thread goes on to update the tiles in the Pending Layer Tree, for example their priorities.
4. After the tiles in the Pending Layer Tree have been updated, the scheduler issues an ACTION_MANAGE_TILES action, which tells the LayerTreeHostImpl object to rasterize the tiles in the Pending Layer Tree.
5. After the Pending Layer Tree has finished rasterization, the scheduler issues an ACTION_ACTIVATE_PENDING_TREE action, at which point the Pending Layer Tree becomes the Active Layer Tree.
6. After the Pending Layer Tree has become the Active Layer Tree, the scheduler issues an ACTION_DRAW_AND_SWAP_FORCED action. At this point the LayerTreeHostImpl object collects information about the rasterized tiles and sends it to the Browser process, so that the Browser process can composite and display the tiles in the browser window. For details, see the earlier article "Chromium hardware accelerated rendering UI compositing process analysis".
Actions 2 through 6 make up the full rendering process for one frame of the page, and they are repeated continuously while the web page is being browsed.
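The sequence of actions above can be pictured as a tiny state machine. The sketch below is deliberately simplistic: `ToyScheduler` is a hypothetical class that just walks through the six actions in the order listed, with the action names written in upper case. The real scheduler reacts to many more conditions than this.

```python
# The five per-frame actions, in the order described in the list above.
FRAME_ACTIONS = [
    "ACTION_SEND_BEGIN_MAIN_FRAME",   # render thread redraws the Layer Tree
    "ACTION_COMMIT",                  # Layer Tree -> Pending Layer Tree
    "ACTION_MANAGE_TILES",            # rasterize tiles of the pending tree
    "ACTION_ACTIVATE_PENDING_TREE",   # pending tree becomes the active tree
    "ACTION_DRAW_AND_SWAP_FORCED",    # hand the rasterized tiles over for display
]


class ToyScheduler:
    """Hypothetical, heavily simplified model of the scheduling order."""

    def __init__(self):
        self.has_output_surface = False
        self.step = 0

    def next_action(self):
        # The drawing surface is created once, before any frame is produced.
        if not self.has_output_surface:
            self.has_output_surface = True
            return "ACTION_BEGIN_OUTPUT_SURFACE_CREATION"
        action = FRAME_ACTIONS[self.step]
        self.step = (self.step + 1) % len(FRAME_ACTIONS)
        return action


scheduler = ToyScheduler()
startup = scheduler.next_action()
frame_one = [scheduler.next_action() for _ in range(5)]
frame_two = [scheduler.next_action() for _ in range(5)]
```

Here `startup` is the one-off surface creation, while `frame_one` and `frame_two` contain the same five actions, mirroring how actions 2 through 6 repeat for every frame.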
In the web page rendering process, the most important thing is tile management. Tiles are managed per layer. Two important terms are involved: tile and tiling. Their relationship is shown in Figure 6:
Figure 6 Tiles and tiling
A tiling is a region made up of tiles that share the same scale factor. In the Chromium source code, a tiling is described by the class PictureLayerTiling. A layer may be divided into tiles at several different scale factors, as shown in Figure 7:
Figure 7 Tilings with different scale factors
In Figure 7, the tile size is set to 256x256 px. For a tiling with a scale factor of 1.0, 1 pixel in a tile corresponds to 1 pixel in layer space. For a tiling with a scale factor of 0.67, 1 pixel in a tile corresponds to 1.5 pixels in layer space. Similarly, for a tiling with a scale factor of 0.5, 1 pixel in a tile corresponds to 2 pixels in layer space.
Why divide a layer into tiles at different scale factors? As mentioned above, the main reason is so that content can be displayed quickly when the page is scrolled or zoomed, even if that content is initially low resolution, as shown in Figure 8:
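The relationship between scale factor and layer-space pixels is simply a reciprocal, and it also determines how many tiles a tiling needs. The helper functions and the 1024x1536 example layer below are made up for illustration; only the 256 px tile size and the scale factors come from the text.

```python
import math

TILE_SIZE = 256  # tile edge length in pixels, as in Figure 7


def layer_pixels_per_tile_pixel(scale):
    """How many layer-space pixels one tile pixel covers along one axis."""
    return 1.0 / scale


def tile_grid(layer_w, layer_h, scale, tile_size=TILE_SIZE):
    """Number of tile columns and rows needed to cover a layer at a scale."""
    content_w = math.ceil(layer_w * scale)
    content_h = math.ceil(layer_h * scale)
    return math.ceil(content_w / tile_size), math.ceil(content_h / tile_size)


full_res = tile_grid(1024, 1536, 1.0)   # columns and rows at scale 1.0
low_res = tile_grid(1024, 1536, 0.5)    # columns and rows at scale 0.5
```

For this example layer the 1.0 tiling needs a 4x6 grid (24 tiles) while the 0.5 tiling needs only 2x3 (6 tiles), which is why a low-resolution tiling is so much cheaper to produce.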
Figure 8 Page magnification process
Figure 8 shows a web page being zoomed in. At first, tiles from a tiling with a lower scale factor are magnified to fill the visible area. Meanwhile, tiles matching the actual zoom factor are quietly created and rasterized in the background. When that work is done, they replace the lower-scale tiles in the visible area. So when we zoom in on a page, we first see blurry content and then, shortly afterwards, sharp content.
The area occupied by a layer's content may be very large, far exceeding the size of the screen. We do not want to create tiles for the entire layer, because that wastes resources. At the same time, we want tiles for as much content as possible, so that a currently invisible tile can be displayed quickly when it becomes visible. The CC module therefore divides a layer's content roughly into three regions, as shown in Figure 9:
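The fallback behaviour during zooming can be sketched as a small selection rule: show the tiling at the ideal scale if it is already rasterized, otherwise stretch the best lower-scale tiling that is ready. The function names and the selection policy below are assumptions made for illustration; Chromium's actual choice of tilings is more involved.

```python
def pick_tiling(ideal_scale, ready_scales):
    """Return the scale of the tiling to display, or None if nothing is ready."""
    if ideal_scale in ready_scales:
        return ideal_scale
    lower = [s for s in ready_scales if s < ideal_scale]
    return max(lower) if lower else None


def stretch_factor(ideal_scale, shown_scale):
    """How much the displayed tiles are magnified to fill the visible area."""
    return ideal_scale / shown_scale


# The user zooms to 2.0 while only the 0.5 and 1.0 tilings are rasterized.
shown_first = pick_tiling(2.0, [0.5, 1.0])
blur = stretch_factor(2.0, shown_first)

# Later the 2.0 tiling finishes rasterizing in the background.
shown_later = pick_tiling(2.0, [0.5, 1.0, 2.0])
```

In this example the 1.0 tiling is shown first, magnified by a factor of 2 and therefore blurry, and it is replaced by the sharp 2.0 tiling once that one is ready.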
Figure 9 Layer Area Division
As Figure 9 shows, the CC module creates tiles only for three regions: Viewport, Skewport, and Eventually Rect. Viewport is the area that is currently visible. Skewport is the area that will become visible next along the user's scrolling direction. Eventually Rect is the area formed by adding a thin border around all four sides of the Viewport; we expect its content to have a chance of being shown eventually. In order of importance, Viewport > Skewport > Eventually Rect. The CC module therefore rasterizes tiles in the Viewport first, then tiles in the Skewport, and finally tiles in the Eventually Rect.
After determining which areas need tiles and the order in which the tiles are rasterized, the next most important step is the rasterization itself, as shown in Figure 10:
Figure 10 Tile rasterization process
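The three-region ordering can be expressed as a simple priority function. In the sketch below, `Rect`, `tile_priority`, and the concrete coordinates are all hypothetical; the example assumes the user is scrolling downwards, so the Skewport extends below the Viewport.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def intersects(self, other):
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)


def tile_priority(tile, viewport, skewport, eventually_rect):
    """0 = Viewport, 1 = Skewport, 2 = Eventually Rect, None = no tile needed."""
    if tile.intersects(viewport):
        return 0
    if tile.intersects(skewport):
        return 1
    if tile.intersects(eventually_rect):
        return 2
    return None


viewport = Rect(0, 512, 512, 512)             # currently visible
skewport = Rect(0, 512, 512, 1024)            # extends in the scroll direction
eventually_rect = Rect(-256, 256, 1024, 1024) # viewport plus a border

tiles = {
    "above": Rect(0, 256, 256, 256),
    "next": Rect(0, 1024, 256, 256),
    "far": Rect(0, 1792, 256, 256),
    "visible": Rect(0, 512, 256, 256),
}
priorities = {name: tile_priority(rect, viewport, skewport, eventually_rect)
              for name, rect in tiles.items()}
raster_order = sorted((n for n, p in priorities.items() if p is not None),
                      key=lambda n: priorities[n])
```

The tile named "far" gets no priority at all and is simply skipped, while the rest are rasterized in the order visible, next, above.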
A web page is drawn one Graphics Layer at a time, and each layer is drawn into a Picture. The Picture merely stores drawing commands. Turning those drawing commands into pixels is the job of rasterization. Rasterization can be done either by the GPU or by the CPU.
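The split between recording drawing commands and turning them into pixels can be illustrated with a toy recorder. The `Picture` class below is a hypothetical stand-in with a single `fill_rect` command, and the rasterizer "draws" characters into a small grid instead of real pixels; it is not how Skia or the CC module is implemented.

```python
class Picture:
    """Records drawing commands without producing any pixels."""

    def __init__(self):
        self.commands = []

    def fill_rect(self, x, y, w, h, color):
        self.commands.append(("fill_rect", x, y, w, h, color))


def rasterize(picture, width, height, background="."):
    """Replay the recorded commands into a width x height grid of 'pixels'."""
    pixels = [[background] * width for _ in range(height)]
    for op, x, y, w, h, color in picture.commands:
        if op == "fill_rect":
            # Clip the rectangle to the bounds of the target buffer.
            for row in range(max(y, 0), min(y + h, height)):
                for col in range(max(x, 0), min(x + w, width)):
                    pixels[row][col] = color
    return ["".join(row) for row in pixels]


picture = Picture()
picture.fill_rect(1, 0, 2, 2, "#")      # drawing: only records the command
recorded = len(picture.commands)
bitmap = rasterize(picture, 4, 3)       # rasterization: produces the pixels
```

Because the picture holds only commands, the same recording can be replayed later, into differently sized buffers or for individual tiles, without redrawing the page.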
If rasterization is done by the GPU, the drawing commands stored in the Picture are converted into OpenGL commands and executed, and the resulting pixels are stored directly in a GPU texture. This kind of rasterization is therefore also called Direct Raster.
If rasterization is done by the CPU, the drawing commands stored in the Picture are converted into Skia commands and executed. Skia is a 2D graphics library that performs drawing operations on the CPU. The resulting pixels are saved in a memory buffer. On the Android platform, the CC module provides three ways to rasterize with the CPU.
The first is called Image Raster. The rasterized pixels are saved in a Native Buffer that is specific to the Android platform. This Native Buffer can be accessed by the CPU and can also be used by the GPU as a texture. As a result, no texture upload is needed once rasterization is complete, which neatly sidesteps the texture upload speed problem. This kind of rasterization is therefore also known as Zero Copy Texture Upload.
The second is called Image Copy Raster. The rasterized pixels are again saved in an Android Native Buffer, but they are then copied into a separate, standard GPU texture. Why do this? Because Android Native Buffers are a limited resource, and releasing the buffer once rasterization is done reduces demand for them. This kind of rasterization is therefore also known as One Copy Texture Upload.
The third is called Pixel Buffer Raster. The rasterized pixels are stored in an OpenGL Pixel Buffer, which is then uploaded to the GPU as texture data. Of the three CPU rasterization methods it is the least efficient, because it runs into the infamous texture upload problem.
Why must rasterized pixels end up on the GPU? Because we are discussing only hardware-accelerated rendering: whether rasterization is done by the GPU or by the CPU, the resulting pixels must reside on the GPU before they can be rendered to the screen with hardware acceleration.
These are the important concepts and key processes involved when Chromium renders a web page. In the actual implementation, they are considerably more complex. Next, we will analyze Chromium's web page rendering mechanism in more detail through the following seven scenarios:
1. Layer Tree creation process;
2. The execution process of the scheduler;
3. Output surface creation process;
4. Web page drawing process;
5. Layer tree and pending layer tree synchronization process;
6. Tile rasterization process;
7. The process of activating the Pending Layer Tree as the Active Layer Tree.
Once we understand these seven scenarios, we will have a deep understanding of Chromium's web page rendering mechanism. Stay tuned!
Chromium Web Page Rendering Mechanism: Brief Introduction and Learning Plan