At Unreal Open Day 2017, Epic Games developer support engineer Guo Chunbiao presented UI optimization techniques for Unreal Engine 4. The following is a transcript of the talk.
Hello everyone, I'm Guo Chunbiao, a developer support engineer at Epic Games. Today I will share our experience with UI optimization in UE4. Developers in China have told us that performance drops sharply on mobile phones once the UI is opened, so this talk focuses on how to optimize UI in UE4. The methods are not limited to mobile platforms. They can also greatly improve the performance of complex UI systems on other platforms, such as PC and consoles.
Contents:
1 Basic concepts of UI
1.1 Explanation of terms
1.2 Rendering process
1.3 Performance index
2 Optimization plan
2.1 Game thread optimization
2.1.1 Invalidation Box
2.1.2 Visibility (Widget Visibility)
2.1.3 Widget Binding
2.2 Rendering thread optimization
2.2.1 Merging batches
2.2.2 Retainer Box
2.2.3 Event-driven Retainer Box
2.2.4 Switching materials
2.3 Other optimizations
2.3.1 C++ development
2.3.2 Manager Class
2.3.3 Free texture memory
2.3.4 3D RTT optimization
2.3.5 New features
3 Effect test
4 Summary
1 Basic concepts of UI
1.1 Explanation of terms
User Widget: corresponds to a user interface.
Widget Tree: the widgets inside each User Widget are organized as a tree.
Panel Widget: not rendered itself; used to lay out its Child Widgets. Examples include Canvas Panel, Grid Panel, and Horizontal Box.
Common Widget: rendered; generates the final Draw Elements. Examples include Button, Image, and Text.
1.2 Rendering process
Schematic diagram of the basic rendering process:
On the game thread, Slate Tick traverses the Widget Tree twice per frame:
Prepass: traverses the tree bottom-up to compute the ideal size (Desired Size) of each Widget.
OnPaint: traverses the tree top-down to compute the Draw Elements needed for rendering. During this pass:
- A Vertex Buffer is generated for each Common Widget according to its type and parameters.
- The Common Widget's Render Transform is baked into that Vertex Buffer.
- Batches are merged based on information such as Layer ID and Material.
In the end, each User Widget produces one or more Draw Elements and passes them to the rendering thread. Each Draw Element corresponds to one Draw Call.
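The batch-merge rule above can be sketched as a small plain C++ model. It is not Slate's real data structure. It assumes a draw element is fully described by its layer ID and a material name, so elements sharing both collapse into one batch. Real Slate also considers clipping, draw effects and other render parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified draw element: the layer it was painted on and the
// material/texture it uses.
struct DrawElement {
    int layerId;
    std::string material;
};

// Elements sharing a layer and a material can be merged, so the number of
// batches (draw calls) equals the number of distinct (layer, material) pairs.
std::size_t CountBatches(const std::vector<DrawElement>& elements) {
    std::set<std::pair<int, std::string>> keys;
    for (const DrawElement& e : elements) {
        keys.insert({e.layerId, e.material});
    }
    return keys.size();
}
```

Three images on one layer using one atlas become a single batch. Moving one of them to another layer, or giving it another material, adds a batch.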
On the rendering thread, Slate rendering is divided into two steps:
Widget Render: renders UI to a texture (RTT). If a Retainer Box is used, its Draw Elements are rendered into the Retainer Box's Render Target.
Slate Render: renders Draw Elements to the Back Buffer. If a Retainer Box is used, the texture resource of the Retainer Box is rendered to the Back Buffer instead.
1.3 Performance index
The stat Slate command lists the main Slate performance counters:
Num Painted Widgets: the number of Widgets that run OnPaint on the game thread.
Num Batches: the number of Draw Elements (that is, Draw Calls).
Note that stat Slate itself draws an unoptimized UI panel, and the stats system counts that panel's cost as Slate overhead. The timings it displays therefore differ greatly from the real values. Instead, dump averaged stat values with the following command:
stat dumpave -num=120 -ms=0.5
The three key metrics map to the following stats:
Slate Tick: STAT_SlateTickTime.
Slate Render: STAT_SlateRenderingRTTime.
Widget Render: FWidgetRenderer_DrawWindow.
Game thread code:
Statistics thread code:
Resulting debug panel:
2 Optimization plan
2.1 Game thread optimization
2.1.1 Invalidation Box
Wrap a User Widget in an Invalidation Box to cache its Slate Tick results so they are not recalculated every frame. The steps are as follows:
All Prepass and OnPaint results under the Invalidation Box are cached. When the rendering information of a Child Widget changes, it notifies the Invalidation Box, which recomputes Prepass and OnPaint to refresh the cache.
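The caching behavior described above can be modeled in plain C++. The class below is a hypothetical stand-in, not the engine's SInvalidationPanel. The expensive Prepass + OnPaint work is represented by a callback whose result is reused until a child calls Invalidate().

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of an Invalidation Box: caches the result of the
// Prepass + OnPaint work and reuses it until a child reports a change.
class CachedPanel {
public:
    explicit CachedPanel(std::function<int()> paintChildren)
        : paintChildren_(std::move(paintChildren)) {}

    // Called by a child widget when its rendering information changes.
    void Invalidate() { dirty_ = true; }

    // Called once per frame; recomputes only when the cache is dirty.
    int Tick() {
        if (dirty_) {
            cachedElements_ = paintChildren_();
            ++rebuilds_;
            dirty_ = false;
        }
        return cachedElements_;
    }

    int Rebuilds() const { return rebuilds_; }

private:
    std::function<int()> paintChildren_;
    int cachedElements_ = 0;
    int rebuilds_ = 0;
    bool dirty_ = true;  // the first frame always builds the cache
};
```

Over 100 unchanged frames the painting callback runs once. A single Invalidate() triggers exactly one more rebuild.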
The figure below shows a special case. Each hero icon is a reusable User Widget wrapped in its own Invalidation Box, and the whole hero list is a Scroll Box. When the Scroll Box scrolls up and down, the Transform of each hero icon User Widget changes as well.
In this case, you can enable Cache Relative Transforms on the Invalidation Box, as shown below:
When the position of the User Widget changes, the engine then does not rebuild all of the Draw Elements (that is, the Vertex Buffers). Instead, it changes a shader parameter (the View * Projection matrix) to reflect the new position. This only works for translation; if the scale changes, the Draw Elements still have to be recalculated. Cache Relative Transforms adds a small amount of extra work on the game thread, so enable it only where it is needed.
When the rendering information of a widget changes, it notifies its Invalidation Box to re-cache the Vertex Buffers. In a complex User Widget, frequently re-caching the entire Widget Tree is expensive. There are two ways to solve this problem.
The first is to split the widgets into separate Invalidation Boxes according to how often they change.
Sometimes the layout makes it inconvenient to split widgets into different Invalidation Boxes. In that case, use the second method: mark the Widget as Is Volatile. The enclosing Invalidation Box then excludes this Widget from its cache. The Widget still runs Prepass and OnPaint every Tick, but the cache of the rest of the Widget Tree is not affected.
The LevelUpIcon in the figure above is usually hidden and appears only when the character levels up. LevelUpAnim animates it by changing the Widget's position. Because that position keeps changing, the Invalidation Box would rebuild the cache of the entire Widget Tree every frame, which performs poorly. Setting this Widget to Is Volatile improves performance.
The Is Volatile option in the editor marks a Widget as Volatile explicitly to improve Invalidation Box performance. A Widget Binding, however, can also mark a Widget as Volatile implicitly, which makes it tick every frame and reduces performance.
Each widget's ComputeVolatility function lists the attributes that affect its Draw Element (Vertex Buffer).
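The trade-off in the LevelUpIcon example can be put into numbers with a hypothetical plain C++ cost model, not engine code. It counts how many widgets run Prepass/OnPaint when an Invalidation Box holds a number of static widgets plus one widget that moves every frame. It assumes a cache rebuild repaints every cached widget.

```cpp
#include <cassert>

// Hypothetical cost model: counts widget paints (Prepass/OnPaint runs).
struct PaintStats {
    int widgetPaints = 0;
};

PaintStats SimulateFrames(int frames, int staticWidgets, bool animatedIsVolatile) {
    assert(frames >= 0 && staticWidgets >= 0);
    PaintStats stats;
    bool cacheValid = false;
    for (int f = 0; f < frames; ++f) {
        // A non-volatile animated child invalidates the whole box every frame.
        if (!animatedIsVolatile) cacheValid = false;
        if (!cacheValid) {
            // Rebuild: every widget stored in the cache is painted again.
            stats.widgetPaints += staticWidgets + (animatedIsVolatile ? 0 : 1);
            cacheValid = true;
        }
        // A volatile child is excluded from the cache and painted on its own.
        if (animatedIsVolatile) stats.widgetPaints += 1;
    }
    return stats;
}
```

Take 60 frames with 50 static widgets. Marking the animated widget volatile costs 110 widget paints, against 3060 when it forces a full rebuild every frame.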
Properties of the Text widget that affect its Draw Element:
Properties of the Progress Bar widget that affect its Draw Element:
If you use a Widget Binding on any attribute that affects the Draw Element, the engine has to check every frame whether the value has changed to decide whether the Draw Element needs updating. Avoid Widget Binding for these attributes.
Use Slate.InvalidationDebugging to check whether the Invalidation Boxes and Volatile flags are set up correctly:
Green wireframe: Widget cached by an Invalidation Box.
Blue wireframe: Invalidation Box with Cache Relative Transforms enabled.
Dotted box: Widget marked Volatile.
Red wireframe: Widget not covered by an Invalidation Box.
The Slate.AlwaysInvalidate command forces every Invalidation Box to rebuild its cache every frame. Use it to test whether cache rebuilds cause hitches. If a User Widget is too complex, split it across multiple Invalidation Boxes, grouping Widgets by update frequency.
2.1.2 Visibility (Widget Visibility)
There are five Widget visibility types:
Visible: visible and clickable.
HitTestInvisible: visible, but neither this Widget nor its Child Widgets can be clicked.
SelfHitTestInvisible: visible; this Widget cannot be clicked, but its Child Widgets are unaffected.
Hidden: invisible, but still occupies layout space.
Collapsed: invisible and does not occupy layout space.
Many widgets default to Visible. Widgets that do not need to respond to clicks must be set to HitTestInvisible or SelfHitTestInvisible manually. If many widgets are left Visible, click hit-testing becomes much less efficient, which also increases game thread overhead.
Collapsed widgets take up no layout space (Layout Space), so no Prepass is computed for them while they are hidden. Collapsed therefore performs better than Hidden.
The Widget Reflector can help you find widgets whose Visibility is set incorrectly.
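The effect of these settings on hit-testing can be sketched as a tree walk in plain C++. This is a hypothetical model, not Slate's hit-test grid. It counts how many widgets remain candidates for click testing under the visibility rules listed above.

```cpp
#include <cassert>
#include <vector>

enum class Visibility { Visible, HitTestInvisible, SelfHitTestInvisible, Hidden, Collapsed };

// Hypothetical widget node (C++17 allows std::vector of an incomplete type).
struct Widget {
    Visibility visibility;
    std::vector<Widget> children;
};

// Counts the widgets that still take part in click hit-testing.
int CountHitTestable(const Widget& w) {
    switch (w.visibility) {
        case Visibility::Hidden:
        case Visibility::Collapsed:
        case Visibility::HitTestInvisible:
            return 0;  // the whole subtree is skipped
        case Visibility::SelfHitTestInvisible:
        case Visibility::Visible:
            break;
    }
    // SelfHitTestInvisible skips only this widget; children are still tested.
    int count = (w.visibility == Visibility::Visible) ? 1 : 0;
    for (const Widget& child : w.children) count += CountHitTestable(child);
    return count;
}
```

A Button left at its default Visible with a Visible Text child costs two hit-test candidates. Setting the Text to HitTestInvisible drops that to one.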
2.1.3 Widget Binding
As mentioned in the discussion of Volatile, Widget Binding can mark a Widget as Volatile and reduce UI performance. In addition, Widget Bindings are evaluated on every Tick, which is slow. We recommend not using this feature in projects. Instead, push values by calling functions from C++ (or Blueprint).
RemoveFromViewport / AddToViewport destroy and rebuild the User Widget, so switching visibility between Collapsed and SelfHitTestInvisible performs better.
In addition, on mobile platforms we recommend moving complex logic out of Blueprint Tick and into C++.
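The difference between a binding and pushing a value can be shown with a hypothetical plain C++ text widget model, not UMG's implementation. A bound getter runs on every tick whether or not the value changed. A setter does work only when game code calls it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical text widget model: binding = pull every frame, SetText = push.
class TextModel {
public:
    void Bind(std::function<std::string()> getter) { binding_ = std::move(getter); }
    void SetText(const std::string& text) { text_ = text; }

    void Tick() {
        if (binding_) {
            text_ = binding_();  // polled every frame, changed or not
        }
    }

    const std::string& Text() const { return text_; }

private:
    std::function<std::string()> binding_;
    std::string text_;
};
```

Over 60 frames a bound gold counter evaluates its getter 60 times. The pushed version shows the same text after a single SetText call, made when the gold value actually changes.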
2.2 Rendering thread optimization
2.2.1 Merging batches
As GPUs have evolved, the number of Draw Calls matters less and less for performance, and in many cases reducing Draw Calls does not raise FPS. However, fewer Draw Calls means fewer graphics API calls, which helps keep mobile devices from overheating.
A. Panel Widget
In engine versions before 4.15, Canvas Panel does not support batch merging. Avoid Canvas Panel and prefer containers that do, such as Grid Panel, Vertical Box, and Horizontal Box.
Version 4.15 added batch merging for Canvas Panel. To enable it, go to Project Settings: "Engine -> Slate Settings -> Constraint Canvas -> Explicit Canvas Child ZOrder". You can then set the ZOrder property of each Child Widget of the Canvas Panel. Child Widgets with the same ZOrder (and the same rendering parameters) are merged into one batch. Compared with Grid Panel and Horizontal Box, Canvas Panel has no extra layout calculation, so its OnPaint is slightly cheaper on the game thread.
B. Merge textures
Sprites in UE4 make it easy to edit and use merged textures (texture atlases).
If your logic code needs to switch between independent textures and merged textures, initialize both the independent texture (UTexture2D) and the merged texture resource (UPaperSprite) in a Manager Class. Create an FSlateBrush for each and assign the resource with SetResourceObject. A switch variable then controls which brush is passed to UImage::SetBrush.
Replacing every texture in existing User Widgets with merged textures late in a project is tedious. Dmitriy Dyomin of Epic Games suggested a quick way to do it.
First implement a Commandlet:
The commandlet can be run using the following command:
The Commandlet traverses all Widget Blueprint assets and loads them through the AssetRegistry. It checks the textures used by each UImage and UBorder and applies a naming rule to decide whether a corresponding Sprite asset exists. It then uses the AssetRegistry to replace each such texture with its Sprite, and finally saves the Widget Blueprint asset.
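The naming-rule lookup step can be sketched in plain C++. The rule itself is an assumption, since the talk does not give one: a texture at "/Path/T_Name" is taken to have a sprite at "/Path/SPR_Name". A std::set of path strings stands in for an AssetRegistry query. Both the prefixes and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// Hypothetical naming rule: texture "/Path/T_Name" -> sprite "/Path/SPR_Name".
std::optional<std::string> FindSpriteForTexture(const std::string& texturePath,
                                                const std::set<std::string>& spriteAssets) {
    const std::size_t slash = texturePath.rfind('/');
    const std::size_t nameStart = (slash == std::string::npos) ? 0 : slash + 1;
    const std::string folder = texturePath.substr(0, nameStart);
    const std::string name = texturePath.substr(nameStart);
    const std::string prefix = "T_";
    if (name.compare(0, prefix.size(), prefix) != 0) {
        return std::nullopt;  // does not follow the texture naming rule
    }
    const std::string spritePath = folder + "SPR_" + name.substr(prefix.size());
    if (spriteAssets.count(spritePath) == 0) {
        return std::nullopt;  // no matching sprite: leave this widget unchanged
    }
    return spritePath;
}
```

When no sprite is found, the commandlet would simply skip that widget, so textures that were never packed into an atlas stay as they are.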
2.2.2 Retainer Box
Merging batches and textures can bring the UI's Draw Call count down to a fairly low level, but the pixel fill rate may still be high.
In many cases the UI does not need to be redrawn every frame, so its rendering results can be cached with a Retainer Box and refreshed every few frames. A Retainer Box renders its UI into a Render Target and then draws that Render Target to the screen.
In the figure below, the UI of the main screen is divided into 4 Retainer Boxes, each updated every 3 frames.
Keep each Retainer Box as small as possible to improve rendering efficiency and reduce video memory usage. A Retainer Box should usually include the User Widget's background image, because backgrounds account for a large share of the pixel fill.
A Retainer Box creates a Render Target for each User Widget instance. Unless you modify the code, reused User Widgets should therefore not contain a Retainer Box. For example, in the figure below, we should create a Retainer Box for the User Widget that contains the Scroll Box, not for the User Widget used as each Scroll Box item.
The figure below shows another situation. The User Widget B_HeroIcon is reused on several main screens, such as HEROS and SOCIAL. Battle Breakers is a UI-heavy mobile game, so giving every main screen its own Retainer Box would consume too much video memory. We certainly do not want to create a Retainer Box for every B_HeroIcon either.
In this case, extending the code gives a better result. Suppose we know that at most 20 B_HeroIcons appear on screen at once. We can then create a Render Target Pool of 20 Render Targets so that different Retainer Boxes share Render Targets from the pool.
Retainer Boxes consume extra video memory, so limit how many you use and give priority to the User Widgets that benefit most. One case is the User Widgets on the main screens. The other is large numbers of frequently used User Widgets, once they share Render Targets.
A Retainer Box improves rendering thread efficiency, and the game thread Tick for its contents also runs only once every few frames. If the Retainer Box contains clickable widgets, set the Retainer Box to Visible so that the engine maps the hit-test area onto the Retainer Box.
Continuously animated content (such as 3D characters or material effects) can be kept outside the Retainer Box. Watch its pixel fill rate, though; this can also be addressed in the effects design.
It makes no sense to place an Invalidation Box above a Retainer Box. The usual approach is to put an Invalidation Box inside the Retainer Box.
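The pooling idea can be sketched as a fixed-size free list in plain C++. RenderTargetPool is a hypothetical class, not an engine type, and integer IDs stand in for real render targets. A widget acquires a target when it becomes visible and returns it when it scrolls off screen or is destroyed.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical fixed-size pool of render target IDs shared by Retainer Boxes.
class RenderTargetPool {
public:
    explicit RenderTargetPool(int capacity) {
        assert(capacity > 0);
        for (int id = capacity - 1; id >= 0; --id) free_.push_back(id);
    }

    // Returns a free render target ID, or nothing if the pool is exhausted.
    std::optional<int> Acquire() {
        if (free_.empty()) return std::nullopt;
        const int id = free_.back();
        free_.pop_back();
        return id;
    }

    // Called when a widget scrolls off screen or is destroyed.
    void Release(int id) { free_.push_back(id); }

    int FreeCount() const { return static_cast<int>(free_.size()); }

private:
    std::vector<int> free_;
};
```

With a capacity of 20, a 21st request fails until some icon releases its target. The released target is then reused instead of a new one being allocated.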
The Phase Count of each Retainer Box should be planned globally. For example, the figure below shows a Retainer Box updated every 3 frames, on frame 0:
The next figure shows a Retainer Box updated every 5 frames, on the second frame:
As a result, once every 15 frames both Retainer Boxes update in the same frame, causing a frame-rate drop.
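The collision can be checked with a short plain C++ sketch. It assumes a box with Phase Count N and Phase P redraws on frames where frame % N == P, and reads "the second frame" as Phase 2. With periods 3 and 5, both boxes redraw together every lcm(3, 5) = 15 frames.

```cpp
#include <cassert>
#include <vector>

// Assumed rule: a Retainer Box with Phase Count `period` and Phase `phase`
// redraws on frames where frame % period == phase.
bool RedrawsOn(int frame, int period, int phase) {
    assert(period > 0 && phase >= 0 && phase < period);
    return frame % period == phase;
}

// Lists the frames in [0, frameCount) on which both boxes redraw.
std::vector<int> CollidingFrames(int frameCount, int periodA, int phaseA,
                                 int periodB, int phaseB) {
    std::vector<int> frames;
    for (int f = 0; f < frameCount; ++f) {
        if (RedrawsOn(f, periodA, phaseA) && RedrawsOn(f, periodB, phaseB)) {
            frames.push_back(f);
        }
    }
    return frames;
}
```

For the two boxes above, the first 60 frames collide on frames 12, 27, 42 and 57. When the periods are coprime, as 3 and 5 are, no choice of phases avoids collisions. When they share a factor, such as 4 and 6, phases can be picked so the boxes never redraw in the same frame.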
2.2.3 Event-driven Retainer Box
Currently a Retainer Box must be set to update every N frames. Some User Widgets, however, do not need to update at a fixed rate: they change only when the user interacts with them, and interaction is infrequent. In that case, you can extend the Retainer Box to support event-driven updates.
The idea is to subclass URetainerBox and SRetainerWidget. In PaintRetainedContent (named OnTickRetainers before 4.16), check whether an event has requested an update. If so, call the parent class's PaintRetainedContent; otherwise, return immediately.
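The gating logic can be modeled in plain C++ without engine types. In the class below, which is a hypothetical stand-in for the SRetainerWidget subclass, PaintRetainedContent() is a member that only reports whether it would redraw. The real override would call the parent implementation at the marked point. Several requests in one frame coalesce into a single redraw.

```cpp
#include <cassert>

// Hypothetical model of an event-driven Retainer Box.
class EventDrivenRetainer {
public:
    // Called by UI code when the retained content changes (e.g. after a click).
    void RequestRedraw() { pending_ = true; }

    // Per-frame check: returns true when the content is re-rendered
    // into the render target this frame.
    bool PaintRetainedContent() {
        if (!pending_) {
            return false;  // keep showing the cached render target
        }
        pending_ = false;
        ++redraws_;  // the real override would call the parent class here
        return true;
    }

    int Redraws() const { return redraws_; }

private:
    bool pending_ = true;  // draw once so the render target has content
    int redraws_ = 0;
};
```

Over 300 frames with three user interactions, this box redraws 4 times (the initial draw plus one per event). A box on a fixed 3-frame period redraws 100 times over the same span.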
2.2.4 Switching materials
UE4 offers a rich set of material effects. On low-end devices, consider disabling them or switching to lower-quality materials to improve performance.
You can use the engine's DYNAMIC_MULTICAST delegates to bind all affected widgets to a single switch variable so they can be switched all at once.
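The broadcast pattern can be sketched in plain C++ with a vector of std::function handlers standing in for the engine's multicast delegate. In engine code, this role would be played by a delegate declared with DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam and fired with Broadcast. QualitySwitch, EffectImage and the material names are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Plain C++ stand-in for a multicast delegate: every subscribed widget is
// notified when the global quality switch changes.
class QualitySwitch {
public:
    using Handler = std::function<void(bool lowQuality)>;

    void Subscribe(Handler handler) { handlers_.push_back(std::move(handler)); }

    void SetLowQuality(bool lowQuality) {
        for (const Handler& h : handlers_) h(lowQuality);  // broadcast
    }

private:
    std::vector<Handler> handlers_;
};

// Hypothetical widget holding a full-quality and a cheap material name.
struct EffectImage {
    std::string fullMaterial;
    std::string cheapMaterial;
    std::string current;

    void OnQualityChanged(bool lowQuality) {
        current = lowQuality ? cheapMaterial : fullMaterial;
    }
};
```

One call to SetLowQuality(true) switches every subscribed widget to its cheap material, so no widget needs its own per-device check.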
2.3 Other optimizations
2.3.1 C++ development
Apart from UI animations, whose storage format cannot be authored in C++, all other UI functionality can be implemented in C++:
Step 1: implement a C++ class UWExpHeroIcon that inherits from UUserWidget.
Step 2: use Reparent Blueprint to change the Widget Blueprint's parent class to UWExpHeroIcon.
Step 3: in the editor, identify the widget variables, and their types, that the C++ class needs to access.
Step 4: declare those variables in C++ with the BindWidget specifier, using the same names; the engine binds them automatically.
2.3.2 Manager Class
We recommend creating a Manager Class in the project to manage all User Widgets and all UI resources, such as Brushes and Fonts. The Manager Class can be written in C++ or as a Blueprint.
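One possible shape for the resource side of such a manager is a load-once cache, sketched here in plain C++. UiResource, UiResourceManager and the loader callback are hypothetical stand-ins for engine assets and asset loading. Each resource is loaded once, every caller receives the same shared instance, and everything can be dropped when a screen closes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a UI resource such as a brush or a font.
struct UiResource {
    std::string path;
};

// Hypothetical manager: loads each resource once and hands out shared references.
class UiResourceManager {
public:
    using Loader = std::function<std::shared_ptr<UiResource>(const std::string&)>;

    explicit UiResourceManager(Loader loader) : loader_(std::move(loader)) {}

    std::shared_ptr<UiResource> Get(const std::string& path) {
        auto it = cache_.find(path);
        if (it != cache_.end()) return it->second;  // already loaded
        auto resource = loader_(path);
        cache_.emplace(path, resource);
        return resource;
    }

    void Clear() { cache_.clear(); }  // e.g. when a screen is closed
    std::size_t Size() const { return cache_.size(); }

private:
    Loader loader_;
    std::map<std::string, std::shared_ptr<UiResource>> cache_;
};
```

Because widgets request resources through Get instead of referencing assets directly, the manager is also a natural place for switches such as the atlas/independent-texture choice described earlier.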
2.3.3 Free texture memory
Freeing texture memory requires one precondition: do not assign textures in the editor (the Image property in the figure below). Instead, load, assign, and destroy them in code. If a texture is not set in the editor, the CDO (Class Default Object) does not reference it. A reference from the CDO keeps the texture alive, so it cannot be released until the application exits.
If the Image property has already been set in the editor and you still want to destroy the texture, Wang Mi of Epic Games provides a method that removes the reference between UImage and UTexture during cooking. This User Widget's CDO then no longer references the UTexture.
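Why a CDO reference blocks release can be illustrated with std::shared_ptr in plain C++. This is only an analogy: UE4 manages UObjects with garbage collection rather than shared pointers. The point it models is that a second, long-lived owner keeps the texture reachable after the widget drops its own reference. All type names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Texture {
    std::string name;
};

// Stand-in for the class defaults: lives until the application exits.
struct WidgetClassDefaults {
    std::shared_ptr<Texture> defaultImage;  // set in the editor
};

// Stand-in for a widget instance created at runtime.
struct WidgetInstance {
    std::shared_ptr<Texture> image;
};

// Returns true if the texture is still alive after the widget releases it.
bool SurvivesRelease(bool setInEditor) {
    WidgetClassDefaults cdo;
    WidgetInstance widget;
    auto texture = std::make_shared<Texture>(Texture{"T_Banner"});
    if (setInEditor) cdo.defaultImage = texture;  // extra owner, like the CDO
    widget.image = texture;
    std::weak_ptr<Texture> observer = texture;
    texture.reset();
    widget.image.reset();  // game code "destroys" its reference
    return !observer.expired();
}
```

A texture assigned only at runtime is freed as soon as the widget lets go of it. One also held by the defaults object stays in memory.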
The code to remove the reference relationship in the Cook phase is as follows:
The code to load the texture is as follows:
The code to release the texture is as follows:
2.3.4 3D RTT optimization
By default, SceneCaptureComponent2D captures every frame. You can usually turn off per-frame updates:
On phones, 30 updates per second is enough for animation, so you can set the Tick interval of SceneCaptureComponent2D in Blueprint:
Then call Capture Scene manually in Blueprint:
Also keep the Render Target of the SceneCaptureComponent2D small, which helps performance.
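The frame-rate-independent throttling behind a 30 Hz capture can be sketched in plain C++. This simulation is not engine code. In the engine, the corresponding pieces would be turning off bCaptureEveryFrame, setting the tick interval and calling CaptureScene(); the sketch only models the timing. It uses an integer accumulator to avoid floating-point drift and captures at most once per frame.

```cpp
#include <cassert>

// Simulates `frames` frames at `frameRate` fps and counts captures when
// capturing at most `captureRate` times per second (at most once per frame).
int CountCaptures(int frames, int frameRate, int captureRate) {
    assert(frames >= 0 && frameRate > 0 && captureRate > 0);
    long long budget = 0;
    int captures = 0;
    for (int f = 0; f < frames; ++f) {
        budget += captureRate;
        if (budget >= frameRate) {
            ++captures;  // in-engine, the capture call would go here
            budget -= frameRate;
            // When fps < capture rate, cap the backlog instead of letting it grow.
            if (budget > frameRate) budget = frameRate;
        }
    }
    return captures;
}
```

Over one second, the component captures 30 times at 60 or 45 fps. At 20 fps it captures on every frame, which is the most it can do.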
2.3.5 New features
We added two debug commands in Battle Breakers, which may be merged into the main branch in version 4.17. Game screen:
Use Slate.ShowOverdraw to view pixel overdraw:
Use Slate.ShowBatching to view batching:
3 Effect test
We built a test project to measure the optimizations. The UI in the figure below contains more than 800 widgets:
The test device is a budget phone priced around 1,000 yuan, with the following specifications:
With the Invalidation Box enabled, Slate Tick time dropped sharply. However, the application had Mobile HDR enabled, so the bottleneck was on the GPU and FPS did not improve much.
The figure below compares performance with the Invalidation Box, the Retainer Box, and the event-driven Retainer Box enabled. The rendering thread improvements clearly raise FPS:
4 Summary
Most UI optimization work (such as setting up Invalidation Boxes and Retainer Boxes) happens later in a project, after basic UI development is complete. UE4 provides a wealth of features and debugging tools, and mastering them helps developers build high-performance UI.