Dual Buffer and Single Buffer

Source: Internet
Author: User
In Linux, the UI is displayed through the framebuffer. On an embedded system the framebuffer is a contiguous region of physical memory reserved for the GPU. The LCD controller reads pixel data from the framebuffer over a dedicated bus and displays it on the screen.
Framebuffer configurations fall into two types: single buffer and dual buffer.
Let's talk about the single buffer first:
The CPU writes data into the framebuffer while the LCD controller reads data out of it. The two activities happen at the same time and must stay coordinated; otherwise display problems occur.
If the CPU writes into the framebuffer faster than the LCD controller reads from it, the controller may still be scanning out a row of the previous frame when the CPU has already refreshed the whole screen, which corrupts the picture. Note that the controller's read speed is not the same as the screen refresh rate: at a 60 Hz refresh rate the controller may spend only about 3 ms actually reading, idling for the rest of the period. In practice, though, it is quite hard for CPU writes to outpace the controller's reads.
If the CPU writes into the framebuffer too slowly, the screen flickers. For example, to draw a picture the CPU first fills the buffer with white; the LCD refreshes and the screen shows white; then the real content is drawn and the screen displays normally. The user sees this as flicker. To avoid it, the CPU must finish drawing a frame as quickly as possible and try to ensure that writing a frame does not span an LCD refresh cycle.
Therefore, in the single-framebuffer era, to prevent flicker we generally allocate a region of ordinary memory the same size as the framebuffer, render the entire screen into it, and then perform one memory copy, so that the time spent writing the framebuffer itself is as short as possible.
Suppose the screen resolution is 320*240 with 32 bits per pixel.
The size of the framebuffer is then 320 * 240 * 4 = 307,200 bytes, about 0.3 MB.
That is, I need to first fill a 0.3 MB region of ordinary memory, then copy that region into the framebuffer. For simplicity, take setting the screen to white as the example, which removes the cost of computing the screen contents.
The actual display process is then: set the 0.3 MB region in memory, read that 0.3 MB back, and copy it into the framebuffer.
On the embedded platform I used (SDRAM with a 532 MHz ARM11 chip), lmbench reports the following memory bandwidths:
*Local* Communication bandwidths in MB/s - bigger is better
---------------------------------------------------------------------------------
Host   OS         pipe  AF UNIX  TCP   file    mmap    bcopy   bcopy   mem    mem
                                       reread  reread  (libc)  (hand)  read   write
---------------------------------------------------------------------------------
Phone  Linux 2.6  79.1  94.7     13.3  47.1    135.9   78.6    77.8    135.   205.9
It is odd that reading is slower than writing; this may be related to the cache, but it does not matter here.
Next, let's calculate the time a single buffer needs.
Set 0.3072 MB of memory to a solid color: 0.3072 * 1000 / 205.9 = about 1.5 ms.
Read 0.3072 MB of memory: 0.3072 * 1000 / 135 = about 2.3 ms; call it 2.7 ms with loop overhead.
Write 0.3072 MB into the framebuffer: 0.3072 * 1000 / 205.9 = about 1.5 ms.
The total is about 5.7 ms. The real figure is somewhat higher, since the loops themselves add overhead.
5.7 ms looks acceptable.
When the screen resolution increases, the situation changes. If the resolution goes up to 800*600, the data is (800*600)/(320*240) = 6.25 times as large, so displaying a white screen takes 5.7 * 6.25 = about 35.6 ms. In other words, even showing nothing but a white screen, we can manage at most about 28 frames per second. That is too slow.
And if we count only the copy from memory into the framebuffer:
(2.7 + 1.5) * 6.25 = 26.25 ms. At a 60 Hz LCD refresh rate one refresh period is about 16.7 ms, so even a bare copy of the data from memory into the framebuffer takes longer than a refresh period, which is bound to make the screen flicker.
Another problem with a single buffer is that every update needs a memory copy, which keeps evicting useful data from the CPU's data cache and hurts overall software performance.

Performance improvements:
From the software side, shrink the refreshed area each time to reduce memory traffic.
From the hardware side, use a faster CPU and faster DDR memory to raise memory throughput.

Next, let's talk about the double buffer.
Its advantage is that while the LCD controller displays the current framebuffer, software can draw directly into the background framebuffer. Once drawing is finished, the two buffers are simply swapped; no memory copy is needed, which improves efficiency.
For example, to display a 320x240 white screen, the process becomes:
1. Write directly into the back framebuffer, which takes 0.3072 * 1000 / 205.9 = about 1.5 ms.
2. After writing, the framebuffers are swapped. This time is negligible.
When the screen grows to 800x600, the time becomes 1.5 * 6.25 = 9.375 ms, roughly 2.8 times faster than the single buffer.
In addition, double buffering has another benefit: because there is no memory-copy step, it disturbs the data cache much less, which helps software performance. The framebuffer's cache attributes also need to be set appropriately: since the data is never read back after being written, mapping the framebuffer as write-through is more efficient than write-back.

Another advantage of double buffering is that GPU hardware acceleration can be used. Hardware acceleration requires the displayed data to sit in the reserved physical memory, which the single-buffer scheme (drawing into ordinary memory, then copying) cannot satisfy.

Double buffering also has a fatal weakness: partial refresh.
With a single buffer, partial refresh is easy to implement: just copy the changed data to the corresponding location. With double buffers, to do a partial refresh you must first copy the currently displayed buffer's contents into the back buffer, and only then apply the partial update. As a result, with double buffering a full-screen refresh can actually be more efficient than a partial one.

The ideal solution for partial refresh:
Write only the locally modified data into a separate piece of video memory and have it overlaid onto the previous buffer.

It should be said that single and double buffering each have their strengths; the key is knowing how to use them flexibly.

Write-through: when the CPU writes data into the cache, it writes the same data to main memory at the same time, keeping the cache and the corresponding main-memory locations consistent. This is simple and reliable, but since every update must also go to main memory, speed inevitably suffers.
Write-back: the CPU writes data only into the cache and marks the line dirty; the data is written back to main memory only when the line is about to be replaced by an incoming block. The rationale is that written data is often an intermediate result, so pushing it straight to slow main memory is unnecessary. Write-back is fast and avoids redundant writes to main memory, but the hardware is more complex.
Write-through and write-back behave very differently. In different scenarios, letting different memory regions use different write policies (if your system supports it) is far more efficient than one global policy. Concretely: map memory that is accessed repeatedly as write-back, and map memory that will not be used for a long time after a write as write-through. This can greatly improve cache efficiency.
The first point is easy to understand; the second needs some thought. With write-through, when the written address is present in the cache, both the cache and main memory are updated; when it is not, the data is written straight to main memory, bypassing the cache. If the data will not be used for a long time, by then it will certainly no longer be in the cache (it will have been replaced), so writing it directly to main memory is the better choice.

With write-back, if the address is in the cache, the line is updated and its dirty bit set; the data then sits there, possibly for a long time, until it is either used or evicted and flushed to main memory, pointlessly occupying valuable cache space. If the address is not in the cache, it is even worse: the cache line must first be fetched from main memory, then updated and marked dirty, and later flushed back. This not only occupies cache space but also imports data from main memory, tying up the bus at considerable cost. The reason the line must be fetched first is that write-back transfers whole cache lines back to main memory, while the newly written data may not fill an entire line; to keep the data consistent, the rest of the line must first be brought into the cache, updated, and only then written back.
For many video decoding operations, writing out a decoded frame is a one-shot action; the frame is touched again only when it serves as a reference frame. The frame-buffer memory can therefore be set to write-through. When the frame is later used as a reference it is not accessed repeatedly, only read once, so bypassing the cache costs no efficiency. Experiments show that this method improves
MPEG-4 SP decoding efficiency by 20-30%.
