[Original] Graphics systems on Linux and AMD R600 graphics programming (4): AMD graphics card memory management


A graphics card can use two kinds of memory. One is the memory on the card itself, called VRAM. The other is system main memory made available to the card, called GTT memory. (GTT, the graphics translation table, means the same thing as the GART discussed below: the graphics card's page table. GTT memory can be understood as memory for which a GPU page table has to be set up.) On embedded systems and integrated graphics, the GPU usually has no memory of its own and uses system memory exclusively. On-board video memory is often several times faster than system memory, so data such as textures (ordinary textures and resident textures in OpenGL) is accessed significantly faster when it is placed in VRAM.

Some data must be placed in VRAM, such as the framebuffer that is ultimately used for display and the GART (graphics address remapping table) page table described later. Other data, such as the command ring buffer described in a later post, is placed in GTT memory. VRAM is also limited: once it is used up, further data has to go into GTT memory.

GTT memory is usually allocated on demand. A Radeon R600 card, for example, can use up to 512 MB of system memory (the limit is set in the Linux kernel). Allocating 512 MB of contiguous memory for the device in one go is unlikely to succeed on Linux, and even if it did, a good deal of that memory might go to waste. With on-demand allocation, the driver takes from system memory only as much as is needed, so the pages that make up GTT memory are certainly not contiguous. The GPU needs to use both VRAM and GTT memory, and the simplest way to do that is to address them in one unified address space (much as RISC machines place I/O and memory in one address space). VRAM is the card's own memory, so its addresses are contiguous. Non-contiguous GTT memory, however, can only take part in unified addressing through a page table that maps it. That page table is called the GTT or GART, which is where the name "GTT memory" comes from.
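The on-demand scheme can be sketched as follows. This is only a simplified model: the names `gtt_model`, `gtt_bind_page` and `GTT_MAX_PAGES` are hypothetical and do not come from the Radeon driver, and bus addresses are passed in as plain integers instead of coming from a real allocator. It shows how scattered system pages end up at consecutive GPU addresses once each one is recorded in the next free page-table slot.

```c
#include <stdint.h>
#include <stddef.h>

#define GTT_PAGE_SIZE 4096u
#define GTT_MAX_PAGES 16u   /* hypothetical tiny aperture, for illustration only */

/* Hypothetical model of a GTT aperture: one table slot per 4 KiB page. */
struct gtt_model {
    uint64_t gpu_base;                  /* GPU address where the aperture starts */
    uint64_t bus_addr[GTT_MAX_PAGES];   /* bus address backing each slot */
    size_t   used;                      /* number of slots bound so far */
};

/*
 * Bind one system page (identified by its bus address) to the next free
 * slot. Returns the GPU address assigned to the page, or UINT64_MAX if
 * the aperture is full.
 */
uint64_t gtt_bind_page(struct gtt_model *gtt, uint64_t bus_addr)
{
    if (gtt->used >= GTT_MAX_PAGES)
        return UINT64_MAX;
    gtt->bus_addr[gtt->used] = bus_addr;
    return gtt->gpu_base + (uint64_t)(gtt->used++) * GTT_PAGE_SIZE;
}
```

Binding two pages whose bus addresses are far apart yields GPU addresses exactly one page apart, which is the point of routing GTT memory through a page table.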

By analogy with CPU addressing, we call the address the GPU uses a "GPU virtual address" and the address obtained after the page-table lookup a "GPU physical address". The latter is what the GPU ultimately uses to reach memory. Because the GPU sits on a device bus, this "GPU physical address" is a bus address. VRAM has no page-table entries, so for memory in that region we only care about its GPU virtual address.

Table 1 lists the memory-management registers of the R600 graphics core. No complete description of these registers is available; the information in the table was obtained by reading the driver code.

Register name                               Offset        Function
R600_CONFIG_MEMSIZE                         0x5428        VRAM size
MC_VM_FB_LOCATION                           0x2180        Start and end addresses of the VRAM region in the GPU virtual address space
MC_VM_SYSTEM_APERTURE_LOW_ADDR              0x2190        Start address of the VRAM region in the GPU virtual address space
MC_VM_SYSTEM_APERTURE_HIGH_ADDR             0x2194        End address of the VRAM region in the GPU virtual address space
VM_L2_CNTL                                  (not given)   GPU L2 cache control register
MC_VM_L1_TLB_MCB                            (not given)   GPU TLB control register
(not given)                                 0x1594        Start address of GTT memory
VM_CONTEXT0_PAGE_TABLE_END_ADDR             0x15B4        End address of GTT memory
VM_CONTEXT0_PAGE_TABLE_BASE_ADDR            0x1574        GPU page table base address
VM_CONTEXT0_CNTL                            0x1410        GPU virtual address space enable register
VM_CONTEXT0_PROTECTION_FAULT_DEFAULT_ADDR   0x1554        Default address used when a page fault occurs

Table 1

On Radeon cards, VRAM is described by two terms: "visible VRAM" and "real VRAM". Visible VRAM is the part that can be reached through the PCI device's memory mapping, so software can access it directly. The rest of the card's VRAM is not visible and cannot be accessed directly by software (presumably it is used by the GPU itself). The invisible part and the visible part together make up the card's real VRAM.

The visible VRAM size is obtained by reading the PCI configuration space. On one machine, for example, the visible VRAM size reads as 256 MB, while RADEON_CONFIG_MEMSIZE reports a real VRAM size of 512 MB. The VRAM length is therefore 512 MB; with the VRAM start address set to 0x0, the end address is 0x1FFFFFFF. The start and end addresses are then written to the R_000004_MC_FB_LOCATION register:

    rv515_mc_wreg(R_000004_MC_FB_LOCATION,
                  S_000004_MC_FB_START(rdev->vram_start >> 16) |
                  S_000004_MC_FB_TOP(rdev->vram_end >> 16));
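To make the arithmetic concrete, here is a small sketch of how the register value could be assembled. It assumes that the start field occupies bits 0-15 and the top field bits 16-31. That layout is only my reading of the macro names and should be checked against the driver's register header. The helper names `vram_end_addr` and `mc_fb_location` are hypothetical.

```c
#include <stdint.h>

/* Last valid address of a VRAM region that begins at `start`. */
uint32_t vram_end_addr(uint32_t start, uint32_t size)
{
    return start + size - 1;
}

/*
 * Pack start and end into one 32-bit value.
 * ASSUMPTION: start goes in bits 0-15 and top in bits 16-31, both in
 * units of 64 KiB (address >> 16), mirroring the shifts in the call above.
 */
uint32_t mc_fb_location(uint32_t vram_start, uint32_t vram_end)
{
    uint32_t start = (vram_start >> 16) & 0xFFFFu;
    uint32_t top   = (vram_end   >> 16) & 0xFFFFu;
    return start | (top << 16);
}
```

For the example above (start 0x0, size 512 MB), `vram_end_addr` gives 0x1FFFFFFF, and under the assumed layout the packed value is 0x1FFF0000.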

Next, GTT memory and the GART are set up. The size of the GTT is chosen by the driver; once it is fixed, the amount of memory the GART itself occupies is fixed as well. With the kernel source and the register descriptions in Table 1 at hand, this process should be easy to follow.

Compared with the three-level page tables used by the CPU, the Radeon GPU's page table is simple: a single level (the depth is configurable) with a page size of 4 KB, so the low 12 bits (2^12 = 4K) of each page-table entry are available as flag bits. Earlier Radeon GPUs used 32-bit page-table entries; from the R600 on, entries are 64 bits wide. Of the 12 flag bits, only the low 6 are used; they are defined in Figure 1.

Figure 1
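The split between page base and flag bits can be sketched as below. The 12-bit flag field follows from the 4 KB page size described above. The flag name `PTE_VALID` and its bit position are hypothetical placeholders, since the real flag definitions are those of Figure 1, and the helper names are mine as well.

```c
#include <stdint.h>

#define PTE_FLAG_MASK 0xFFFull       /* low 12 bits of an entry hold flags */
#define PTE_VALID     (1ull << 0)    /* HYPOTHETICAL flag, for illustration only */

/* Build a 64-bit entry from a 4 KiB-aligned bus address and flag bits. */
uint64_t pte_make(uint64_t bus_addr, uint64_t flags)
{
    return (bus_addr & ~PTE_FLAG_MASK) | (flags & PTE_FLAG_MASK);
}

/* Page base: the entry with its flag bits cleared. */
uint64_t pte_page_base(uint64_t pte)
{
    return pte & ~PTE_FLAG_MASK;
}

/* Flag bits only. */
uint64_t pte_flags(uint64_t pte)
{
    return pte & PTE_FLAG_MASK;
}
```

Because pages are 4 KB aligned, the low 12 bits of a page's bus address are always zero, which is why they are free to carry flags.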

The GPU page table resides in VRAM. The two registers VM_CONTEXT0_PAGE_TABLE_BASE_ADDR and VM_CONTEXT0_PAGE_TABLE_END_ADDR indicate the position of the page table in VRAM.
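Since the table itself takes up VRAM, it is worth working out how much. The sketch below derives the number of entries and the table size from the GTT size, using the 4 KB page size and the entry widths given earlier. The function names are hypothetical.

```c
#include <stdint.h>

#define GPU_PAGE_SIZE 4096u

/* Number of page-table entries needed to cover gtt_size bytes. */
uint64_t gart_num_entries(uint64_t gtt_size)
{
    return (gtt_size + GPU_PAGE_SIZE - 1) / GPU_PAGE_SIZE;
}

/* Bytes occupied by the table itself for a given entry width in bytes. */
uint64_t gart_table_bytes(uint64_t gtt_size, unsigned entry_bytes)
{
    return gart_num_entries(gtt_size) * entry_bytes;
}
```

For a 512 MB GTT this gives 131072 entries, that is 1 MB of table with 64-bit entries or 512 KB with 32-bit entries.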

The code for this in the Linux kernel is XXXX [pending revision]

The function above takes two parameters. dma_addr is the bus address to which the allocated system memory is mapped; the device uses this address to access main memory, and it is the "GPU physical address" referred to earlier. index is the index of the page-table entry.

In the code, ptr is the CPU virtual address of the memory that holds the page table. Page-table entries are 64 bits wide on the R600 and 32 bits wide on the R500 and earlier.
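As a rough illustration of what such a function does, the sketch below stores a bus address into entry `index` of a table reached through `ptr`, for either entry width. It is not the kernel's code: the name `gart_set_page`, the `gart_entry_width` enum and the bounds check are all my own assumptions, flag bits are left out, and a real driver would write to device memory through the kernel's I/O accessors, not through plain pointers.

```c
#include <stdint.h>
#include <stddef.h>

/* Entry width in bytes: 32-bit on older GPUs, 64-bit from the R600 on. */
enum gart_entry_width { GART_ENTRY_32 = 4, GART_ENTRY_64 = 8 };

/*
 * Store dma_addr as entry `index` of the table that `ptr` points to.
 * Returns 0 on success, -1 if index is out of range or the address does
 * not fit into the entry width.
 */
int gart_set_page(void *ptr, size_t num_entries, enum gart_entry_width width,
                  size_t index, uint64_t dma_addr)
{
    if (index >= num_entries)
        return -1;
    if (width == GART_ENTRY_64) {
        ((uint64_t *)ptr)[index] = dma_addr;
    } else {
        if (dma_addr > UINT32_MAX)
            return -1;
        ((uint32_t *)ptr)[index] = (uint32_t)dma_addr;
    }
    return 0;
}
```

The only difference between the two generations in this model is the element size used to index the table, which is why the same index addresses a different byte offset in each case.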

Now let us look at how a piece of memory is allocated and mapped. The ring buffer covered in the next post is one such allocation: it holds commands, which the CPU writes into this memory and the GPU reads from it in order to configure itself.

XXX ring_init process description [pending revision]

After GPU initialization is complete, the R600 GPU accesses memory as shown in Figure 2 (derived from reading the code, so it may contain errors).

Figure 2

If the address falls in GTT memory, the GPU page table has to be consulted. The upper 52 bits of the 64-bit address (the current driver actually uses only 32 bits) select the page-table entry. Clearing the low 12 bits of that entry yields the "page base", which lies in PCI device address space. Adding the low 12 bits of the original address (the in-page offset) to the page base gives the corresponding bus address.

Note that VRAM and GTT share one address space but VRAM takes no part in page-table translation, so the GTT base address must be subtracted from the address before the page table is indexed.
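The lookup just described can be modelled in a few lines. This is only a sketch: the structure `gpu_mc` and the function `gpu_translate` are hypothetical, the GTT start is assumed to be page aligned, and VRAM addresses are simply returned unchanged to reflect that they bypass the page table.

```c
#include <stdint.h>
#include <stddef.h>

#define GPU_PAGE_SHIFT 12
#define GPU_PAGE_MASK  0xFFFull

/* Hypothetical model of the unified GPU address space. */
struct gpu_mc {
    uint64_t vram_start, vram_end;   /* inclusive VRAM range */
    uint64_t gtt_start, gtt_end;     /* inclusive GTT range, page aligned start */
    const uint64_t *page_table;      /* one 64-bit entry per GTT page */
};

/*
 * Translate a GPU address. VRAM addresses are returned as they are; GTT
 * addresses go through the page table. Returns 0 and stores the result
 * in *out, or -1 if the address lies in neither range.
 */
int gpu_translate(const struct gpu_mc *mc, uint64_t gpu_addr, uint64_t *out)
{
    if (gpu_addr >= mc->vram_start && gpu_addr <= mc->vram_end) {
        *out = gpu_addr;
        return 0;
    }
    if (gpu_addr >= mc->gtt_start && gpu_addr <= mc->gtt_end) {
        /* Subtract the GTT base first: entry 0 describes the first GTT page. */
        uint64_t index = (gpu_addr - mc->gtt_start) >> GPU_PAGE_SHIFT;
        uint64_t base  = mc->page_table[index] & ~GPU_PAGE_MASK;
        *out = base | (gpu_addr & GPU_PAGE_MASK);
        return 0;
    }
    return -1;
}
```

With VRAM at 0x0-0x1FFFFFFF and the GTT starting at 0x20000000, GPU address 0x20001234 selects entry 1; if that entry's page base is 0x123000, the resulting bus address is 0x123234.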

The Linux kernel has a complete memory-management mechanism for this, consisting of TTM and GEM (see the related references). Like system memory management inside an operating system, it is fairly complex. We will not describe its implementation in detail, but only how to obtain and use video memory in the kernel and outside it.

Using video memory in the kernel

The following code is from the radeon_device_init function (drivers/gpu/drm/radeon/radeon_device.c) of the Radeon kernel driver:

    810  if (radeon_testing) {
    811          radeon_test_moves(rdev);
    812  }
Line 810 tests a global switch variable. When the switch is on, the driver performs a screen-copy test. The code is in drivers/gpu/drm/radeon/radeon_test.c: radeon_test_moves copies data from VRAM to system main memory and from system main memory to VRAM, and the effect can be seen on screen while the system boots (this is a complete command sequence that runs directly inside the Radeon kernel driver). By this point the kernel has finished its initialization, so the graphics programming experiments in later posts can be placed here and their effect observed after a reboot. The following sample code uses the kernel API to allocate and work with video memory:

     1  struct radeon_bo *vram_obj = NULL;
     2  struct radeon_bo *gtt_obj = NULL;
     3  uint64_t vram_addr, gtt_addr;
     4  unsigned size;
     5  void *vram_map, *gtt_map;
     6
     7  size = 1024 * 768 * 4;    /* one 1024x768 frame, 4 bytes per pixel */
     8  r = radeon_bo_create(rdev, size, PAGE_SIZE, true,
     9                       RADEON_GEM_DOMAIN_VRAM, &vram_obj);
    10  if (r) {
    11          DRM_ERROR("Failed to create VRAM object\n");
    12          goto out_cleanup;
    13  }
    14  r = radeon_bo_reserve(vram_obj, false);
    15  if (unlikely(r != 0))
    16          goto out_cleanup;
    17  r = radeon_bo_pin(vram_obj, RADEON_GEM_DOMAIN_VRAM, &vram_addr);
    18  if (r) {
    19
    20          DRM_ERROR("Failed to pin VRAM object\n");
    21          goto out_cleanup;
    22  }
    23  r = radeon_bo_kmap(vram_obj, &vram_map);
    24  if (r) {
    25          DRM_ERROR("Failed to map VRAM object\n");
    26          goto out_cleanup;
    27  }
    28
    29  r = radeon_bo_create(rdev, size, PAGE_SIZE, true,
    30                       RADEON_GEM_DOMAIN_GTT, &gtt_obj);
    31  if (r) {
    32          DRM_ERROR("Failed to create GTT object\n");
    33          goto out_cleanup;
    34  }
    35  r = radeon_bo_reserve(gtt_obj, false);
    36  if (unlikely(r != 0))
    37          goto out_cleanup;
    38  r = radeon_bo_pin(gtt_obj, RADEON_GEM_DOMAIN_GTT, &gtt_addr);
    39  if (r) {
    40          DRM_ERROR("Failed to pin GTT object\n");
    41          goto out_cleanup;
    42  }
    43  r = radeon_bo_kmap(gtt_obj, &gtt_map);
    44  if (r) {
    45          DRM_ERROR("Failed to map GTT object\n");
    46          goto out_cleanup;
    47  }
    48
    49  out_cleanup:
    50  if (vram_obj) {
    51          if (radeon_bo_is_reserved(vram_obj)) {
    52                  radeon_bo_unpin(vram_obj);
    53                  radeon_bo_unreserve(vram_obj);
    54          }
    55          radeon_bo_unref(&vram_obj);
    56  }
    57  if (gtt_obj) {
    58          if (radeon_bo_is_reserved(gtt_obj)) {
    59                  radeon_bo_unpin(gtt_obj);
    60                  radeon_bo_unreserve(gtt_obj);
    61          }
    62          radeon_bo_unref(&gtt_obj);
    63  }

The code above creates two buffer objects (BOs), allocates memory for one from VRAM and for the other from GTT memory, and finally releases the memory and the BOs. The buffer object is the basic structure of video memory management: an abstraction of a piece of memory. The Radeon driver uses the radeon_bo structure to describe and manage a piece of video memory.

Lines 1-2 declare the two BOs (two pieces of video memory), one allocated from VRAM and the other from GTT memory.

Line 8 creates and initializes a BO and allocates its video memory. The parameters are as follows:

    • rdev: pointer to the radeon_device structure;
    • size: the size of the BO;
    • PAGE_SIZE: the alignment requested for the BO;
    • true: whether the request comes from the kernel or from user space. For a kernel request, allocating the BO structure is non-interruptible, and the mapping between virtual and physical addresses differs depending on whether the video memory is accessed from user space or kernel space;
    • RADEON_GEM_DOMAIN_VRAM: whether the video memory is located in VRAM or GTT. The Radeon driver defines three memory domains: RADEON_GEM_DOMAIN_CPU (0x1), RADEON_GEM_DOMAIN_GTT (0x2) and RADEON_GEM_DOMAIN_VRAM (0x4). The purpose of RADEON_GEM_DOMAIN_CPU is not clear; the other two mean memory from GTT memory and from VRAM;
    • &vram_obj: pointer through which the BO structure is returned.
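The three domain values listed above are distinct bits, so a set of acceptable domains can be expressed as a mask. The sketch below uses the values quoted in the list; the helpers `domain_allowed` and `domain_name` are hypothetical and are not part of the driver.

```c
#include <stdint.h>

/* Domain values as quoted in the parameter list above. */
#define RADEON_GEM_DOMAIN_CPU  0x1u
#define RADEON_GEM_DOMAIN_GTT  0x2u
#define RADEON_GEM_DOMAIN_VRAM 0x4u

/* Hypothetical helper: nonzero if the mask `allowed` includes `domain`. */
int domain_allowed(uint32_t allowed, uint32_t domain)
{
    return (allowed & domain) != 0;
}

/* Hypothetical helper: printable name for a single domain value. */
const char *domain_name(uint32_t domain)
{
    switch (domain) {
    case RADEON_GEM_DOMAIN_CPU:  return "CPU";
    case RADEON_GEM_DOMAIN_GTT:  return "GTT";
    case RADEON_GEM_DOMAIN_VRAM: return "VRAM";
    default:                     return "unknown or combined";
    }
}
```

A caller that could live with either placement might pass `RADEON_GEM_DOMAIN_VRAM | RADEON_GEM_DOMAIN_GTT` as the mask, while the sample code above always names exactly one domain.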

Line 14 reserves the BO (this appears to mark the BO as in use so that other code cannot use it at the same time). If the BO is already reserved, it is not available until it has been unreserved.

Line 17 pins the BO and obtains the GPU virtual address of the memory it represents. The GPU accesses the memory through this address, and this is the kind of address we hand to the GPU later on.

Line 23 maps the memory the BO represents. The function returns the mapped CPU virtual address through its second parameter, and the driver uses that address to access the memory.

Lines 29-47 work on the same principle, except that the memory comes from GTT memory. Inside the API functions the two cases are handled quite differently, but for the caller only the memory-domain parameter differs.

From line 50 on, the code releases the memory and the BO structures.

Using video memory outside the kernel

User space obtains video memory through libdrm. The following code shows how to obtain and use video memory outside the kernel:

    /*
     * Fragment: kms, width, height, stride and virtual are supplied by the
     * enclosing function, which returns a struct kms_bo pointer.
     */
    int ret;
    struct kms_bo *bo;
    unsigned bo_attribs[] = {
            KMS_WIDTH,   0,
            KMS_HEIGHT,  0,
            KMS_BO_TYPE, KMS_BO_TYPE_SCANOUT_X8R8G8B8,
            KMS_TERMINATE_PROP_LIST
    };

    bo_attribs[1] = width;
    bo_attribs[3] = height;

    /* Allocate a scanout buffer of the requested size. */
    ret = kms_bo_create(kms, bo_attribs, &bo);
    if (ret) {
            fprintf(stderr, "failed to alloc buffer: %s\n", strerror(-ret));
            return NULL;
    }

    /* Ask for the pitch (stride) of the buffer. */
    ret = kms_bo_get_prop(bo, KMS_PITCH, stride);
    if (ret) {
            fprintf(stderr, "failed to retrieve buffer stride: %s\n", strerror(-ret));
            kms_bo_destroy(&bo);
            return NULL;
    }

    /* Map the buffer into the process's address space. */
    ret = kms_bo_map(bo, &virtual);
    if (ret) {
            fprintf(stderr, "failed to map buffer: %s\n", strerror(-ret));
            kms_bo_destroy(&bo);
            return NULL;
    }

    return bo;

This code is similar to the kernel code, and its meaning should be clear from the names of the functions it calls. To write a complete program, refer to the libdrm source code or to the sample included here.
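Once the buffer is mapped, the CPU can write pixels through the returned pointer. The sketch below fills a buffer with one colour. It assumes 32-bit pixels, as the X8R8G8B8 buffer type suggests, and a stride measured in bytes. The function name `fill_xrgb8888` is hypothetical, and any writable memory of the right size can stand in for the mapped buffer when trying it out.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Fill a width x height area of 32-bit pixels with one colour.
 * `virt` is the CPU mapping of the buffer; `stride` is the length of one
 * row in bytes (ASSUMPTION), which may be larger than width * 4.
 */
void fill_xrgb8888(void *virt, unsigned width, unsigned height,
                   unsigned stride, uint32_t color)
{
    for (unsigned y = 0; y < height; y++) {
        /* Step from row to row by the stride, not by width * 4. */
        uint32_t *row = (uint32_t *)((uint8_t *)virt + (size_t)y * stride);
        for (unsigned x = 0; x < width; x++)
            row[x] = color;
    }
}
```

Using the stride to find the start of each row matters because the driver may pad rows for alignment; indexing with `y * width` would then write to the wrong pixels.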
