Graphics system in "original" Linux environment and AMD R600 graphics Programming (4)--AMD graphics memory management mechanism

Last Update:2014-11-29 Source: Internet

Author: User


The memory used by a graphics card is divided into two parts: the memory on the card itself, called VRAM, and system main memory, called GTT memory. (GTT, the graphics translation table, means the same thing as the GART discussed later. It refers to the graphics card's page table, so GTT memory can be understood as memory for which GPU page table entries must be built.) On embedded systems and with integrated graphics, the GPU usually has no video memory of its own and uses system memory entirely. On-board video memory is often several times faster than system memory, so data such as textures (for example, ordinary and resident textures in OpenGL) can be accessed significantly faster when it is placed in the card's own VRAM rather than in system memory.

Some content must be placed in VRAM, such as the framebuffer that is ultimately displayed and the page table, the GART (Graphics Address Remapping Table), described later. Other content, such as the command ring buffer described later, is placed in GTT memory. In addition, VRAM is limited, so once it is used up, some data must be placed in GTT memory instead.

GTT memory is usually allocated on demand. For example, a Radeon R600 card can use up to 512 MB of system memory (a limit set in the Linux kernel), but allocating 512 MB of contiguous memory for the device in one go is unlikely to succeed on Linux, and even if it did, much of that memory could be wasted. Following the on-demand principle, memory is taken from system memory only as it is needed, so the resulting GTT memory is certainly not contiguous. The GPU needs to use both VRAM and GTT memory, and the simplest approach is to give the two a single unified address space (similar to the unified addressing of I/O and memory on RISC machines). VRAM is the card's own memory, so its addresses are contiguous; non-contiguous GTT memory, however, can only be given unified addresses by building a mapping through a page table. That page table is called the GTT or GART, which is why this memory is called GTT memory.
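To make the idea concrete, here is a minimal C sketch (not driver code) that models a GART as a flat array with one entry per 4 KB GTT page. System pages allocated on demand sit at scattered bus addresses, yet consecutive entries make them appear as one contiguous GTT range in the GPU's address space. The struct and function names and the sample addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GPU_PAGE_SHIFT 12
#define GPU_PAGE_SIZE  (1ull << GPU_PAGE_SHIFT)
#define GPU_PAGE_MASK  (GPU_PAGE_SIZE - 1)

/* Hypothetical GART model: entries[i] holds the bus address of the
 * system page that backs GTT page i. */
struct gart {
    uint64_t gtt_start;   /* GPU address where the GTT range begins */
    size_t   num_pages;   /* number of GTT pages covered */
    uint64_t *entries;    /* bus address of each backing page */
};

/* Point GTT page `index` at a (possibly far-away) system page. */
static void gart_map_page(struct gart *g, size_t index, uint64_t bus_addr)
{
    assert(index < g->num_pages);
    assert((bus_addr & GPU_PAGE_MASK) == 0);  /* must be page aligned */
    g->entries[index] = bus_addr;
}

/* Translate a GPU address inside the GTT range to a bus address. */
static uint64_t gart_lookup(const struct gart *g, uint64_t gpu_addr)
{
    uint64_t rel = gpu_addr - g->gtt_start;   /* offset into GTT range */
    size_t index = (size_t)(rel >> GPU_PAGE_SHIFT);
    assert(gpu_addr >= g->gtt_start && index < g->num_pages);
    return g->entries[index] | (gpu_addr & GPU_PAGE_MASK);
}
```

For instance, with the GTT range placed at 0x20000000 (right after 512 MB of VRAM), mapping bus pages 0x7a3000 and 0x115000 into entries 0 and 1 makes GPU addresses 0x20000000 to 0x20001FFF contiguous, although the backing pages are far apart.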

As on the CPU side, the address the GPU uses is called a "GPU virtual address", and the address obtained after page table translation is called a "GPU physical address"; this is the address the GPU ultimately uses to reach memory. Because the GPU sits on the device bus, the "GPU physical address" here is a bus address. Memory in the VRAM region is not entered into the page table, so for that region we only care about its GPU virtual address.

Table 1 lists the memory-management registers of the R600 graphics core. There is no complete documentation for these registers; the data in the table was obtained by reading the driver code.

Register name	Offset	Function
R600_CONFIG_MEMSIZE	0x5428	VRAM size
MC_VM_FB_LOCATION	0x2180	Start and end addresses of the VRAM range in the GPU virtual address space
MC_VM_SYSTEM_APERTURE_LOW_ADDR	0x2190	Start address of the VRAM range in the GPU virtual address space
MC_VM_SYSTEM_APERTURE_HIGH_ADDR	0x2194	End address of the VRAM range in the GPU virtual address space
VM_L2_CNTL		GPU L2 cache control
MC_VM_L1_TLB_MCB		GPU TLB control
VM_CONTEXT0_PAGE_TABLE_START_ADDR	0x1594	Start address of GTT memory
VM_CONTEXT0_PAGE_TABLE_END_ADDR	0x15B4	End address of GTT memory
VM_CONTEXT0_PAGE_TABLE_BASE_ADDR	0x1574	GPU page table base address
VM_CONTEXT0_CNTL	0x1440	GPU virtual address space enable
VM_CONTEXT0_PROTECTION_FAULT_DEFAULT_ADDR	0x1554	Default address that faulting accesses are redirected to
RADEON_PCIE_TX_DISCARD_RD_ADDR_LO/HI
RADEON_PCIE_TX_GART_ERROR

Table 1

For Radeon cards, VRAM is described in two ways: "visible VRAM" and "real VRAM". Visible VRAM is the memory that can be mapped through the PCI device's memory mapping (its BAR), so software can access it. The rest of the card's VRAM is not visible and cannot be accessed directly by software (presumably only the GPU itself uses it). This invisible part plus the visible VRAM together make up the card's real VRAM.

The visible VRAM size can be obtained from the PCI configuration space. For example, on one machine the visible VRAM size read this way was 256 MB, while reading the CONFIG_MEMSIZE register gave a real VRAM size of 512 MB. The VRAM length is therefore 512 MB; if the VRAM start address is set to 0x0, the end address is 0x1FFFFFFF. The start and end addresses are then written into the R_000004_MC_FB_LOCATION register:

rv515_mc_wreg(rdev, R_000004_MC_FB_LOCATION,
              S_000004_MC_FB_START(rdev->mc.vram_start >> 16) |
              S_000004_MC_FB_TOP(rdev->mc.vram_end >> 16));
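The arithmetic behind this register write can be checked with a small C sketch. It assumes, as the call suggests, that the start and end (top) addresses are each shifted right by 16 (64 KB granularity) and packed into the low and high 16-bit halves of the register. The helper names are hypothetical stand-ins for the S_000004_MC_FB_START / S_000004_MC_FB_TOP macros, whose exact definitions should be checked in the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout: start in bits 0-15, top in bits 16-31. */
static uint32_t fb_start_field(uint32_t v) { return (v & 0xFFFFu) << 0; }
static uint32_t fb_top_field(uint32_t v)   { return (v & 0xFFFFu) << 16; }

/* Value written to MC_FB_LOCATION for a VRAM range of vram_size bytes
 * starting at vram_start. */
static uint32_t fb_location_value(uint64_t vram_start, uint64_t vram_size)
{
    uint64_t vram_end = vram_start + vram_size - 1;  /* inclusive end */
    assert(vram_size > 0);
    return fb_start_field((uint32_t)(vram_start >> 16)) |
           fb_top_field((uint32_t)(vram_end >> 16));
}
```

For the 512 MB example above, the end address 0x1FFFFFFF yields a top field of 0x1FFF and a start field of 0x0000, so the packed value is 0x1FFF0000.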

Next, GTT memory and the GART are set up. The GTT size is chosen by the driver itself, and once it is fixed, the amount of memory the GART occupies is fixed as well. With the kernel source code and the register table above, the process should be easy to follow.
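As a quick worked example, assuming 4 KB GPU pages and one entry per page, the GART size is (GTT size / 4 KB) × entry size. For the 512 MB limit mentioned earlier that is 131072 entries, i.e. 1 MB of table with R600's 64-bit entries, or 512 KB with 32-bit entries. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GPU_PAGE_SIZE 4096u

/* Bytes of page table needed to map gtt_size bytes, given the size of one
 * entry (8 bytes on R600, 4 bytes on earlier Radeons). */
static uint64_t gart_table_bytes(uint64_t gtt_size, unsigned entry_size)
{
    assert(gtt_size % GPU_PAGE_SIZE == 0);
    return (gtt_size / GPU_PAGE_SIZE) * entry_size;
}
```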

Compared with the 3-level page table used on the CPU side, the Radeon GPU uses a simple page table: a 1-level page table (the depth is configurable) with 4 KB pages, so the low 12 bits of a page table entry (2^12 = 4K) are flag bits. Earlier Radeon GPUs used 32-bit page table entries; from the R600 on, entries are 64-bit. Of the 12 flag bits, only the lowest 6 are used; they are defined as shown in Figure 1.

Figure 1
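The entry layout described above can be sketched in C, assuming only what the text states: the low 12 bits of a 64-bit entry hold flags and the remaining bits hold the 4 KB-aligned page base. The flag value in the usage example is a placeholder; the real meaning of each flag bit is given in Figure 1. Function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_FLAG_MASK 0xFFFull           /* low 12 bits: flags */
#define PTE_ADDR_MASK (~PTE_FLAG_MASK)   /* high bits: page base */

/* Build an entry from a page-aligned bus address and flag bits. */
static uint64_t pte_encode(uint64_t page_bus_addr, uint64_t flags)
{
    assert((page_bus_addr & PTE_FLAG_MASK) == 0);
    assert((flags & PTE_ADDR_MASK) == 0);
    return page_bus_addr | flags;
}

/* Split an entry back into its two parts. */
static uint64_t pte_page_base(uint64_t pte) { return pte & PTE_ADDR_MASK; }
static uint64_t pte_flags(uint64_t pte)     { return pte & PTE_FLAG_MASK; }
```

For example, a page at bus address 0x12345000 with all six low flag bits set (0x3F) encodes to 0x1234503F.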

The GPU page table resides in VRAM. VM_CONTEXT0_PAGE_TABLE_BASE_ADDR gives the location of the page table in VRAM, while VM_CONTEXT0_PAGE_TABLE_START_ADDR and VM_CONTEXT0_PAGE_TABLE_END_ADDR give the range of GTT addresses the table covers (see Table 1).
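To make the roles of the three registers concrete, here is a sketch that computes the values they would receive. It assumes, as the kernel's r600_pcie_gart_enable() appears to do, that each register takes an address in 4 KB page units (the address shifted right by 12), with BASE_ADDR locating the table in VRAM and START_ADDR/END_ADDR bounding the GTT range as in Table 1. The example layout (GTT at 0x20000000 with 512 MB, table at VRAM offset 0x100000) and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the VM_CONTEXT0_PAGE_TABLE_* values. */
struct vm_context0_regs {
    uint32_t start_addr; /* first GTT page (GPU address >> 12) */
    uint32_t end_addr;   /* last GTT page, inclusive */
    uint32_t base_addr;  /* VRAM page holding the page table */
};

static struct vm_context0_regs vm_context0_values(uint64_t gtt_start,
                                                  uint64_t gtt_size,
                                                  uint64_t table_addr)
{
    struct vm_context0_regs r;
    uint64_t gtt_end = gtt_start + gtt_size - 1;  /* inclusive end */
    assert(gtt_size > 0 && (table_addr & 0xFFF) == 0);
    r.start_addr = (uint32_t)(gtt_start >> 12);
    r.end_addr   = (uint32_t)(gtt_end >> 12);
    r.base_addr  = (uint32_t)(table_addr >> 12);
    return r;
}
```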

[Code from the XXXX Linux kernel: to be revised]

The function above takes two parameters. dma_addr is the bus address to which the allocated system memory is mapped; the device uses this address to access main memory, and it is what we called the "GPU physical address". index is the index of the page table entry.

In the code, ptr is the address in the CPU virtual address space at which the page table memory is mapped. R600 page table entries are 64 bits, while those of R500 and earlier are 32 bits.
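Since the kernel listing is missing here, the following is only an approximation of what such a function does, under stated assumptions: ptr is the CPU-side mapping of the table, the entry at index receives the page's bus address (low 12 bits cleared) ORed with flag bits, and entries are 8 bytes wide as on R600. A real driver writes through I/O accessors on the VRAM mapping rather than plain memory, and the 32-bit entry format of earlier chips is not modeled (only its entry offset is). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of entry `index`: 8-byte entries on R600,
 * 4-byte entries on R500 and earlier. */
static size_t gart_entry_offset(size_t index, size_t entry_size)
{
    return index * entry_size;
}

/* Hypothetical R600-style set_page: store bus address | flags as a
 * 64-bit entry. memcpy stands in for the driver's I/O write. */
static void set_page_64(void *ptr, size_t num_entries, size_t index,
                        uint64_t dma_addr, uint64_t flags)
{
    uint64_t pte;
    assert(index < num_entries);
    pte = (dma_addr & ~0xFFFull) | (flags & 0xFFFull);
    memcpy((char *)ptr + gart_entry_offset(index, sizeof(pte)), &pte,
           sizeof(pte));
}
```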

Next, let's look at how a piece of memory is allocated and mapped. In the next blog post, a piece of memory is allocated for a ring buffer, which holds commands: the CPU places commands in this memory, and the GPU reads them from it to configure itself.

[XXX ring_init process description: to be revised]

After GPU initialization is complete, the R600 GPU accesses memory as shown in Figure 2 (the figure is based on reading the code and may contain errors).

Figure 2

For GTT memory, the GPU page table must be consulted. The upper bits of the 64-bit address (the current driver actually uses only 32 bits), that is, all bits above the low 12, locate the GPU page table entry. The entry's contents with the low 12 bits cleared give the "page base" in the PCI device address space, and adding the low 12 bits of the address (the in-page offset) to this page base gives the corresponding bus address.

Note that because VRAM and GTT share one unified address space, and VRAM does not take part in page table translation, the GTT base address must first be subtracted from the address before the page table entry is located.
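The translation described in the last two paragraphs can be combined into one routine. This is a sketch under stated assumptions: VRAM occupies an inclusive range and is returned as a VRAM offset with no table lookup, while GTT occupies its own inclusive range and is translated through a table of 64-bit entries whose low 12 bits are flags. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct gpu_aspace {
    uint64_t vram_start, vram_end;  /* inclusive VRAM range */
    uint64_t gtt_start, gtt_end;    /* inclusive GTT range */
    const uint64_t *ptes;           /* one 64-bit entry per 4 KB GTT page */
};

/* Translate a GPU address. On success sets *out to a VRAM offset
 * (*is_vram = true) or a bus address (*is_vram = false); returns false
 * if the address is in neither range. */
static bool gpu_translate(const struct gpu_aspace *as, uint64_t addr,
                          uint64_t *out, bool *is_vram)
{
    if (addr >= as->vram_start && addr <= as->vram_end) {
        *is_vram = true;                 /* VRAM: no page table lookup */
        *out = addr - as->vram_start;
        return true;
    }
    if (addr >= as->gtt_start && addr <= as->gtt_end) {
        /* Subtract the GTT base first, then index the table by page. */
        size_t index = (size_t)((addr - as->gtt_start) >> 12);
        uint64_t page_base = as->ptes[index] & ~0xFFFull;  /* drop flags */
        *is_vram = false;
        *out = page_base | (addr & 0xFFFull);  /* add in-page offset */
        return true;
    }
    return false;
}
```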

The Linux kernel has a complete memory-management framework for graphics memory: TTM and GEM (see the related references). Like the operating system's own memory management, it is fairly complex, so we do not describe its implementation in detail; we only describe how other code, in the kernel and in user space, obtains and uses video memory.

Using video memory in the kernel

The following code is in the radeon_device_init function of the Radeon kernel driver (drivers/gpu/drm/radeon/radeon_device.c):

810 if (radeon_testing) {
811         radeon_test_moves(rdev);
812 }

Line 810 checks a global variable that acts as a switch. When the switch is on, the driver runs copy tests, implemented in drivers/gpu/drm/radeon/radeon_test.c: radeon_test_moves performs data copies from VRAM to system main memory and from system main memory to VRAM, and the results can be seen on screen while the system boots (this is a command-submission path that runs directly inside the Radeon kernel driver, with visible effect). At this point the kernel has already completed initialization, so the graphics programming code in the following parts can be placed here; after rebooting the system, the effect can be seen. The following sample code uses the kernel API to allocate and access video memory:

1  struct radeon_bo *vram_obj = NULL;
2  struct radeon_bo *gtt_obj = NULL;
3  uint64_t vram_addr, gtt_addr;
4  unsigned size;
5  void *vram_map, *gtt_map;
6  int r;
7  size = 1024 * 768 * 4;                 /* one 1024x768 frame, 4 bytes per pixel */
8  r = radeon_bo_create(rdev, size, PAGE_SIZE, true,
9                       RADEON_GEM_DOMAIN_VRAM, &vram_obj);
10 if (r) {
11         DRM_ERROR("Failed to create VRAM object\n");
12         goto out_cleanup;
13 }
14 r = radeon_bo_reserve(vram_obj, false);
15 if (unlikely(r != 0))
16         goto out_cleanup;
17 r = radeon_bo_pin(vram_obj, RADEON_GEM_DOMAIN_VRAM,
18                   &vram_addr);         /* GPU address of the BO */
19 if (r) {
20         DRM_ERROR("Failed to pin VRAM object\n");
21         goto out_cleanup;
22 }
23 r = radeon_bo_kmap(vram_obj, &vram_map);   /* CPU virtual address */
24 if (r) {
25         DRM_ERROR("Failed to map VRAM object\n");
26         goto out_cleanup;
27 }
28
29 r = radeon_bo_create(rdev, size, PAGE_SIZE, true,
30                      RADEON_GEM_DOMAIN_GTT, &gtt_obj);
31 if (r) {
32         DRM_ERROR("Failed to create GTT object\n");
33         goto out_cleanup;
34 }
35 r = radeon_bo_reserve(gtt_obj, false);
36 if (unlikely(r != 0))
37         goto out_cleanup;
38 r = radeon_bo_pin(gtt_obj, RADEON_GEM_DOMAIN_GTT, &gtt_addr);
39 if (r) {
40         DRM_ERROR("Failed to pin GTT object\n");
41         goto out_cleanup;
42 }
43 r = radeon_bo_kmap(gtt_obj, &gtt_map);
44 if (r) {
45         DRM_ERROR("Failed to map GTT object\n");
46         goto out_cleanup;
47 }
48
49 out_cleanup:
50 if (vram_obj) {
51         if (radeon_bo_is_reserved(vram_obj)) {
52                 radeon_bo_unpin(vram_obj);
53                 radeon_bo_unreserve(vram_obj);
54         }
55         radeon_bo_unref(&vram_obj);
56 }
57 if (gtt_obj) {
58         if (radeon_bo_is_reserved(gtt_obj)) {
59                 radeon_bo_unpin(gtt_obj);
60                 radeon_bo_unreserve(gtt_obj);
61         }
62         radeon_bo_unref(&gtt_obj);
63 }

The code above creates two buffer objects (BOs), allocates memory for them from VRAM and from GTT memory respectively, and finally releases the memory and the BOs. The buffer object is the basic structure the graphics driver uses to manage memory; it is an abstraction of a piece of memory, and the Radeon driver uses the radeon_bo structure to manage and describe a piece of video memory.

Lines 1-2: we use two BOs (two allocations of video memory), one from VRAM and the other from GTT memory.

Line 8: create and initialize a BO, allocating the video memory. The parameters are:

rdev: pointer to the radeon_device structure;
size: size of the BO in bytes;
PAGE_SIZE: alignment of the allocation;
true: whether the request comes from the kernel or from user space. For a kernel request, allocating the BO structure is not interruptible; the mapping between virtual and physical addresses also differs depending on whether the memory is accessed from user space or kernel space;
RADEON_GEM_DOMAIN_VRAM: whether the memory is located in VRAM or GTT. The Radeon driver defines three memory domains: RADEON_GEM_DOMAIN_CPU (0x1), RADEON_GEM_DOMAIN_GTT (0x2), and RADEON_GEM_DOMAIN_VRAM (0x4). The purpose of RADEON_GEM_DOMAIN_CPU is not clear; the other two mean memory from GTT memory and from VRAM, respectively;
&vram_obj: pointer through which the newly created BO structure is returned.

Line 14: reserve the BO. This marks the BO as in use and effectively acts as a lock, so other code cannot use it at the same time; if the BO is already reserved, the call waits until it is unreserved.

Line 17: pin the BO and obtain the GPU virtual address of the memory it represents. Pinning keeps the BO at a fixed location so the address stays valid; the GPU uses this address to access the memory, and later we will have the GPU use addresses of this kind.

Line 23: map the memory space represented by the BO. The function's second parameter returns the mapped CPU virtual address, which the driver uses to access the memory.

Lines 29-47 work the same way, except that the memory comes from GTT memory. Internally the API functions handle the two cases quite differently, but to the caller only the memory-domain parameter differs.

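Lines 50-56 release the VRAM BO and lines 57-63 the GTT BO: each BO is unpinned and unreserved if it is still reserved, and then its reference is dropped, which frees the memory and the BO structure.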
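Once radeon_bo_kmap() has returned a CPU pointer, the driver can write pixels directly through it. The sketch below fills a buffer of the size used in the listing (1024 × 768 × 4 bytes, one 32-bit XRGB pixel per element) with a solid color. It runs on ordinary memory standing in for vram_map; in the kernel, a VRAM mapping may be I/O memory that should be accessed through I/O accessors instead. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FB_WIDTH  1024
#define FB_HEIGHT 768
#define FB_BPP    4                            /* bytes per pixel */
#define FB_SIZE   (FB_WIDTH * FB_HEIGHT * FB_BPP)

/* Fill a mapped buffer of `size` bytes with one 32-bit pixel value. */
static void fill_xrgb(void *map, size_t size, uint32_t pixel)
{
    uint32_t *p = map;
    size_t i;
    assert(size % sizeof(uint32_t) == 0);
    for (i = 0; i < size / sizeof(uint32_t); i++)
        p[i] = pixel;
}
```

With these dimensions FB_SIZE is 3145728 bytes, matching the size passed to radeon_bo_create() above.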

Using video memory outside the kernel

User space obtains video memory through libdrm. The following code shows how to obtain and use video memory outside the kernel:

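/* Build with the flags from `pkg-config --cflags --libs libkms`. */
#include <stdio.h>
#include <string.h>
#include <libkms.h>

/* Allocate a width x height X8R8G8B8 scanout buffer, return its stride in
 * *stride and its CPU mapping in *virtual. (The wrapper's name and
 * signature are illustrative; the body follows the original fragment.) */
static struct kms_bo *allocate_buffer(struct kms_driver *kms,
                                      unsigned width, unsigned height,
                                      unsigned *stride, void **virtual)
{
    int ret;
    struct kms_bo *bo;
    unsigned bo_attribs[] = {
        KMS_WIDTH,   0,
        KMS_HEIGHT,  0,
        KMS_BO_TYPE, KMS_BO_TYPE_SCANOUT_X8R8G8B8,
        KMS_TERMINATE_PROP_LIST
    };

    bo_attribs[1] = width;
    bo_attribs[3] = height;

    /* Create the buffer object. */
    ret = kms_bo_create(kms, bo_attribs, &bo);
    if (ret) {
        fprintf(stderr, "failed to alloc buffer: %s\n", strerror(-ret));
        return NULL;
    }

    /* Query the pitch (bytes per row). */
    ret = kms_bo_get_prop(bo, KMS_PITCH, stride);
    if (ret) {
        fprintf(stderr, "failed to retrieve buffer stride: %s\n",
                strerror(-ret));
        kms_bo_destroy(&bo);
        return NULL;
    }

    /* Map the buffer into this process's address space. */
    ret = kms_bo_map(bo, virtual);
    if (ret) {
        fprintf(stderr, "failed to map buffer: %s\n", strerror(-ret));
        kms_bo_destroy(&bo);
        return NULL;
    }

    return bo;
}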

This code is similar to the kernel code, and the reader should be able to infer what it does from the names of the functions it calls. To write a complete program, refer to the libdrm source code or the sample included here.
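The stride (pitch) returned by kms_bo_get_prop() with KMS_PITCH matters when drawing into the mapped buffer, because a row may be padded beyond width × 4 bytes. Here is a sketch of pixel addressing in an X8R8G8B8 buffer, assuming `virtual` points to the start of the mapping and `stride` is in bytes; the helper names are hypothetical and not part of libkms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of pixel (x, y) in a 32-bpp buffer whose rows are `stride`
 * bytes apart; any row padding beyond width * 4 is never touched. */
static uint32_t *pixel_at(void *virtual, unsigned stride,
                          unsigned x, unsigned y)
{
    return (uint32_t *)((char *)virtual + (size_t)y * stride +
                        (size_t)x * 4);
}

/* Fill a w x h rectangle whose top-left corner is (x0, y0). */
static void fill_rect(void *virtual, unsigned stride, unsigned x0,
                      unsigned y0, unsigned w, unsigned h, uint32_t color)
{
    unsigned x, y;
    assert(stride >= (x0 + w) * 4);   /* rectangle must fit in a row */
    for (y = y0; y < y0 + h; y++)
        for (x = x0; x < x0 + w; x++)
            *pixel_at(virtual, stride, x, y) = color;
}
```

Computing each row start as y × stride, rather than y × width × 4, keeps drawing correct even when the driver pads rows for alignment.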


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
