Next, we cover some background on GPU memory. The main reference is: http://fgiesen.wordpress.com/2011/07/02/a-trip-through-the-graphics-pipeline-2011-part-2
Currently, the video memory most commonly used on GPUs is GDDR5. Compared with DDR3, the memory commonly used on the host, it offers high bandwidth but also high latency. The following table compares the host memory of a Core i7 2600 with the video memory of a GTX 480:
|           | Core i7 2600 | GTX 480        |
|-----------|--------------|----------------|
| Bandwidth | 19 GB/s      | 180 GB/s       |
| Latency   | 140 clocks   | 400-800 clocks |
DRAM chips are usually organized as a two-dimensional grid. Each intersection (cell) consists of one transistor and one capacitor and stores one bit of memory. For example, 1 GB of GDDR5 memory is organized as 32 GDDR5 blocks of 32 MB each.
Each 32 MB block consists of four bank groups, and each bank group consists of 16 (or 32) banks.
Each bank is itself a two-dimensional grid of DRAM cells.
The row address uses A0-A11, giving 2^12 = 4K rows, and the column address uses A0-A5, giving 2^6 = 64 columns, so one bank has 4K × 64 = 256K addressable locations.
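The arithmetic above can be checked with a few lines of Python. This is only a sketch that plugs in the numbers from the text (12 row-address bits, 6 column-address bits, 32 blocks of 32 MB); the constant names are my own.

```python
# Capacity: 32 GDDR5 blocks of 32 MB each, as in the example above.
BLOCKS = 32
BLOCK_MB = 32
total_mb = BLOCKS * BLOCK_MB          # 1024 MB = 1 GB

# Bank geometry: row address A0-A11 (12 bits), column address A0-A5 (6 bits).
ROW_BITS = 12
COL_BITS = 6
rows = 2 ** ROW_BITS                  # 4096 rows ("4K")
cols = 2 ** COL_BITS                  # 64 columns
locations_per_bank = rows * cols      # 262144 locations ("256K")

print(total_mb, rows, cols, locations_per_bank)
```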
DRAM generally reads and writes data one row at a time, so for the best read/write efficiency it is best to access an entire row at once. The page (row) size of GDDR5 is usually 2 KB.
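To see why accessing a whole row at once matters, here is a toy Python model that counts row activations (opening a new row) for two access patterns. It assumes a single bank with an open-row policy, the 2 KB page size from the text, and a row index of simply address // page size. Real GDDR5 interleaves addresses across channels and banks, so this illustrates the idea rather than modeling real hardware; ACCESS_SIZE and N are hypothetical values.

```python
PAGE_SIZE = 2 * 1024  # bytes per row (page), per the text


def count_row_activations(addresses, page_size=PAGE_SIZE):
    """Count how many times a new row must be opened (activated).

    Simplified model: one bank, open-row policy, row = address // page_size,
    no bank or channel interleaving.
    """
    open_row = None
    activations = 0
    for addr in addresses:
        row = addr // page_size
        if row != open_row:       # row miss: close the old row, open a new one
            activations += 1
            open_row = row
    return activations


ACCESS_SIZE = 32   # hypothetical: bytes fetched per request
N = 1024           # hypothetical: number of requests

sequential = [i * ACCESS_SIZE for i in range(N)]  # walks through rows in order
strided = [i * PAGE_SIZE for i in range(N)]       # every request hits a new row

print(count_row_activations(sequential))  # 16
print(count_row_activations(strided))     # 1024
```

Both patterns read the same number of requests, but the sequential one opens only 16 rows while the strided one opens a new row for every request.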
[Note: My understanding of DRAM above contains some mistakes; I am still learning about DRAM. For details, refer to my other blog post. old wolf, 2012-11-13: http://www.cnblogs.com/mikewolf2002/archive/2012/11/13/2768804.html]
Next, let's look at how memory connects to the GPU and the host, to understand how video memory works:
Some fast clients in the GPU, such as the depth block, color block, and texture block, connect directly to the MC (memory controller), while blocks that move only small amounts of data, such as the command processor (CP), must go through the hub before reaching the corresponding MC.
Inside the hub there may be a VM L2, which performs page-table lookups before the request is routed to the corresponding MC. The MC mainly consists of the client interface, VM L1, ARB (arbiter), and other modules. The client interface handles the different clients and passes their requests to VM L1, which performs a page-table lookup; the requests then go to ARB, which arbitrates among them before they access the corresponding GDDR memory. A GPU MC channel is usually 32 bits wide, while a DDR3 MC is usually 64 bits wide.
GPU memory bandwidth can be calculated as: bandwidth (GB/s) = mclk (MHz) * data rate * channel width (bits) * number of channels / 8 / 1000. With a data rate of 4 and a 32-bit channel width this simplifies to mclk * 4 * 32 * number of channels / 8 / 1000. For example, a video card with 12 MC channels and mclk = 1375 MHz has a memory bandwidth of 1375 * 4 * 32 * 12 / 8 / 1000 = 264 GB/s.
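The bandwidth formula is easy to wrap in a small helper. Below is a minimal Python sketch that reproduces the 264 GB/s example; the function and parameter names are illustrative, and mclk is taken to be in MHz so that the final division by 1000 yields GB/s.

```python
def gpu_memory_bandwidth_gbs(mclk_mhz, data_rate, channel_width_bits, channels):
    """Peak bandwidth in GB/s: mclk * datarate * width * channels / 8 / 1000.

    mclk_mhz           -- memory clock in MHz
    data_rate          -- transfers per clock (4 in the text's example)
    channel_width_bits -- width of one MC channel (32 bits for a GPU MC)
    channels           -- number of MC channels
    """
    megabits_per_s = mclk_mhz * data_rate * channel_width_bits * channels
    megabytes_per_s = megabits_per_s / 8   # bits -> bytes
    return megabytes_per_s / 1000          # MB/s -> GB/s


print(gpu_memory_bandwidth_gbs(1375, 4, 32, 12))  # 264.0
```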
The host and other PCIe devices connect over the PCIe bus to the MMU (memory management unit), and from there to the hub. Here, "MMU" is a general term covering different implementations, and it may consist of many blocks.
The GPU communicates with the host and other devices over the PCIe bus. The GPU and the host are usually connected by a 16-lane PCIe 2.0 link (the latest video cards use PCIe 3.0), which provides 8 GB/s in each direction (upstream and downstream). Slower devices, such as the display, may need only 4 lanes.
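The 8 GB/s figure follows from the PCIe 2.0 signaling rate. The short sketch below assumes the standard PCIe 2.0 parameters of 5 GT/s per lane and 8b/10b line encoding (these come from the PCIe 2.0 specification, not from the text above); it uses integer arithmetic so the results are exact.

```python
def pcie2_bandwidth_gbs(lanes):
    """Per-direction payload bandwidth of a PCIe 2.0 link, in GB/s.

    Assumes 5 GT/s per lane with 8b/10b encoding: 8 payload bits per
    10 transmitted bits, i.e. 4 Gbit/s = 500 MB/s per lane per direction.
    """
    transfers_per_s = 5_000_000_000                   # 5 GT/s per lane
    payload_bits_per_s = transfers_per_s * 8 // 10    # 8b/10b overhead
    bytes_per_s_per_lane = payload_bits_per_s // 8    # 500,000,000 B/s
    return lanes * bytes_per_s_per_lane / 1e9


print(pcie2_bandwidth_gbs(16))  # 8.0  (GPU <-> host, x16)
print(pcie2_bandwidth_gbs(4))   # 2.0  (a slower x4 device)
```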
For details about PCIe, see: http://www.cnblogs.com/mikewolf2002/archive/2012/03/20/2408389.html