Let's look at some background on GPU memory. The main reference is: http://fgiesen.wordpress.com/2011/07/02/a-trip-through-the-graphics-pipeline-2011-part-2
The video memory most commonly used on GPUs today is GDDR5. Compared with the DDR3 commonly used for host memory, it has high bandwidth and high latency. A DRAM chip is typically organized as 2-D grids of cells, each of which consists of a transistor and a capacitor and stores one bit. For example, 1G of GDDR5 memory is organized as 32 GDDR5 blocks:
Each 32M block consists of 4 bank groups, each of which consists of 16 (or a) banks.
Each bank consists of a 2-D grid of DRAM cells:
The row address is A0-A11, 4K rows in total, and the column address is A0-A5, 64 columns in total, so the address space of one bank is 4K * 64 = 256K.
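The bank size follows directly from the address widths. A minimal sketch of that arithmetic, assuming 12 row address bits (A0-A11) and 6 column address bits (A0-A5) as stated above; the function name `bank_locations` is illustrative only:

```python
def bank_locations(row_bits: int, col_bits: int) -> int:
    """Number of addressable locations in one bank (rows x columns)."""
    rows = 1 << row_bits   # A0-A11 -> 2**12 = 4096 rows
    cols = 1 << col_bits   # A0-A5  -> 2**6  = 64 columns
    return rows * cols

locations = bank_locations(12, 6)
print(locations)           # 262144
print(locations // 1024)   # 256, i.e. 256K
```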
DRAM is usually read and written a row at a time, so for good read and write efficiency it is best to access a whole row of data at once. The page size of GDDR5 is usually 2K.
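To illustrate why row-wise access pays off, here is a toy model rather than a description of any real memory controller. It assumes a single bank that keeps one row open at a time, with 64 columns per row as above, and counts how often a new row has to be opened for a given access pattern. The name `count_row_activations` and the linear address layout are hypothetical.

```python
COLS_PER_ROW = 64  # column address A0-A5, as in the text

def count_row_activations(addresses):
    """Count how many times a new row must be opened.

    Toy model: one bank, one open row; touching a different
    row than the currently open one costs one activation.
    """
    open_row = None
    activations = 0
    for addr in addresses:
        row = addr // COLS_PER_ROW   # upper address bits select the row
        if row != open_row:
            activations += 1
            open_row = row
    return activations

# Walk 4 rows one row at a time (row-major order).
row_major = [r * COLS_PER_ROW + c for r in range(4) for c in range(COLS_PER_ROW)]
# Visit the same 256 locations, but switch rows on every access.
column_major = [r * COLS_PER_ROW + c for c in range(COLS_PER_ROW) for r in range(4)]

print(count_row_activations(row_major))     # 4
print(count_row_activations(column_major))  # 256
```

Both patterns touch the same 256 locations, but in this model the row-by-row walk opens each row only once.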
Note: The account of DRAM above contains some errors, as I am still learning about DRAM. Please refer to another post, Old Wolf, 2012-11-13: http://www.cnblogs.com/mikewolf2002/archive/2012/11/13/2768804.html
Next, let's look at how video memory connects to the GPU and the host, to understand how memory traffic flows:
Some of the fast clients in the GPU, such as the depth block, color block, and texture block, are connected directly to the MC (Memory Controller). Blocks with smaller data volumes, such as the Command Processor (CP), go through the hub and then on to the corresponding MC.
The hub may contain a VM L2, which performs some page table lookups; the request is then routed to the corresponding MC. An MC mainly consists of a client interface, a VM L1, an arbiter (ARB), and other modules. The client interface talks to the different clients and passes their requests to the VM L1 for page table lookup; finally the requests go through ARB arbitration and on to the corresponding GDDR. A GPU MC channel is usually 32 bits wide, whereas a DDR3 MC is usually 64 bits wide. The memory bandwidth of the GPU in GB/s, with MCLK in MHz, can be calculated with the following formula: MCLK * data rate * channel width * number of channels / 8 / 1000. With a data rate of 4 and a 32-bit channel this simplifies to: MCLK * 4 * 32 * number of channels / 8 / 1000. Assuming the graphics card has 12 MC channels and an MCLK of 1375 MHz, the memory bandwidth is: 1375 * 4 * 32 * 12 / 8 / 1000 = 264 GB/s
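The bandwidth formula can be checked with a few lines of code. This is a minimal sketch that only restates the arithmetic from the text; the function and parameter names are illustrative, and MCLK is assumed to be given in MHz.

```python
def memory_bandwidth_gb_s(mclk_mhz, data_rate, channel_width_bits, channels):
    """Peak memory bandwidth in GB/s.

    Implements: MCLK * data rate * channel width * channels / 8 / 1000
    """
    mbit_per_s = mclk_mhz * data_rate * channel_width_bits * channels
    mbyte_per_s = mbit_per_s / 8   # bits -> bytes
    return mbyte_per_s / 1000      # MB/s -> GB/s

# The example from the text: 1375 MHz, data rate 4, 32-bit channels, 12 channels.
print(memory_bandwidth_gb_s(1375, 4, 32, 12))  # 264.0
```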
Other PCIe devices and the host all go through the PCIe bus, then into the MMU (Memory Management Unit), and then into the hub. MMU here is a general term; in a particular implementation the MMU may comprise many blocks.
The GPU interacts with the host and other devices over the PCIe bus. The GPU and the host usually use PCIe 2.0 lanes (the latest graphics cards use PCIe 3.0), reaching 8 GB/s both upstream and downstream. Other, slower devices, such as the display, may need only 4 lanes.
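The 8 GB/s figure is consistent with a 16-lane PCIe 2.0 link. The text does not state the lane count, so x16 is an assumption here. The sketch below uses the per-lane signalling rates and line encodings of PCIe 2.0 (5 GT/s, 8b/10b) and PCIe 3.0 (8 GT/s, 128b/130b) and ignores packet and protocol overhead, so real throughput will be lower. The names are illustrative only.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
}

def pcie_bandwidth_gb_s(generation, lanes):
    """Peak bandwidth per direction in GB/s (protocol overhead ignored)."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    gbit_per_s = gt_per_s * efficiency * lanes
    return gbit_per_s / 8      # bits -> bytes

print(pcie_bandwidth_gb_s("2.0", 16))  # 8.0  (assumed x16 link)
print(pcie_bandwidth_gb_s("2.0", 4))   # 2.0  (a 4-lane device)
```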
For more information about PCIe, see: http://www.cnblogs.com/mikewolf2002/archive/2012/03/20/2408389.html