GPU global memory: data alignment issues during use

Source: Internet
Author: User

Global memory is the ordinary video memory (device memory). Any thread in the entire grid can read and write any location in global memory.

Its access latency is 400 to 600 clock cycles, so it can easily become a performance bottleneck.

Reads and stores must be aligned; the access width is 4 bytes. If an access is not correctly aligned, the compiler splits the read or write into multiple operations, which reduces memory access performance.

If the reads and writes issued by multiple threads meet the requirements for coalesced access, the separate memory accesses are merged into a single operation. The coalescing conditions depend on the device: devices of compute capability 1.0 and 1.1 have strict requirements, while devices of compute capability 1.2 and higher relax them.

Devices of compute capability 1.2 and higher support coalesced access to 8-bit, 16-bit, 32-bit, and 64-bit data words. The corresponding segment sizes are 32 bytes for 8-bit words, 64 bytes for 16-bit words, and 128 bytes for 32-bit and 64-bit words. If the data accessed exceeds 128 bytes, it is transferred in two transactions.
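As a rough illustration of the segment sizes above, here is a minimal host-side sketch. The function names `segment_bytes` and `aligned_transactions` are made up for this example, and the model assumes a half-warp of 16 threads reading contiguous, fully aligned words. The 128-bit case is an added assumption to show an access larger than 128 bytes. It only does the arithmetic and does not query any real device.

```cpp
#include <cassert>
#include <cstddef>

// Segment size in bytes for a given word size, following the mapping in the
// text for compute capability 1.2+ devices (hypothetical helper).
inline std::size_t segment_bytes(std::size_t word_bits) {
    assert(word_bits >= 8 && word_bits % 8 == 0);
    if (word_bits == 8)  return 32;
    if (word_bits == 16) return 64;
    return 128;  // 32-bit and wider words
}

// Number of transactions needed when `threads` threads read contiguous,
// aligned words of `word_bits` bits each (simplified model, an assumption).
inline std::size_t aligned_transactions(std::size_t word_bits,
                                        std::size_t threads = 16) {
    const std::size_t total = threads * (word_bits / 8);  // bytes requested
    const std::size_t seg = segment_bytes(word_bits);
    return (total + seg - 1) / seg;  // round up to whole segments
}
```

With these assumptions, 16 threads reading 64-bit words request exactly 128 bytes and fit in one transaction, while 128-bit words request 256 bytes and need two.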

Within one coalesced transfer, the thread index is not required to match the index of the word being accessed.

When 128 bytes of data are accessed and the address is not aligned to 128 bytes, GT200 generates two coalesced accesses. The access is split according to how much of the data falls in each region, for example into a 32-byte access and a 96-byte access.
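The split described above can be sketched as plain address arithmetic. This is only an illustration: `split_by_segment` is a hypothetical helper, the 128-byte segment size is taken from the text, and the code models where the segment boundary cuts the byte range, not the transaction sizes the hardware actually issues.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split the byte range [start, start + bytes) at segment boundaries and
// return how many bytes fall into each segment that is touched.
inline std::vector<std::size_t> split_by_segment(std::size_t start,
                                                 std::size_t bytes,
                                                 std::size_t segment = 128) {
    assert(segment > 0);
    std::vector<std::size_t> parts;
    std::size_t pos = start;
    const std::size_t end = start + bytes;
    while (pos < end) {
        // Start address of the next segment after `pos`.
        const std::size_t boundary = (pos / segment + 1) * segment;
        const std::size_t stop = boundary < end ? boundary : end;
        parts.push_back(stop - pos);
        pos = stop;
    }
    return parts;
}
```

For example, `split_by_segment(96, 128)` returns two parts of 32 and 96 bytes, while an aligned access such as `split_by_segment(0, 128)` stays in a single 128-byte part.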

When using global memory, pay attention to the following two issues:

1. Data alignment. For one-dimensional data, use cudaMalloc() to allocate GPU global memory. For multi-dimensional data, we recommend cudaMallocPitch(), which guarantees segment alignment: in memory allocated by cudaMallocPitch(), the starting address of the first element of every row is always aligned. The number of bytes in a row, widthofx * sizeof(element), is arbitrary and not necessarily a multiple of 256. To keep the first element of every row aligned, cudaMallocPitch() therefore allocates some extra bytes for each row so that widthofx * sizeof(element) + extra bytes is a multiple of 256 (aligned). As a result, computing the address of a[y][x] as y * widthofx * sizeof(element) + x * sizeof(element) is incorrect. The correct offset is y * (widthofx * sizeof(element) + extra bytes) + x * sizeof(element). The pitch value returned by the function is exactly widthofx * sizeof(element) + extra bytes.

2. Coalesced access. The key point is that the GPU accesses memory one half-warp at a time (a full warp on later devices), that is, 16 threads access memory together. When the addresses accessed by these 16 threads fall in the same segment (a width the hardware can transfer in one go) and no conflict occurs, the data in that segment can be transferred to the threads in a single operation, which improves memory access efficiency.
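Both points can be sketched as host-side arithmetic. Everything here is illustrative: the helper names are made up, the 256-byte alignment and 128-byte segment size are the figures used in the text, and in real code the pitch should be the value returned by cudaMallocPitch() rather than computed by hand, since the actual alignment depends on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Point 1: round a row width in bytes up to a multiple of `alignment`.
// (Assumed model of the padding; the real pitch comes from cudaMallocPitch.)
inline std::size_t padded_pitch(std::size_t width_bytes,
                                std::size_t alignment = 256) {
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Byte offset of a[y][x] in a pitched allocation: rows are `pitch` bytes
// apart, not widthofx * sizeof(element) bytes apart.
inline std::size_t pitched_offset(std::size_t y, std::size_t x,
                                  std::size_t pitch, std::size_t elem_size) {
    return y * pitch + x * elem_size;
}

// Point 2: count the distinct segments touched when thread i accesses the
// word at byte address base + i * stride_words * word_bytes.
inline std::size_t segments_touched(std::size_t base, std::size_t stride_words,
                                    std::size_t word_bytes = 4,
                                    std::size_t threads = 16,
                                    std::size_t segment = 128) {
    std::set<std::size_t> segs;
    for (std::size_t i = 0; i < threads; ++i) {
        const std::size_t addr = base + i * stride_words * word_bytes;
        segs.insert(addr / segment);  // index of the segment holding addr
    }
    return segs.size();
}
```

For a row of 100 four-byte elements (400 bytes), `padded_pitch(400)` gives 512, so a[2][3] sits at offset 2 * 512 + 3 * 4 = 1036, not at 2 * 400 + 3 * 4 = 812. For the access pattern, 16 threads reading consecutive 4-byte words from an aligned base touch one segment, while a stride of 32 words spreads them over 16 segments.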


What is video memory? Can a game keep running when video memory is insufficient?

Video memory, also called the frame buffer, stores rendering data that the graphics chip has processed or is about to fetch. Like a computer's main memory, it is a component that holds graphics information waiting to be processed. The picture we see on a display is made up of pixels, and each pixel uses 4 to 32 or even 64 bits of data to control its brightness and color. This data must be held in video memory and then passed to the display chip and the CPU, and the result of the computation is finally converted into graphics and output to the display. Video memory works the same way as the memory on the motherboard, except that it stores information such as each pixel the video card outputs to the display.

Video memory is a very important component of the video card. After the display chip processes the data, it saves the data to video memory. The RAMDAC (digital-to-analog converter) then reads the data from video memory and converts the digital signal into an analog signal, which is finally shown on the screen. On advanced graphics accelerator cards, video memory not only stores graphics data but is also used by the display chip for 3D computations. With advanced display chips such as NVIDIA's, the GPU (graphics processing unit) has appeared as a processor that works in parallel with the CPU. T&L (transform and lighting) and other compute-intensive operations are carried out by the GPU on the video card, which increases the dependence on video memory. Clearly, the speed and bandwidth of video memory directly affect the overall speed of the video card.

Like motherboard memory, video memory has gone through several stages of development. Its development has arguably been even more active than that of main memory, with more varieties and types. The most widely used video memory types are SDRAM and SGRAM. Since last year, better-performing DDR memory was first applied to video cards, which improved their overall performance. With its success on video cards leading the way, DDR then spread fully into motherboard systems. Nowadays each memory technology leads the field for only two or three years.

Video memory capacity is the amount of local memory on the video card and is one of the key parameters when choosing a card. The capacity determines how much data video memory can hold temporarily, which affects the card's performance to some extent. Capacity has grown along with the development of video cards, and the growth is accelerating: from early sizes in the KB range, 1 MB, and 2 MB, through 8 MB, 12 MB, 16 MB, 32 MB, and 64 MB, to today's mainstream 512 MB and 1 GB and the 2 GB of high-end cards. Some professional cards even have 4 GB of video memory.

In terms of maximum resolution, capacity matters to some extent because the data for every pixel is stored in video memory. On early video cards with only kilobytes, 1 MB, or 2 MB of memory, capacity really was the bottleneck for maximum resolution. Today even 64 MB cards have been phased out and mainstream consumer cards carry far more, so capacity is no longer a factor limiting maximum resolution.

In terms of performance, display chips keep getting more powerful, and large 3D games and professional rendering need to hold more and more temporary data, so the required capacity keeps growing and capacity affects performance to some extent. For example, when the display core is powerful but the memory is relatively small and a large amount of texture data must be stored, the data does not fit, and the display core sits idle for part of the time waiting for data. This hurts the display core and therefore the video card as a whole.

Note that more video memory does not necessarily mean a faster video card. Three factors determine the performance of a video card: first the display chip, second the memory bandwidth (which depends on memory bus width and memory frequency), and last the memory capacity. How much video memory a card should have is determined by the display chip it uses. In other words, capacity should match the performance of the display core: the more capable the display chip, the more video memory it can put to use. A low-performance display chip paired with a large amount of video memory gains nothing from it. For example, some commercially available products are equipped with ... MB of capacity ...
