Data alignment issues when using GPU global memory

Global memory is the ordinary device memory: any thread in the entire grid can read from and write to any location in it.

Its access latency of 400-600 clock cycles easily makes it a performance bottleneck.

When accessing device memory, reads and writes must be aligned to a 4-byte width. Without proper alignment, a read or write is split by the compiler into multiple operations, reducing memory-access performance.
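As a concrete illustration of the alignment requirement, the hypothetical struct below (not part of the original article) is forced onto a 16-byte boundary with CUDA's `__align__` qualifier, so each thread can read one element in a single transaction instead of several smaller ones:

```cuda
// Minimal sketch: a user-defined struct padded and aligned to 16 bytes so
// that a thread can fetch one element with a single 16-byte load. The struct
// layout and kernel are invented for illustration.
struct __align__(16) Particle {
    float x, y, z;   // 12 bytes of payload
    float pad;       // explicit padding up to 16 bytes
};

__global__ void readParticles(const Particle *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Because Particle is 16-byte aligned, this read is not split
        // into multiple smaller operations by the compiler.
        Particle p = in[i];
        out[i] = p.x + p.y + p.z;
    }
}
```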

When the coalescing conditions are met, the read and write operations of the threads in a half-warp are merged into a single transaction. The coalescing conditions are stricter on devices of compute capability 1.0 and 1.1, and are relaxed on devices of compute capability 1.2 and higher.

Devices of compute capability 1.2 and higher support coalesced access for 8-bit, 16-bit, and 32/64-bit data words; the corresponding segment sizes are 32 bytes, 64 bytes, and 128 bytes. Requests larger than 128 bytes are issued as two transactions.

Within a single coalesced transfer, there is no required correspondence between a thread's index and the data word it accesses; threads may access the words of the segment in any order.

When a 128-byte block of data is accessed at an address that is not aligned to 128 bytes, the GT200 issues two coalesced transactions. Based on how the data straddles the segment boundary, the access is split into two coalesced transactions, for example 32 bytes and 96 bytes.
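A hedged sketch of the situation described above: in the hypothetical copy kernel below, an `offset` parameter shifts the addresses read by each (half-)warp, so the same request is served with one transaction when the start address falls on a segment boundary and with more when it does not:

```cuda
// Illustrative kernel (not from the original article): with offset == 0 the
// threads of a (half-)warp read a block of floats starting on a segment
// boundary, and the request coalesces into one transaction. With offset == 1
// the same block straddles a segment boundary, so the hardware issues two
// transactions to cover it.
__global__ void copyWithOffset(const float *in, float *out, int offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + offset < n) {
        out[i] = in[i + offset];
    }
}
```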

When using global memory, there are two main issues to note:

1. Data alignment. For one-dimensional data, allocate GPU global memory with cudaMalloc(); for multi-dimensional data it is recommended to use cudaMallocPitch(), which guarantees that the first element of each row of the array starts at an aligned address. Because the number of elements per row is arbitrary, widthOfX * sizeof(element) is not necessarily a multiple of 256. To keep the first element of every row aligned, cudaMallocPitch() pads each row with extra bytes so that widthOfX * sizeof(element) + padding is a multiple of 256 (i.e. aligned). As a result, computing the address of a[y][x] as y * widthOfX * sizeof(element) + x * sizeof(element) is incorrect; it should be y * (widthOfX * sizeof(element) + padding) + x * sizeof(element). The pitch value returned by the function is exactly widthOfX * sizeof(element) + padding (see the cudaMallocPitch() sketch after this list).

2. Coalesced access. The key is to understand that the GPU accesses memory by half-warp (by full warp on devices of compute capability 2.0 and higher), i.e. 16 threads access memory together. If the addresses accessed by those 16 threads fall within the same segment (the width the hardware can transfer in one transaction) and no conflict arises, the data of that segment can be delivered to all of the threads at once, improving memory-access efficiency (see the coalescing sketch below).
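To make the indexing rule from point 1 concrete, here is a minimal sketch using cudaMallocPitch(); the array dimensions and the scaleRows kernel are invented for illustration:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Sketch of pitched indexing: the pitch returned by cudaMallocPitch() already
// includes the per-row padding, so row addresses are computed with the pitch,
// never with width * sizeof(float).
__global__ void scaleRows(float *data, size_t pitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Row start = base address + y * pitch (in bytes), then column x.
        float *row = (float *)((char *)data + y * pitch);
        row[x] *= 2.0f;
    }
}

int main()
{
    const int width = 1000, height = 512;   // arbitrary example sizes
    float *d_data = NULL;
    size_t pitch = 0;

    // pitch >= width * sizeof(float); the extra bytes are the per-row padding.
    cudaMallocPitch((void **)&d_data, &pitch, width * sizeof(float), height);
    printf("requested %zu bytes per row, pitch = %zu bytes\n",
           width * sizeof(float), pitch);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    scaleRows<<<grid, block>>>(d_data, pitch, width, height);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

And for point 2, the two hypothetical kernels below contrast a coalesced access pattern with a strided one; the `stride` parameter is an assumption introduced only to break coalescing:

```cuda
// Coalesced: consecutive threads of a (half-)warp touch consecutive addresses
// in the same segment, so the hardware merges them into one transaction.
__global__ void copyCoalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];              // thread k reads word k: coalesced
}

// Strided: neighbouring threads land in different segments, so the accesses
// cannot be combined and may cost up to one transaction per thread.
__global__ void copyStrided(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (i * stride) % n;    // scattered across many segments
        out[i] = in[j];
    }
}
```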
