CUDA Linear memory allocation

Overview: Linear memory can be allocated via cudaMalloc(), cudaMallocPitch(), and cudaMalloc3D().

1. 1D Linear memory allocation

cudaMalloc(void** devPtr, size_t size)       // allocate memory on the device
cudaMemcpy(void* dst, const void* src, size_t size, enum cudaMemcpyKind kind)    // copy data
cudaMemcpyToSymbol()      // copy data into a __constant__ or __device__ variable
cudaMemcpyFromSymbol()    // the reverse of the above
cudaFree()                // free device memory
cudaMemset()              // initialize device memory

Note: data exchange between the host and the device synchronizes automatically, but device-to-device copies do not; they require cudaThreadSynchronize() (deprecated in modern CUDA in favor of cudaDeviceSynchronize()).

2. 2D Linear memory allocation

2.1 Allocation

cudaMallocPitch(void** devPtr, size_t* pitch, size_t widthInBytes, size_t height)    // allocate a 2D array in linear memory; the width is given in bytes, while the height is given in rows of elements

A 2-dimensional array allocated in C is generally stored contiguously: A[y][x] lives at byte offset y*widthOfX*sizeof(element) + x*sizeof(element) from the base address.

In CUDA's global memory, however, contiguous accesses that start from a 256-byte-aligned address (addr = 0, 256, 512, ...) are the most efficient. To improve memory access efficiency, CUDA provides the cudaMallocPitch function, which allocates memory so that the starting address of the first element of each row of the array is guaranteed to be aligned. Because the number of elements per row is arbitrary, widthOfX*sizeof(element) is not necessarily a multiple of 256. To keep the first element of every row aligned, cudaMallocPitch pads each row with some extra bytes so that widthOfX*sizeof(element) + padding is a multiple of 256 (i.e., aligned).

Consequently, computing the address of a[y][x] as y*widthOfX*sizeof(element) + x*sizeof(element), as above, is no longer correct. It should instead be y*[widthOfX*sizeof(element) + padding] + x*sizeof(element). The pitch value returned by the function is exactly widthOfX*sizeof(element) + padding. Note that the input parameter widthInBytes should be widthOfX*sizeof(element), and subsequent copies must take the pitch into account accordingly.

2.2 Access

T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;           // how an element is accessed

cudaMallocPitch() returns the row spacing via *pitch: the actual width, in bytes, of each allocated row. The pitch is then used as a separate parameter when computing addresses within the 2D array.

2.3 Copy

cudaMemcpy2D(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind)

Pay special attention to the difference between width and pitch: width is the width in bytes of the data that actually needs to be copied, while pitch is the row stride of the 2D linear allocation. On the host side, pitch == width when data is transferred between the device and the host.

As we can see, CUDA does not provide multi-subscript access to 2D linear memory; elements are still accessed by computed offsets. The difference is that the pitch-aligned layout is very conducive to coalesced access.

Example: the following code allocates a two-dimensional floating-point array of size width*height and demonstrates how to iterate over its elements in device code:

// Host code
int width = 64, height = 64;
float* devPtr;
size_t pitch;
cudaMallocPitch((void**)&devPtr, &pitch, width * sizeof(float), height);
MyKernel<<<100, 512>>>(devPtr, pitch, width, height);

// Device code
__global__ void MyKernel(float* devPtr, size_t pitch, int width, int height)
{
    for (int r = 0; r < height; ++r) {
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < width; ++c) {
            float element = row[c];
        }
    }
}

3. 3D Linear memory allocation

cudaError_t cudaMalloc3D(
    struct cudaPitchedPtr* pitchedDevPtr,
    struct cudaExtent      extent
)

Example: the following code allocates a three-dimensional floating-point array of size width*height*depth and demonstrates how to iterate over its elements in device code:

// Host code
cudaPitchedPtr devPitchedPtr;
cudaExtent extent = make_cudaExtent(64, 64, 64);
cudaMalloc3D(&devPitchedPtr, extent);
MyKernel<<<100, 512>>>(devPitchedPtr, extent);

// Device code
__global__ void MyKernel(cudaPitchedPtr devPitchedPtr, cudaExtent extent)
{
    char* devPtr = (char*)devPitchedPtr.ptr;
    size_t pitch = devPitchedPtr.pitch;
    size_t slicePitch = pitch * extent.height;
    for (int z = 0; z < extent.depth; ++z) {
        char* slice = devPtr + z * slicePitch;
        for (int y = 0; y < extent.height; ++y) {
            float* row = (float*)(slice + y * pitch);
            for (int x = 0; x < extent.width; ++x) {
                float element = row[x];
            }
        }
    }
}

Category: CUDA Learning Notes

