Arrays in CUDA

Last Update:2018-12-04 Source: Internet

Author: User

I had just read a little about CUDA and set out to write a program, and immediately ran into a pile of problems. The first was how to transfer arrays between the host and the device, which I found confusing. After reading up on it, I summarize what I learned below.

1: How did the problem come about?

One-, two-, and three-dimensional arrays are all used on the device. For a one-dimensional array, cudaMalloc and cudaMemcpy handle allocation and copying. For two- and three-dimensional arrays, the obvious approach is to flatten them into one dimension, but then assigning values becomes inconvenient. An example I came across does it like this:

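float arr[n][n];   // host array, contiguous in memory (n is a compile-time constant)
float *dst;        // device pointer
cudaMalloc((void**)&dst, sizeof(float) * n * n);
cudaMemcpy2D(dst, sizeof(float) * n,   // destination and its pitch in bytes
             arr, sizeof(float) * n,   // source and its pitch in bytes
             sizeof(float) * n, n,     // row width in bytes, number of rows
             cudaMemcpyHostToDevice);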

Here a two-dimensional array arr[n][n] is first defined in C. A C two-dimensional array is contiguous, so this amounts to reserving one contiguous block of memory on the CPU. The next step is to allocate space on the GPU and copy the CPU values into it. cudaMalloc allocates that space, and the block it returns is also contiguous, so the values can be copied straight across. The copy is done with cudaMemcpy2D. The most important parameter of this function is the pitch, the number of bytes from the start of one row to the start of the next. It resembles row padding in image processing, with one difference: in an image the padding is required, whereas here the layout is up to you, and you may pad the rows or not.

The example above clearly does not pad: both pitches equal the row width, sizeof(float) * n. Because the block allocated on the GPU is contiguous, the rows are simply packed one after another. The drawback is speed. On the GPU, accesses that start at an address that is an integer multiple of 256 are relatively fast, so rows are generally padded to a multiple of 256. Without padding, access may be slower.
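To make the unpadded layout concrete, here is a minimal sketch of the full round trip. The names scaleFlat and scaleOnDevice are hypothetical and not from the original example, and the sketch assumes a square n x n array of float with no padding, so element (row, col) sits at index row * n + col.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: doubles every element of an n x n array stored as one contiguous
// block. Without padding, element (row, col) is at index row * n + col.
__global__ void scaleFlat(float *data, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        data[row * n + col] *= 2.0f;
}

// Hypothetical helper: host must hold n * n floats in row-major order.
// Copies it to the device, runs the kernel, and copies the result back.
// Returns false if any CUDA call reports an error.
bool scaleOnDevice(std::vector<float> &host, int n)
{
    float *dev = nullptr;
    size_t rowBytes = sizeof(float) * n;   // no padding: pitch == row width
    if (cudaMalloc((void **)&dev, rowBytes * n) != cudaSuccess)
        return false;

    cudaError_t err = cudaMemcpy2D(dev, rowBytes, host.data(), rowBytes,
                                   rowBytes, n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        scaleFlat<<<grid, block>>>(dev, n);
        err = cudaGetLastError();          // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy2D(host.data(), rowBytes, dev, rowBytes,
                           rowBytes, n, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Calling scaleOnDevice(values, 3) on a vector of nine floats should double each of them in place. Since both pitches equal the row width, a plain cudaMemcpy of n * n floats would do the same job here.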

2: Assigning values to and accessing two-dimensional arrays

The example above does not make these concepts fully clear. The form usually seen online is the following:

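size_t pitch;
// Allocate a pitched 2D array: row width in bytes, height in rows
cutilSafeCall(cudaMallocPitch((void**)&GPU_InputData->m_prData1, &pitch, width * sizeof(float), height));
// Copy packed host rows (width * sizeof(float) bytes apart) into device rows (pitch bytes apart)
cutilSafeCall(cudaMemcpy2D(GPU_InputData->m_prData1, pitch, CPU_InputData->m_prData1, width * sizeof(float), width * sizeof(float), height, cudaMemcpyHostToDevice));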

The first statement allocates the space, and the second copies the data. The details are as follows.

The first statement allocates a two-dimensional array whose width is width * sizeof(float) bytes and whose height is height rows. This time the rows are padded for fast access. The size of a row after padding is returned in pitch, which is similar to the row stride of an image.

The second statement copies the CPU data to the GPU. Because the rows on the GPU are padded, the data there is no longer packed contiguously: there may be unused bytes at the end of each row. Keep this in mind when accessing the data, and step from row to row by pitch bytes rather than by the row width.
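The row-by-row access can be sketched as follows. This is an illustration under stated assumptions, not the code from the example above: the names addOnePitched and addOneOnDevice are hypothetical, and the data is assumed to be float. The point to notice is that pitch is in bytes, so the row pointer is computed on a char pointer before it is cast back to float.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: adds 1 to every element of a pitched 2D array. Each row starts
// pitch BYTES after the previous one, so the row pointer is computed with
// char* arithmetic and then cast back to float*.
__global__ void addOnePitched(float *data, size_t pitch, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < height && col < width) {
        float *rowPtr = (float *)((char *)data + row * pitch);
        rowPtr[col] += 1.0f;
    }
}

// Hypothetical helper: host must hold width * height floats, packed row by row.
// Returns false if any CUDA call reports an error.
bool addOneOnDevice(std::vector<float> &host, int width, int height)
{
    float *dev = nullptr;
    size_t pitch = 0;                           // filled in by cudaMallocPitch
    size_t rowBytes = width * sizeof(float);    // packed row width on the host
    if (cudaMallocPitch((void **)&dev, &pitch, rowBytes, height) != cudaSuccess)
        return false;

    // Host rows are rowBytes apart; device rows are pitch bytes apart.
    cudaError_t err = cudaMemcpy2D(dev, pitch, host.data(), rowBytes,
                                   rowBytes, height, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        addOnePitched<<<grid, block>>>(dev, pitch, width, height);
        err = cudaGetLastError();               // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy2D(host.data(), rowBytes, dev, pitch,
                           rowBytes, height, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Indexing the device buffer as data[row * width + col] would read into the padding whenever pitch is larger than the row width, which is the access mistake the paragraph above warns about.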

3: Assigning values to and accessing three-dimensional arrays

// Array size on CUDA: width in bytes, height and depth in elements
cudaExtent extent = make_cudaExtent(col * sizeof(float), row, 8);
// Allocate space
cudaMalloc3D(&GPU_InputData->initdeformip, extent);
// Parameters for the copy between host and device, zeroed first
cudaMemcpy3DParms hostToDev = {0};
// Host side: wrap the plain pointer in a cudaPitchedPtr
hostToDev.srcPtr = make_cudaPitchedPtr((void*)CPU_InputData->initdeformip, col * sizeof(float), col, row);
hostToDev.dstPtr = GPU_InputData->initdeformip;
hostToDev.extent = extent;
hostToDev.kind = cudaMemcpyHostToDevice;
// cudaMemcpy3D transfers 3D data between host and device; the data and the direction are given by the cudaMemcpy3DParms argument
cutilSafeCall(cudaMemcpy3D(&hostToDev));

The idea is the same as in the two-dimensional case, but three dimensions are a little more involved, so some helper types are introduced: cudaExtent for the size and cudaMemcpy3DParms for the parameters of the three-dimensional copy. I will not go into further detail.
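For completeness, here is a small sketch of how a kernel might index memory allocated with cudaMalloc3D. The names doubleVolume and doubleVolumeOnDevice are hypothetical, and float data is assumed. A slice is height rows of pitch bytes each, so the address of element (x, y, z) is built from the slice offset, then the row offset, then the column.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: doubles every element of a pitched 3D array. One slice occupies
// pitch * height bytes, and each row inside a slice occupies pitch bytes.
__global__ void doubleVolume(cudaPitchedPtr vol, int width, int height, int depth)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z;
    if (x < width && y < height && z < depth) {
        size_t slicePitch = vol.pitch * height;
        char *slice = (char *)vol.ptr + z * slicePitch;
        float *rowPtr = (float *)(slice + y * vol.pitch);
        rowPtr[x] *= 2.0f;
    }
}

// Hypothetical helper: host must hold width * height * depth packed floats.
// Returns false if any CUDA call reports an error.
bool doubleVolumeOnDevice(std::vector<float> &host, int width, int height, int depth)
{
    // For linear memory the extent width is given in bytes.
    cudaExtent extent = make_cudaExtent(width * sizeof(float), height, depth);
    cudaPitchedPtr dev;
    if (cudaMalloc3D(&dev, extent) != cudaSuccess)
        return false;

    // Wrap the packed host buffer: its pitch is exactly the row width in bytes.
    cudaPitchedPtr hostPtr =
        make_cudaPitchedPtr(host.data(), width * sizeof(float), width, height);

    cudaMemcpy3DParms toDev = {0};
    toDev.srcPtr = hostPtr;
    toDev.dstPtr = dev;
    toDev.extent = extent;
    toDev.kind = cudaMemcpyHostToDevice;
    cudaError_t err = cudaMemcpy3D(&toDev);

    if (err == cudaSuccess) {
        dim3 block(16, 16, 1);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y, depth);
        doubleVolume<<<grid, block>>>(dev, width, height, depth);
        err = cudaGetLastError();               // catches launch failures
    }
    if (err == cudaSuccess) {
        cudaMemcpy3DParms toHost = {0};
        toHost.srcPtr = dev;
        toHost.dstPtr = hostPtr;
        toHost.extent = extent;
        toHost.kind = cudaMemcpyDeviceToHost;
        err = cudaMemcpy3D(&toHost);
    }
    cudaFree(dev.ptr);
    return err == cudaSuccess;
}
```

The sketch uses one grid layer per slice (blockIdx.z), which keeps the indexing simple for small depths such as the 8 used in the example above.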

4: CUDA arrays

Searching online also turns up the CUDA array, cudaArray. From what I have read, cudaArray is designed for texture memory in CUDA: data stored in one is associated with a texture so that it can be read through texture storage, which speeds up access. Allocation, copying, and access follow the same pattern as before, with slightly different functions and parameters, but the idea is the same.
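As a rough illustration of that pattern, the sketch below stores a 2D float array in a cudaArray and reads it back through a texture. It uses the texture object API (cudaCreateTextureObject), which is my choice for the example and may differ from what the materials mentioned above used. The names readTexture and copyThroughTexture are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Kernel: reads each texel through the texture and writes it to a flat
// output buffer. With unnormalized coordinates and point filtering,
// (x + 0.5, y + 0.5) addresses the center of texel (x, y).
__global__ void readTexture(cudaTextureObject_t tex, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

// Hypothetical helper: in and out must each hold width * height floats.
// Returns false if any CUDA call reports an error.
bool copyThroughTexture(const std::vector<float> &in, std::vector<float> &out,
                        int width, int height)
{
    size_t rowBytes = width * sizeof(float);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    cudaTextureObject_t tex = 0;
    float *devOut = nullptr;

    // Allocate the cudaArray and fill it from the packed host buffer.
    cudaError_t err = cudaMallocArray(&arr, &desc, width, height);
    if (err == cudaSuccess)
        err = cudaMemcpy2DToArray(arr, 0, 0, in.data(), rowBytes,
                                  rowBytes, height, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        cudaResourceDesc res;
        memset(&res, 0, sizeof(res));
        res.resType = cudaResourceTypeArray;
        res.res.array.array = arr;

        cudaTextureDesc td;
        memset(&td, 0, sizeof(td));
        td.addressMode[0] = cudaAddressModeClamp;
        td.addressMode[1] = cudaAddressModeClamp;
        td.filterMode = cudaFilterModePoint;       // no interpolation
        td.readMode = cudaReadModeElementType;     // return raw float values
        td.normalizedCoords = 0;                   // address by texel index
        err = cudaCreateTextureObject(&tex, &res, &td, nullptr);
    }
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&devOut, rowBytes * height);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        readTexture<<<grid, block>>>(tex, devOut, width, height);
        err = cudaGetLastError();                  // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), devOut, rowBytes * height,
                         cudaMemcpyDeviceToHost);

    if (tex) cudaDestroyTextureObject(tex);
    if (arr) cudaFreeArray(arr);
    cudaFree(devOut);
    return err == cudaSuccess;
}
```

Unlike the pitched case, the kernel never sees a pointer or a pitch: the layout inside a cudaArray is opaque, and all reads go through tex2D.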

5: Comparison of common arrays

Three array forms are in common use: p[i][j][k], cudaPitchedPtr, and cudaArray. p[i][j][k] is the form normally used on the CPU. It is allocated contiguously and can also be used in CUDA, but because its rows are not aligned, access may be relatively slow. cudaPitchedPtr is the form commonly used in CUDA, and its rows are aligned. cudaArray was introduced in CUDA to make use of texture memory.

That covers these problems for now. Speeding things up comes next. Keep going!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
