Shared Memory for a Large-Scale Dot Product

/*
 * Copyright 1993-2010 NVIDIA Corporation.  All rights reserved.
 *
 * NVIDIA Corporation and its licensors retain all intellectual property and
 * proprietary rights in and to this software and related documentation.
 * Any use, reproduction, disclosure, or distribution of this software
 * and related documentation without an express license agreement from
 * NVIDIA Corporation is strictly prohibited.
 *
 * Please refer to the applicable NVIDIA End User License Agreement (EULA)
 * associated with this source code for terms and conditions that govern
 * your use of this NVIDIA software.
 */

#include "../common/book.h"
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "device_functions.h"

#define imin(a,b) (a < b ? a : b)

const int N = 33 * 1024;
const int threadsPerBlock = 256;    // each thread block launches 256 threads
const int blocksPerGrid = imin(32, (N + threadsPerBlock - 1) / threadsPerBlock);

/*
 * kernel function
 */
__global__ void dot(float *a, float *b, float *c) {
    // shared memory on the device; each thread block has its own copy
    __shared__ float cache[threadsPerBlock];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    // the thread's index within the block serves as its cache index
    int cacheIndex = threadIdx.x;

    float temp = 0;
    // while the current index is smaller than the total number of elements
    while (tid < N) {
        temp += a[tid] * b[tid];
        // the stride is the total number of launched threads
        tid += blockDim.x * gridDim.x;
    }   // if this thread runs the loop again, temp already holds the previous
        // partial sum, so the new product is accumulated onto it

    // set the cache values: each thread stores its partial sum
    // in its own slot of shared memory
    cache[cacheIndex] = temp;

    /*
     * synchronize the threads in this block, so that every thread has
     * finished its accumulation before the reduction below begins
     */
    __syncthreads();

    // for reductions, threadsPerBlock must be a power of 2
    // because of the following code
    /*
     * reduction:
     * blockDim.x / 2 halves the number of active threads each pass, which
     * is equivalent to taking the midpoint; because blockDim.x is a power
     * of 2, the division never leaves a remainder
     */
    int i = blockDim.x / 2;
    while (i != 0) {
        if (cacheIndex < i)
            /*
             * each element in the first half is added to the corresponding
             * element in the second half; later passes repeat the pattern
             */
            cache[cacheIndex] += cache[cacheIndex + i];
        /*
         * synchronize so that all threads have completed this pass of the
         * reduction before the next pass begins
         */
        __syncthreads();
        // the midpoint for the next pass
        i /= 2;
    }
    // the final result is stored in cache[0], so thread 0 assigns cache[0]
    // to the output array entry indexed by this block's index
    if (cacheIndex == 0)
        c[blockIdx.x] = cache[0];
}


int main(void) {
    float *a, *b, c, *partial_c;
    float *dev_a, *dev_b, *dev_partial_c;

    // allocate memory on the CPU side
    a = (float*)malloc(N * sizeof(float));
    b = (float*)malloc(N * sizeof(float));
    partial_c = (float*)malloc(blocksPerGrid * sizeof(float));

    // allocate the memory on the GPU
    HANDLE_ERROR(cudaMalloc((void**)&dev_a, N * sizeof(float)));
    HANDLE_ERROR(cudaMalloc((void**)&dev_b, N * sizeof(float)));
    HANDLE_ERROR(cudaMalloc((void**)&dev_partial_c,
                            blocksPerGrid * sizeof(float)));

    // fill in the host memory with data
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = i * 2;
    }

    // copy the arrays 'a' and 'b' to the GPU
    HANDLE_ERROR(cudaMemcpy(dev_a, a, N * sizeof(float),
                            cudaMemcpyHostToDevice));
    HANDLE_ERROR(cudaMemcpy(dev_b, b, N * sizeof(float),
                            cudaMemcpyHostToDevice));

    dot<<<blocksPerGrid, threadsPerBlock>>>(dev_a, dev_b, dev_partial_c);

    // copy the array 'c' back from the GPU to the CPU
    HANDLE_ERROR(cudaMemcpy(partial_c, dev_partial_c,
                            blocksPerGrid * sizeof(float),
                            cudaMemcpyDeviceToHost));

    /*
     * finish the final addition on the host; running such a small amount
     * of work on the GPU would waste resources, since most of the device
     * would sit idle
     */
    c = 0;
    for (int i = 0; i < blocksPerGrid; i++) {
        c += partial_c[i];
    }

    #define sum_squares(x)  (x * (x + 1) * (2 * x + 1) / 6)
    printf("Does GPU value %.6g = %.6g?\n", c,
           2 * sum_squares((float)(N - 1)));

    // free memory on the GPU side
    HANDLE_ERROR(cudaFree(dev_a));
    HANDLE_ERROR(cudaFree(dev_b));
    HANDLE_ERROR(cudaFree(dev_partial_c));

    // free memory on the CPU side
    free(a);
    free(b);
    free(partial_c);
}
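The halving reduction is the part that requires threadsPerBlock to be a power of two: each pass folds the upper half of cache onto the lower half, so the active range must keep dividing evenly all the way down (256 -> 128 -> 64 -> ... -> 1). As a minimal sketch of the same pattern in isolation (block_sum is a hypothetical kernel written for illustration, not part of the book's listing):

// Hypothetical standalone kernel isolating the halving reduction used in
// dot(); assumes a single block whose size is a power of two (<= 256).
__global__ void block_sum(const float *data, float *result) {
    __shared__ float cache[256];
    int idx = threadIdx.x;
    cache[idx] = data[idx];                 // each thread loads one element
    __syncthreads();
    for (int i = blockDim.x / 2; i != 0; i /= 2) {
        if (idx < i)
            cache[idx] += cache[idx + i];   // fold upper half onto lower half
        __syncthreads();                    // all threads finish this pass first
    }
    if (idx == 0)
        *result = cache[0];                 // thread 0 holds the block's total
}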
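A note on the final check: with a[i] = i and b[i] = i * 2, the true dot product is the sum of 2 * i^2 for i = 0 .. N-1, which has the closed form 2 * (N-1)N(2N-1)/6; that is exactly what 2 * sum_squares((float)(N - 1)) evaluates, so the printf compares the GPU result against a known answer. The same value can also be cross-checked serially on the host (dot_reference is a hypothetical helper added here for illustration, not part of the original listing):

// Hypothetical serial reference, not in the original listing: computes
// the same dot product on the CPU so the GPU result can be compared.
float dot_reference(int n) {
    float sum = 0;
    for (int i = 0; i < n; i++)
        sum += (float)i * (float)(i * 2);   // mirrors a[i] * b[i]
    return sum;
}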
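The listing depends on common/book.h from the book's sample code, which provides the HANDLE_ERROR macro. Assuming that header is on the include path and the file is saved as dot.cu (a name chosen here, not given in the source), a typical build and run looks like:

    nvcc dot.cu -o dot
    ./dot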
 

