Recently, some members of the group asked about 2D gmem (global memory) copies in CUDA, and yesterday the same question came up on the forum: how to copy a sub-slice of a source gmem region to another gmem region. The following describes in detail how to do this without writing a kernel:
Test: copy a 50x50 sub-area, starting at index (25, 25) of a 100x100 gmem region, to the target gmem.

SRC gmem pointer: dpsrc
SRC gmem layout: 100x100
DST gmem pointer: dpdst
DST gmem layout: 50x50
SRC gmem is initialized in row order with the values 0 to 9999.

    CUDA_MEMCPY2D planemem;
    memset(&planemem, 0, sizeof(planemem));
    /* Source: device memory, starting at column 25, row 25 */
    planemem.srcMemoryType = CU_MEMORYTYPE_DEVICE;
    planemem.srcDevice = dpsrc;
    planemem.srcXInBytes = 25 * sizeof(float);
    planemem.srcY = 25;
    planemem.srcPitch = 100 * sizeof(float);
    /* Destination: device memory, starting at the origin */
    planemem.dstMemoryType = CU_MEMORYTYPE_DEVICE;
    planemem.dstDevice = dpdst;
    planemem.dstXInBytes = 0;
    planemem.dstY = 0;
    planemem.dstPitch = 50 * sizeof(float);
    /* Copied region: 50 floats wide, 50 rows high */
    planemem.WidthInBytes = planemem.dstPitch;
    planemem.Height = 50;
    cuMemcpy2DUnaligned(&planemem);

If the data is aligned, it is best to use cuMemcpy2D; otherwise, cuMemcpy2DUnaligned must be used, as above. In addition, when the memory is allocated with cuMemAllocPitch and the row width is not a power of 2, set srcPitch and dstPitch of planemem to the pitch value returned by cuMemAllocPitch, not to the layout width * sizeof(type).
Note: the code above has been tested.
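To check the result of such a copy, it can help to have a host-side reference that applies the same address arithmetic as the descriptor fields: each side starts at base + Y * pitch + XInBytes, and then WidthInBytes bytes are copied for each of Height rows. The following is a minimal sketch in plain C, not part of the original post; `copy2d_host` and `check_subarea_copy` are hypothetical helper names, and the sketch works on ordinary host arrays rather than device memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side reference for a pitched 2D copy.
   The parameters mirror the CUDA_MEMCPY2D fields used in the test:
   pitch, X offset in bytes, Y offset in rows, width in bytes, height in rows. */
static void copy2d_host(void *dst, size_t dst_pitch, size_t dst_x_bytes, size_t dst_y,
                        const void *src, size_t src_pitch, size_t src_x_bytes, size_t src_y,
                        size_t width_bytes, size_t height)
{
    /* The copied row segment must fit inside one row on both sides. */
    assert(src_x_bytes + width_bytes <= src_pitch);
    assert(dst_x_bytes + width_bytes <= dst_pitch);

    /* Start address = base + Y * pitch + XInBytes */
    const unsigned char *s = (const unsigned char *)src + src_y * src_pitch + src_x_bytes;
    unsigned char *d = (unsigned char *)dst + dst_y * dst_pitch + dst_x_bytes;

    /* Copy one row segment at a time, stepping each side by its own pitch. */
    for (size_t row = 0; row < height; ++row) {
        memcpy(d + row * dst_pitch, s + row * src_pitch, width_bytes);
    }
}

/* Reproduces the test on the host: 100x100 source filled with 0..9999,
   50x50 sub-area starting at (25, 25). Returns the number of mismatches. */
static int check_subarea_copy(void)
{
    static float src[100 * 100];
    static float dst[50 * 50];

    for (int i = 0; i < 100 * 100; ++i) {
        src[i] = (float)i;
    }

    copy2d_host(dst, 50 * sizeof(float), 0, 0,
                src, 100 * sizeof(float), 25 * sizeof(float), 25,
                50 * sizeof(float), 50);

    int mismatches = 0;
    for (int y = 0; y < 50; ++y) {
        for (int x = 0; x < 50; ++x) {
            if (dst[y * 50 + x] != src[(y + 25) * 100 + (x + 25)]) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

After copying dpdst back to the host, the device result could be compared element by element against the output of `copy2d_host`. With the initialization described in the test, the first destination element should equal source element 25 * 100 + 25.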