Today I wanted to call a CUDA library function to multiply matrices, but I found that in cublasSgemm the matrices are stored in column-major order, i.e. column-based storage. This is the exact opposite of the row-major storage generally used in C. A post at http://cudazone.nvidia.cn/forum/forum.php?mod=viewthread&tid=6001&extra=&page=2 describes a workaround, which is explained below. The specific analysis is as follows:
For example, we want to compute the matrix product C = A * B, where A = {{1, 3}, {2, 3}, {3, 3}} is 3x2, B = {{1}, {1}} is 2x1, and so C = {{4}, {5}, {6}} is 3x1. As one-dimensional arrays these are a = {1, 3, 2, 3, 3, 3}, b = {1, 1}, c = {4, 5, 6}; this is the usual row-major representation in C/C++, but cublasSgemm reads the very same memory column-major. Read that way (with the dimensions swapped to 2x3), the array a actually represents {{1, 2, 3}, {3, 3, 3}}, and b likewise represents {{1, 1}}: each buffer is seen as the transpose of the matrix we stored. So if we pass the one-dimensional data directly, the buffer of A stands for A^T and the buffer of B stands for B^T. Now, what we want in memory is C stored by row, and a row-major C is exactly a column-major C^T. By the identity C^T = (A * B)^T = B^T * A^T, asking cuBLAS to multiply the buffer of B by the buffer of A (in that order, with no transpose flags) produces precisely C^T in column-major form, i.e. C stored by row. In short, the row-major result can be obtained simply by switching the order of A and B in the call.
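To make the reinterpretation concrete, here is a small plain-C sketch (my own illustration, not from the original post) that prints the same six-element array twice: once read row-major as the 3x2 matrix A, and once read column-major as the 2x3 matrix A^T, which is how cuBLAS sees the buffer:

#include <stdio.h>

int main(void)
{
    float a[6] = {1, 3, 2, 3, 3, 3};  /* row-major A = {{1,3},{2,3},{3,3}} */

    /* Row-major 3x2: element (i,j) lives at a[i*2 + j]. */
    printf("read row-major as 3x2 (A):\n");
    for (int i = 0; i < 3; i++)
        printf("%g %g\n", a[i*2 + 0], a[i*2 + 1]);

    /* Column-major 2x3: element (i,j) lives at a[j*2 + i].
       This prints A^T = {{1,2,3},{3,3,3}}. */
    printf("read column-major as 2x3 (A^T):\n");
    for (int i = 0; i < 2; i++)
        printf("%g %g %g\n", a[0*2 + i], a[1*2 + i], a[2*2 + i]);

    return 0;
}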
Another point here is that the parameters of cublasSgemm are a bit dizzying. We pass in B and A, which looks like a (2x1) * (3x2) product, but what is actually executed inside the function is B^T * A^T = C^T, i.e. (1x2) * (2x3) = (1x3). The m, n, k arguments must therefore describe B^T, A^T, and C^T rather than the original matrices, so do not confuse the parameters.
The code is as follows:

#include <cublas_v2.h>     // cuBLAS v2 API (cublasSgemm lives here)
#include <cuda_runtime.h>  // CUDA runtime API (cudaMalloc, cudaMemcpy)
#include <stdio.h>
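/* A minimal sketch, not the original post's listing: it multiplies the
   example matrices above with cublasSgemm, swapping the operand order as
   described. Error checking is omitted for brevity. */
int main(void)
{
    /* Row-major host data: A is 3x2, B is 2x1, C = A*B is 3x1. */
    float a[6] = {1, 3, 2, 3, 3, 3};  /* A = {{1,3},{2,3},{3,3}} */
    float b[2] = {1, 1};              /* B = {{1},{1}} */
    float c[3] = {0, 0, 0};           /* expected result: {4, 5, 6} */

    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, sizeof(a));
    cudaMalloc((void **)&d_b, sizeof(b));
    cudaMalloc((void **)&d_c, sizeof(c));
    cudaMemcpy(d_a, a, sizeof(a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, sizeof(b), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* Column-major cuBLAS computes C^T = B^T * A^T, so pass B first:
       m = 1 (rows of B^T and C^T), n = 3 (cols of A^T and C^T), k = 2.
       The leading dimensions are the row counts of B^T, A^T, C^T: 1, 2, 1. */
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                1, 3, 2,
                &alpha, d_b, 1, d_a, 2,
                &beta, d_c, 1);

    cudaMemcpy(c, d_c, sizeof(c), cudaMemcpyDeviceToHost);
    printf("C = {%g, %g, %g}\n", c[0], c[1], c[2]);  /* prints 4, 5, 6 */

    cublasDestroy(handle);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}

Compile with something like nvcc sgemm_demo.cu -lcublas (the file name is arbitrary); the program should print C = {4, 5, 6}, which is exactly the row-major result worked out above.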