About CUDA register arrays
When optimizing a parallel algorithm in CUDA, we sometimes want to use a register array to make the algorithm really fly. The result, however, is usually mediocre, and sometimes using the array is no faster than not using it at all. Why is that?
To get to the point, we usually define a "register array" in one of the following two ways:
1 int a[8];
Is the array defined this way really the register array we want? With a definition like this, the compiler may put our so-called "register array" in local memory, and local memory is just a region carved out of device memory, so how fast can it be?
2 int a[8] = {0,0,0,0,0,0,0,0};
What about initializing the array at the point of definition: is this the register array we want? Not necessarily. All we can say is that there is some chance of getting the register array we want. The compiler decides whether to place the array in local memory based, at least in part, on the size of the defined array. But exactly what size still gives us a register array is unknown!
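As a minimal sketch of that decision, and going slightly beyond what this post has verified: the CUDA programming guide notes that arrays the compiler cannot prove are indexed with constant quantities are likely to end up in local memory. The two helpers below contrast the cases. The names sum_static, pick_dynamic, sum_kernel and the UNROLL macro are hypothetical, and the fallback macros exist only so the same code also builds as plain C++ for a host-side check.

```cpp
// Fallbacks so this file also compiles with an ordinary C++ compiler (assumption for testing only)
#ifdef __CUDACC__
#define UNROLL _Pragma("unroll")
#else
#define UNROLL
#define __host__
#define __device__
#endif

// Static indexing: once the loops are fully unrolled, every index into a[] is a
// compile-time constant, so the compiler is free to map each element to a register.
__host__ __device__ inline int sum_static(const int* in) {
    int a[8];
    UNROLL
    for (int i = 0; i < 8; ++i) a[i] = in[i] * 2;
    int s = 0;
    UNROLL
    for (int i = 0; i < 8; ++i) s += a[i];
    return s;
}

// Dynamic indexing: k is only known at run time, and registers cannot be addressed
// by a run-time index, so a[] is a likely candidate for local memory.
__host__ __device__ inline int pick_dynamic(const int* in, int k) {
    int a[8];
    UNROLL
    for (int i = 0; i < 8; ++i) a[i] = in[i] * 2;
    return a[k & 7];  // mask keeps the index inside the array
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread reads 8 consecutive ints and writes their doubled sum
__global__ void sum_kernel(const int* in, int* out, int n) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < n) out[t] = sum_static(in + 8 * t);
}
#endif
```

Compiling with nvcc -Xptxas -v prints the register count and any stack frame or spill usage per kernel, which is one way to check where the array actually ended up. None of this forces the placement; the compiler still makes the final call.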
So, is there a way to force an array we define to be stored in registers? As far as I can tell, no; I have not found one.
It seems the only option is to define multiple separate variables.
Example: replace int a[8]; with the following form:
int a0;
int a1;
int a2;
int a3;
int a4;
int a5;
int a6;
int a7;
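To make the trade-off concrete, here is a rough sketch of what the scalar version looks like in use. The helper name sum_scalars and the doubling step are made up for illustration, and the fallback macros are only there so the snippet also builds as plain C++.

```cpp
// Fallbacks so this file also compiles with an ordinary C++ compiler (assumption for testing only)
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: doubles eight inputs and sums them,
// using eight named scalars in place of int a[8].
__host__ __device__ inline int sum_scalars(const int* in) {
    int a0 = in[0] * 2;
    int a1 = in[1] * 2;
    int a2 = in[2] * 2;
    int a3 = in[3] * 2;
    int a4 = in[4] * 2;
    int a5 = in[5] * 2;
    int a6 = in[6] * 2;
    int a7 = in[7] * 2;
    // No loop is possible over separate variables, so every element is spelled out
    return a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7;
}
```

Every operation has to be written once per element, and changing the element count means editing the code by hand.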
The drawback of defining things this way is that the generality of the program drops another notch.
After going on for this long, it may all sound like nonsense, and fair enough. Still, once more: if any of you know how to define the register array we actually want in CUDA (not a "register array" whose storage is opened up in local memory), please contact me. I would be extremely grateful.