In OpenCL or CUDA, the volatile qualifier is often omitted on variables in global or shared (local) memory that several threads access. Reading such a variable once causes no problem. When it is read a second time, however, the compiler may optimize the read away and reuse the value obtained by the first access. In other words, the current thread does not see modifications that other threads make to the shared variable.
The following simple OpenCL example illustrates this situation:
    __kernel void solve_sum(
        __global unsigned buffer[512],
        __global unsigned dest[512]
    )
    {
        /* Flag shared by the work-items of the work-group. Note: strict
           OpenCL C does not permit an initializer on a __local variable
           declared in a kernel, although some compilers accept it. */
        __local volatile int flag = 0;

        size_t gid = get_global_id(0);

        const uint4 value = (uint4)(1, 2, 3, 4);

        if (0 <= gid && gid < 32)
        {
            /* First warp: wait until the second warp sets the flag. */
            while (flag == 0);
            vstore4(value, gid, buffer);
            // write_mem_fence(CLK_GLOBAL_MEM_FENCE);
            flag = 0;
        }
        else if (32 <= gid && gid < 64)
        {
            /* Second warp: set the flag, then wait for the first warp to clear it. */
            flag = 1;
            while (flag == 1);
            unsigned ret = buffer[127 + 32 - gid];

            dest[gid - 32] = ret;
        }
    }
In the code above, if volatile is removed, the warp made up of threads 32 to 63 falls into an endless loop. Because this warp has just written 1 to flag, the compiler treats the condition of the following while (flag == 1) as always true and does not read flag from memory again. Modifications that other threads make to flag are therefore invisible to this warp.
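The effect can be modeled on the host in plain C. The following is only a single-threaded sketch, not the code a GPU compiler actually emits: `spin_cached`, `spin_reread`, and `other_thread_step` are hypothetical names introduced for illustration, the write by "another thread" is simulated by a helper call at a chosen step, and the loops are bounded by `max_steps` so that the model terminates where the real kernel would hang.

```c
/* Hypothetical stand-in for another thread: clears *flag once `step`
   reaches `write_at`. In the real kernel this store would come from a
   different work-item. */
static void other_thread_step(volatile int *flag, int step, int write_at)
{
    if (step == write_at) {
        *flag = 0;
    }
}

/* Models the loop an optimizer may produce when flag is NOT volatile:
   the value is loaded once and the loop keeps testing the cached copy.
   Returns the step at which the loop exits, or -1 if it gives up after
   max_steps (the real loop would spin forever). */
int spin_cached(int *flag, int write_at, int max_steps)
{
    *flag = 1;
    int cached = *flag;                 /* single load, hoisted out of the loop */
    for (int step = 0; step < max_steps; ++step) {
        if (cached != 1) {
            return step;                /* never reached: cached stays 1 */
        }
        other_thread_step(flag, step, write_at);
    }
    return -1;
}

/* Models the loop with a volatile flag: every test re-reads memory. */
int spin_reread(volatile int *flag, int write_at, int max_steps)
{
    *flag = 1;
    for (int step = 0; step < max_steps; ++step) {
        if (*flag != 1) {
            return step;                /* fresh load on every iteration */
        }
        other_thread_step(flag, step, write_at);
    }
    return -1;
}
```

With an `int f`, calling `spin_cached(&f, 3, 100)` gives up and returns -1 even though `f` has already been cleared in memory, whereas `spin_reread(&f, 3, 100)` leaves the loop on the step right after the simulated write.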