Multi-GPU development with OpenCL (plus a note on multi-GPU development with OpenGL)
Reprinted from: http://blog.csdn.net/hust_sheng/article/details/75912004

Requirement

GPUs are used in a number of acceleration and optimization projects, and sometimes we use several GPUs to push speed further. With OpenCL, the key question is how to make full use of the compute power of multiple GPUs.

Multithreading (example with two GPUs)
// Upload the input image data to the first GPU and launch its kernel (done inside that GPU's worker thread)
errNum = clEnqueueWriteImage(commandQueue_1stGPU, imgIn_MT1[0], CL_TRUE, origin, region, 0, 0, tlImage, 0, NULL, NULL);
errNum = clEnqueueNDRangeKernel(commandQueue_1stGPU, step4WarpKernel_1stGPU, 1, NULL, step4GlobalWorkSize, step4LocalWorkSize, 0, NULL, NULL);
Attention:
1. Avoid write operations on buffers shared between threads.
2. For buffers that are only read, still request a separate copy of the resource for each device.
3. Each device's command queue and kernel objects must likewise be independent.
Only then can full parallelism be achieved. In general, the multithreading here does not reach inside the GPU; each worker thread simply calls into its own GPU, so the whole thing is ordinary C-level multithreading. In testing, the speedup was obvious. A minimal sketch of this structure is shown below.
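As a rough illustration of the structure described above (not the project's actual code), the sketch below starts one host thread per GPU, and every thread owns its own command queue, kernel object and buffers; the gpu_worker_t struct, the pthread-based threading and the work sizes are all assumptions of mine.

/* Hypothetical sketch: one host thread per GPU, each with its own
 * command queue, kernel and buffers. Error checking is omitted. */
#include <CL/cl.h>
#include <pthread.h>

typedef struct {
    cl_device_id     device;   /* this thread's GPU */
    cl_command_queue queue;    /* independent queue per device */
    cl_kernel        kernel;   /* independent kernel object per device */
    cl_mem           input;    /* independent buffers per device */
    cl_mem           output;
} gpu_worker_t;

static void *gpu_thread(void *arg)
{
    gpu_worker_t *w = (gpu_worker_t *)arg;
    size_t global = 1024, local = 64;

    /* Each thread only touches resources created for its own device,
     * so no host-side locking is needed around the OpenCL calls. */
    clSetKernelArg(w->kernel, 0, sizeof(cl_mem), &w->input);
    clSetKernelArg(w->kernel, 1, sizeof(cl_mem), &w->output);
    clEnqueueNDRangeKernel(w->queue, w->kernel, 1, NULL,
                           &global, &local, 0, NULL, NULL);
    clFinish(w->queue);        /* wait for this GPU only */
    return NULL;
}

/* Launch one worker thread per GPU and wait for both. */
static void run_on_two_gpus(gpu_worker_t workers[2])
{
    pthread_t tid[2];
    for (int i = 0; i < 2; ++i)
        pthread_create(&tid[i], NULL, gpu_thread, &workers[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(tid[i], NULL);
}

Because every thread sticks to its own queue, kernel and buffers, the three rules above are satisfied and the two GPUs genuinely run in parallel.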
Addendum

Sometimes a large amount of data has to be transferred from the GPU back to the CPU; in that case an asynchronous read should be used:
/* Callback for the asynchronous read; the signature must match what
 * clSetEventCallback expects. The body is left empty here. */
void CL_CALLBACK tmpFunc(cl_event event, cl_int status, void *user_data)
{
    ;
}
static cl_event event_async;

// CL_FALSE makes clEnqueueReadImage non-blocking.
// event_async is bound to this read command; when the command finishes
// executing, the status of event_async changes to CL_COMPLETE.
errNum = clEnqueueReadImage(commandQueue_2ndGPU, imgOut_MT2[1], CL_FALSE, origin, region_imgOut,
                            0, 0, *(outBuffer + 1), 0, NULL, &event_async);
if (errNum != CL_SUCCESS) {
    printf("clEnqueueReadImage error.\n");
}

// When event_async reaches CL_COMPLETE, the callback is triggered.
// The third argument is the callback entry point; the last is a user-data pointer.
clSetEventCallback(event_async, CL_COMPLETE, tmpFunc, NULL);
Note that the callback itself runs on a separate (child) thread. During project development the asynchronous path never felt reliable, so it was only used for a short time; in the end semaphore-based synchronization was used instead.
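The post does not show the semaphore code. As a hedged sketch of what such synchronization around the asynchronous read could look like, assuming POSIX semaphores purely for illustration (on_read_complete, read_done and read_result_async are names of mine, not the project's):

#include <CL/cl.h>
#include <semaphore.h>

static sem_t read_done;   /* initialise once with sem_init(&read_done, 0, 0) */

/* Callback: posts the semaphore when the device-to-host copy has finished. */
static void CL_CALLBACK on_read_complete(cl_event event, cl_int status, void *user_data)
{
    if (status == CL_COMPLETE)
        sem_post((sem_t *)user_data);
}

/* Enqueue a non-blocking read, register the callback, do other CPU work,
 * and only block on the semaphore when the data is actually needed. */
static void read_result_async(cl_command_queue queue, cl_mem image,
                              const size_t origin[3], const size_t region[3],
                              void *host_ptr)
{
    cl_event ev;
    clEnqueueReadImage(queue, image, CL_FALSE, origin, region,
                       0, 0, host_ptr, 0, NULL, &ev);
    clSetEventCallback(ev, CL_COMPLETE, on_read_complete, &read_done);

    /* ... other CPU work can overlap with the transfer here ... */

    sem_wait(&read_done);   /* returns once the copy is complete */
    clReleaseEvent(ev);
}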
Can OpenCL by itself drive multiple GPUs without host-side threads? Have a look at the example below. The logic is very simple, but the code looks suspect: it is no different from the single-GPU case, and it is unclear how the shared-buffer issues discussed above would be handled. I leave that as an open question. Following the code below, I found no speed improvement at all, which is rather embarrassing.
Nvidia Official Demo
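The demo code itself is not included in this reprint. Purely as a hedged sketch of the single-context, one-queue-per-device pattern such examples typically follow (this is not NVIDIA's actual code; the kernel name my_kernel, the float buffer and every other name are placeholders of mine):

#include <CL/cl.h>

/* Single context spanning two GPUs, one command queue per device,
 * the NDRange split in half between them. Error checking omitted. */
static void run_split_across_two_gpus(cl_platform_id platform,
                                      const char *kernel_src,
                                      size_t total_work)
{
    cl_device_id devs[2];
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, devs, NULL);

    /* One context that contains both devices. */
    cl_context ctx = clCreateContext(NULL, 2, devs, NULL, NULL, NULL);

    cl_command_queue queues[2];
    for (int i = 0; i < 2; ++i)
        queues[i] = clCreateCommandQueue(ctx, devs[i], 0, NULL);

    /* Program, kernel and buffer are built once and shared by both queues;
     * this sharing is exactly the point questioned above. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(prog, 2, devs, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "my_kernel", NULL);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                total_work * sizeof(float), NULL, NULL);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    /* Each queue gets half of the global NDRange via a global offset. */
    size_t half = total_work / 2;
    for (int i = 0; i < 2; ++i) {
        size_t offset = (size_t)i * half;
        clEnqueueNDRangeKernel(queues[i], kernel, 1, &offset, &half, NULL,
                               0, NULL, NULL);
    }
    for (int i = 0; i < 2; ++i)
        clFinish(queues[i]);
}

Whether the driver really overlaps the two queues here is exactly what the missing speedup mentioned above calls into question.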
Using multiple GPUs with OpenGL

In many cases it is more convenient to implement the acceleration with OpenGL rather than OpenCL, for example for texture mapping and rendering. So how do we use multiple GPUs from OpenGL?

Somewhat surprisingly: one would normally expect that, if this can be done at all, it could be done through the interfaces OpenCL provides; but investigation shows that selecting which GPU an OpenGL context uses relies on a vendor-specific interface, namely one provided by NVIDIA.

NVIDIA provides the WGL_NV_gpu_affinity extension.
Using this extension requires GLEW (the wglew.h header):

#include "GL/glew.h"
#include "GL/wglew.h"
Unfortunately, among current NVIDIA cards only the Quadro line supports this extension; GeForce and other consumer gaming cards do not, which kills the idea outright. Very frustrating...

We can check in code whether the machine's graphics card supports WGL_NV_gpu_affinity; see the GitHub code. A minimal version of that check is sketched below.
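A minimal sketch of that check, assuming an OpenGL context is already current on the calling thread so that glewInit can resolve the WGL entry points (the context-creation boilerplate is not shown):

#include <stdio.h>
#include "GL/glew.h"
#include "GL/wglew.h"

/* Returns 1 if the current GPU/driver exposes WGL_NV_gpu_affinity. */
static int has_gpu_affinity(void)
{
    if (glewInit() != GLEW_OK) {
        printf("glewInit failed.\n");
        return 0;
    }
    if (!wglewIsSupported("WGL_NV_gpu_affinity")) {
        printf("WGL_NV_gpu_affinity is NOT supported (e.g. GeForce cards).\n");
        return 0;
    }
    printf("WGL_NV_gpu_affinity is supported.\n");
    return 1;
}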