Multi-GPU development with OpenCL (and, along the way, OpenGL multi-GPU development)


Tags: acceleration, OpenCL

Reprinted from: http://blog.csdn.net/hust_sheng/article/details/75912004

Requirement

GPUs are used in many acceleration and optimization projects, and sometimes we use more than one GPU in pursuit of speed. With OpenCL, the key question is how to fully exploit the compute power of multiple GPUs.

Multithreading (example with two GPUs)

// Write the input image data to the first GPU's queue
errNum = clEnqueueWriteImage(commandQueue_1stGPU, imgIn_MT1[0], CL_TRUE, origin, region, 0, 0, tlImage, 0, NULL, NULL);

// Launch the kernel on the first GPU's queue
errNum = clEnqueueNDRangeKernel(commandQueue_1stGPU, step4WarpKernel_1stGPU, 1, NULL, step4GlobalWorkSize, step4LocalWorkSize, 0, NULL, NULL);

Attention
1. Avoid write operations on buffers shared between threads.
2. For buffers that several devices read, request a separate copy of the resource for each device.
3. Each device's command queue and executable kernel must also be independent.

Only in this way can full parallelism be achieved. In general, the multithreading here does not involve anything inside the GPU: each child thread simply issues calls to its own GPU, so the whole scheme is ordinary C-level multithreading. In testing, the speedup was obvious.
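As an illustration only (not the original project code), here is a minimal sketch of the structure described above: two host threads, each owning a fully independent context, command queue, and kernel for its own device. Names such as GpuWorker and runOnGPU are assumptions made for this example.

#include <stdio.h>
#include <pthread.h>
#include <CL/cl.h>

/* One fully independent set of OpenCL resources per GPU, so no buffers,
   queues, or kernels are shared between the two host threads. */
typedef struct {
    cl_device_id     device;
    cl_context       context;
    cl_command_queue queue;
    /* program, kernel, and buffers for this device would also live here */
} GpuWorker;

static void *runOnGPU(void *arg)
{
    GpuWorker *w = (GpuWorker *)arg;
    /* enqueue writes, kernels, and reads on w->queue only */
    clFinish(w->queue);
    return NULL;
}

int main(void)
{
    cl_platform_id platform;
    cl_device_id devices[2];
    cl_uint numDevices = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, devices, &numDevices);
    if (numDevices < 2) { printf("need two GPUs\n"); return 1; }

    GpuWorker workers[2];
    pthread_t threads[2];
    for (int i = 0; i < 2; ++i) {
        workers[i].device  = devices[i];
        workers[i].context = clCreateContext(NULL, 1, &devices[i], NULL, NULL, NULL);
        workers[i].queue   = clCreateCommandQueue(workers[i].context, devices[i], 0, NULL);
        pthread_create(&threads[i], NULL, runOnGPU, &workers[i]);
    }
    for (int i = 0; i < 2; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}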

Addendum

Sometimes a large amount of data has to be transferred from the GPU back to the CPU; in that case an asynchronous (non-blocking) read should be used:

// Callback invoked when the event completes (it runs on a driver thread);
// clSetEventCallback requires this signature: the event, its status, and user data
void CL_CALLBACK tmpFunc(cl_event event, cl_int status, void *userData)
{
    ;
}

static cl_event event_async;

// CL_FALSE makes clEnqueueReadImage non-blocking.
// event_async is bound to this read command; when the command finishes,
// the status of event_async changes to CL_COMPLETE.
errNum = clEnqueueReadImage(commandQueue_2ndGPU, imgOut_MT2[1], CL_FALSE, origin, region_imgOut,
                            0, 0, *(outBuffer + 1), 0, NULL, &event_async);
if (errNum != CL_SUCCESS) {
    printf("clEnqueueReadImage error.\n");
}

// When event_async reaches CL_COMPLETE the callback is triggered;
// the last argument is the user data handed to the callback
clSetEventCallback(event_async, CL_COMPLETE, tmpFunc, NULL);

It is important to note that the callback itself runs on a separate (child) thread. During this project the asynchronous approach never felt entirely reliable, so it was only used briefly; in the end, semaphore-based synchronization was used instead.
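One common way to combine the non-blocking read with explicit synchronization is to have the callback post a semaphore that the consuming thread waits on. Whether this matches the original project's semaphore scheme is not stated, so treat the following as a sketch assuming POSIX semaphores and the event name from the snippet above.

#include <semaphore.h>
#include <CL/cl.h>

static sem_t readDoneSem;   // initialized once with sem_init(&readDoneSem, 0, 0)

// Runs on a driver-owned thread; keep it short and only signal completion
static void CL_CALLBACK readDoneCallback(cl_event event, cl_int status, void *userData)
{
    if (status == CL_COMPLETE)
        sem_post(&readDoneSem);
}

// After the non-blocking clEnqueueReadImage:
//     clSetEventCallback(event_async, CL_COMPLETE, readDoneCallback, NULL);
// The thread that needs the data then blocks until the transfer has really finished:
//     sem_wait(&readDoneSem);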

Can OpenCL itself drive multiple GPUs without host-side threads? Consider the following example. The logic is very simple, but judging from the code something seems off: the code logic is no different from the single-GPU case, and it is unclear how the shared-buffer problems described above are handled. That remains an open question... and testing code written this way showed no speedup at all, which is awkward.

NVIDIA official demo
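The demo code itself is not reproduced here. As an assumption of what such a host-threading-free approach looks like, here is a sketch of the usual single-context, one-queue-per-device pattern: every GPU of the platform shares one context, each gets its own command queue, and the work is split into per-device sub-ranges.

#include <CL/cl.h>

/* Single shared context, one command queue per GPU. Each device then gets
   its own kernel object and its own slice of the input/output buffers. */
static void setupMultiGpu(void)
{
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint numDevices = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8, devices, &numDevices);

    /* One context containing every GPU of this platform */
    cl_context ctx = clCreateContext(NULL, numDevices, devices, NULL, NULL, NULL);

    /* One in-order command queue per device, so the devices execute concurrently */
    cl_command_queue queues[8];
    for (cl_uint i = 0; i < numDevices; ++i)
        queues[i] = clCreateCommandQueue(ctx, devices[i], 0, NULL);

    /* Device i processes its own sub-range of the data, enqueued on queues[i];
       wait for all devices with clFinish(queues[i]) at the end. */
}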

Using multiple GPUs with OpenGL

In many cases it is more convenient to combine OpenCL acceleration with OpenGL, for example for texture mapping and rendering. So how can OpenGL use multiple GPUs?

Somewhat surprisingly:

One would expect that, if this were possible at all, OpenGL itself would provide an interface for it. Investigation shows, however, that selecting which GPU to use goes through a vendor-provided interface, i.e., one supplied by NVIDIA.

WGL_NV_gpu_affinity, provided by NVIDIA

Using this extension requires GLEW (the wglew.h header):

#include "GL/glew.h"
#include "GL/wglew.h"

Unfortunately, among NVIDIA cards only the Quadro line supports this extension; GeForce and other gaming cards do not, which is a real letdown...

We can detect in code whether the graphics card supports WGL_NV_gpu_affinity; see the code on GitHub, and the sketch below.
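Since that code is not reproduced here, a minimal sketch of such a check (assuming an OpenGL context is already current and glewInit() has returned GLEW_OK) can simply query GLEW:

#include <windows.h>
#include <GL/glew.h>
#include <GL/wglew.h>

// Call with a current OpenGL rendering context, after glewInit() has succeeded
static int hasGpuAffinity(void)
{
    // If supported, wglEnumGpusNV can then be used to enumerate the GPUs
    return wglewIsSupported("WGL_NV_gpu_affinity") ? 1 : 0;
}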
