OpenCL Programming Guide: Interoperability with Direct3D

Source: Internet
Author: User

This article describes the interoperability between OpenCL and D3D 10.

1. Initialize an OpenCL context for Direct3D interoperability

D3D10 sharing is enabled with the cl_khr_d3d10_sharing extension pragma:

#pragma OPENCL EXTENSION cl_khr_d3d10_sharing : enable

When D3D10 sharing is enabled, a number of OpenCL functions are extended to accept parameter types and values that deal with D3D10 sharing.

An OpenCL context can be created with the following D3D interoperability property:

· CL_CONTEXT_D3D10_DEVICE_KHR: accepted as a property name in the properties parameter of clCreateContext and clCreateContextFromType.

The following values can be used to query D3D interoperability parameters of specific objects:

· CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR: accepted as the param_name value of clGetContextInfo.

· CL_MEM_D3D10_RESOURCE_KHR: accepted as the param_name value of clGetMemObjectInfo.

· CL_IMAGE_D3D10_SUBRESOURCE_KHR: accepted as the param_name value of clGetImageInfo.

· CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR and CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR: returned in the param_value parameter of clGetEventInfo when param_name is CL_EVENT_COMMAND_TYPE.

The OpenCL D3D10 interoperability functions are declared in the header file cl_d3d10.h. The Khronos D3D10 extension header can be obtained from the Khronos website. Some OpenCL distributions do not ship it, in which case you need to download it separately.

Initializing OpenCL for D3D sharing is basically the same as ordinary OpenCL initialization, with only a few minor differences. First, the platforms are enumerated with clGetPlatformIDs. Because we are looking for a platform that supports D3D sharing, we call clGetPlatformInfo() on each platform to query the extensions it supports. If the extension string contains cl_khr_d3d10_sharing, the platform can be used for D3D sharing.
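The extension check described above can be sketched as follows. This is only an illustration: hasExtension is a hypothetical helper, and it assumes the extension string has already been read with clGetPlatformInfo() using CL_PLATFORM_EXTENSIONS, which returns a space-separated list of names. The helper matches whole tokens, because a plain substring search would also accept a longer extension name that merely starts with the one being searched for.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of OpenCL): returns true when `name` appears
// as a whole, space-delimited token in `extensions`.
bool hasExtension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());
    std::size_t pos = 0;
    while ((pos = extensions.find(name, pos)) != std::string::npos) {
        const std::size_t end = pos + name.size();
        // A match only counts when it is bounded by spaces or string ends.
        const bool startsToken = (pos == 0) || (extensions[pos - 1] == ' ');
        const bool endsToken = (end == extensions.size()) || (extensions[end] == ' ');
        if (startsToken && endsToken) {
            return true;
        }
        pos = end;  // keep searching after this partial match
    }
    return false;
}
```

In a real program, the first argument would be the string returned for each platform, and the second would be "cl_khr_d3d10_sharing".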

Given a cl_platform_id that supports D3D sharing, you can use clGetDeviceIDsFromD3D10KHR() on this platform to query the corresponding OpenCL device IDs:

cl_int clGetDeviceIDsFromD3D10KHR(
    cl_platform_id             platform,
    cl_d3d10_device_source_khr d3d_device_source,
    void *                     d3d_object,
    cl_d3d10_device_set_khr    d3d_device_set,
    cl_uint                    num_entries,
    cl_device_id *             devices,
    cl_uint *                  num_devices)

For example:

errNum = clGetDeviceIDsFromD3D10KHR(
    platformIds[index_platform],
    CL_D3D10_DEVICE_KHR,
    g_pD3DDevice,
    CL_PREFERRED_DEVICES_FOR_D3D10_KHR,
    1,
    &cdDevice,
    &num_devices);
if (errNum == CL_INVALID_PLATFORM) {
    printf("Invalid Platform: Specified platform is not valid\n");
} else if (errNum == CL_INVALID_VALUE) {
    printf("Invalid Value: d3d_device_source, d3d_device_set is not valid or num_entries = 0 and devices != NULL or num_devices == devices == NULL\n");
} else if (errNum == CL_DEVICE_NOT_FOUND) {
    printf("No OpenCL devices corresponding to the d3d_object were found\n");
}

This obtains one OpenCL device ID (cdDevice) from the selected OpenCL platform (platformIds[index_platform]). The constant CL_D3D10_DEVICE_KHR indicates that the D3D10 object passed in (g_pD3DDevice) is a D3D10 device, and CL_PREFERRED_DEVICES_FOR_D3D10_KHR selects the preferred devices for that D3D10 device. The call returns the preferred OpenCL device associated with the platform and the D3D10 device.

The device ID returned by this function can be used to create a context that supports D3D sharing. When creating the OpenCL context, the cl_context_properties array passed to the clCreateContext*() call should include a pointer to the D3D10 device to be shared. For example:

cl_context_properties contextProperties[] =
{
    CL_CONTEXT_D3D10_DEVICE_KHR,
    (cl_context_properties)g_pD3DDevice,
    CL_CONTEXT_PLATFORM,
    (cl_context_properties)platformIds[index_platform],
    0
};
context = clCreateContextFromType(contextProperties, CL_DEVICE_TYPE_GPU, NULL, NULL, &errNum);

In this sample code, the D3D10 device pointer g_pD3DDevice is the one returned by the call to D3D10CreateDeviceAndSwapChain.
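The property array passed to the context creation call is a flat, zero-terminated list of key and value pairs. The sketch below shows how such a list is laid out and walked. It is an illustration only: ContextProperty stands in for cl_context_properties (which cl.h defines as intptr_t), the two key constants are placeholder values rather than the real CL_CONTEXT_PLATFORM and CL_CONTEXT_D3D10_DEVICE_KHR, and findProperty is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for cl_context_properties, which cl.h defines as intptr_t.
using ContextProperty = std::intptr_t;

// Placeholder keys for illustration only. Real code uses CL_CONTEXT_PLATFORM
// and CL_CONTEXT_D3D10_DEVICE_KHR from the OpenCL headers.
constexpr ContextProperty kKeyPlatform = 1;
constexpr ContextProperty kKeyD3D10Device = 2;

// Hypothetical helper: walks a zero-terminated list of (key, value) pairs and
// returns the value stored for `key`, or 0 when the key is absent.
ContextProperty findProperty(const ContextProperty* props, ContextProperty key) {
    assert(props != nullptr);
    for (const ContextProperty* p = props; *p != 0; p += 2) {
        if (p[0] == key) {
            return p[1];
        }
    }
    return 0;
}
```

Forgetting the terminating 0, or leaving out the value after a key, makes the list unparseable, which is why each pointer is cast to the property type and stored directly after its key.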

2. Create OpenCL memory objects from D3D buffers and textures

You can use the clCreateFromD3D10*KHR() OpenCL functions to create OpenCL buffer and image objects from existing D3D buffer objects and textures.

You can use clCreateFromD3D10BufferKHR() to create an OpenCL memory object from an existing D3D buffer:

cl_mem clCreateFromD3D10BufferKHR(
    cl_context     context,
    cl_mem_flags   flags,
    ID3D10Buffer * resource,
    cl_int *       errcode_ret)

The size of the returned OpenCL buffer object is the same as the size of resource. This call increases the internal Direct3D reference count on resource by 1. When the OpenCL reference count on the returned OpenCL memory object drops to 0, the internal Direct3D reference count on resource is decreased by 1.
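The reference counting rule above can be pictured with a small toy model. Nothing here is OpenCL or Direct3D API: D3DResourceModel, ClMemModel, and the three functions are hypothetical stand-ins that only mimic the counting behavior described in the text.

```cpp
#include <cassert>

// Toy stand-in for a D3D10 resource; the application holds one reference.
struct D3DResourceModel {
    int refCount = 1;
};

// Toy stand-in for a cl_mem object created from a D3D10 resource.
struct ClMemModel {
    D3DResourceModel* resource;
    int refCount;
};

// Models clCreateFromD3D10*KHR: the resource gains one internal reference.
ClMemModel createFromResource(D3DResourceModel& resource) {
    ++resource.refCount;
    return ClMemModel{&resource, 1};
}

// Models clRetainMemObject: only the OpenCL count changes.
void retain(ClMemModel& mem) {
    ++mem.refCount;
}

// Models clReleaseMemObject: the resource reference is dropped only when the
// OpenCL count reaches zero.
void release(ClMemModel& mem) {
    assert(mem.refCount > 0);
    if (--mem.refCount == 0) {
        --mem.resource->refCount;
    }
}
```

The practical consequence is that the D3D resource stays alive for as long as any OpenCL reference to the shared memory object remains, so the OpenCL object should be released before the application expects the D3D resource to be destroyed.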

Both the buffer and texture can be shared with OpenCL.

In D3D10, you can create a texture as follows:

// 2D texture
D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(D3D10_TEXTURE2D_DESC) );
desc.Width = g_WindowWidth;
desc.Height = g_WindowHeight;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D10_USAGE_DEFAULT;
desc.BindFlags = D3D10_BIND_SHADER_RESOURCE;
if (FAILED(g_pD3DDevice->CreateTexture2D( &desc, NULL, &g_pTexture2D)))
    return E_FAIL;

The shared texture format is DXGI_FORMAT_R8G8B8A8_UNORM. You can then create an OpenCL image object from the texture with:

cl_mem clCreateFromD3D10Texture2DKHR(
    cl_context        context,
    cl_mem_flags      flags,
    ID3D10Texture2D * resource,
    UINT              subresource,
    cl_int *          errcode_ret)

The width and height of the returned OpenCL image object are determined by the width and height of the selected subresource of resource. The channel type and channel order of the returned OpenCL image object are determined by the format of resource.

This call increases the internal Direct3D reference count on resource by 1. When the OpenCL reference count on the returned OpenCL memory object drops to 0, the internal Direct3D reference count on resource is decreased by 1.
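For the DXGI_FORMAT_R8G8B8A8_UNORM texture used here, the format mapping in the extension yields an RGBA image with an 8-bit unsigned normalized channel type, so each float channel written by a kernel ends up stored as a byte. The sketch below mimics that conversion on the host. It is an approximation for illustration: floatToUnorm8 is a hypothetical helper, and the exact rounding mode on a real device may differ from the one used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: converts one normalized float channel to an 8-bit
// UNORM value by clamping to [0, 1], scaling by 255, and rounding to nearest.
std::uint8_t floatToUnorm8(float value) {
    assert(!std::isnan(value));
    const float clamped = std::min(1.0f, std::max(0.0f, value));
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}
```

This is why kernels that target such a texture produce channel values in the range 0.0 to 1.0: anything outside that range is saturated when it is stored.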

Similarly, an OpenCL image object can be created from a 3D texture:

cl_mem clCreateFromD3D10Texture3DKHR(
    cl_context        context,
    cl_mem_flags      flags,
    ID3D10Texture3D * resource,
    UINT              subresource,
    cl_int *          errcode_ret)

3. Acquire and release D3D objects in OpenCL

Direct3D objects must be acquired before OpenCL processes them and released before Direct3D uses them again. Objects are acquired with the following function:

cl_int clEnqueueAcquireD3D10ObjectsKHR(
    cl_command_queue command_queue,
    cl_uint          num_objects,
    const cl_mem *   mem_objects,
    cl_uint          num_events_in_wait_list,
    const cl_event * event_wait_list,
    cl_event *       event)

This acquires OpenCL memory objects that were created from Direct3D 10 resources. clEnqueueAcquireD3D10ObjectsKHR() provides a synchronization guarantee: all D3D10 calls made before clEnqueueAcquireD3D10ObjectsKHR() is called finish executing before event reports completion and before any subsequent OpenCL work in command_queue starts.

Release function:

cl_int clEnqueueReleaseD3D10ObjectsKHR(
    cl_command_queue command_queue,
    cl_uint          num_objects,
    const cl_mem *   mem_objects,
    cl_uint          num_events_in_wait_list,
    const cl_event * event_wait_list,
    cl_event *       event)

This releases OpenCL memory objects that were created from Direct3D 10 resources. clEnqueueReleaseD3D10ObjectsKHR() provides a synchronization guarantee: D3D10 calls made after clEnqueueReleaseD3D10ObjectsKHR() do not start executing until all events in event_wait_list have completed and all work already submitted to command_queue has finished executing.

Note that, unlike the D3D10 acquire and release functions, the corresponding OpenGL functions do not provide synchronization guarantees. The most efficient approach is to acquire and release all shared textures and resources at the same time. It is also best to run all OpenCL kernels before switching back to D3D. The acquire and release calls then form the boundary between OpenCL and D3D processing.
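One way to keep the acquire and release calls paired at that boundary is a small scope guard. The sketch below is a generic illustration, not part of the sample: ScopedAcquire is a hypothetical class, and the two callables are assumed to wrap clEnqueueAcquireD3D10ObjectsKHR() and clEnqueueReleaseD3D10ObjectsKHR() for the same command queue and the same list of shared objects.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical scope guard: runs `acquire` on construction and `release` on
// destruction, so the two calls always bracket the OpenCL work in between.
class ScopedAcquire {
public:
    ScopedAcquire(const std::function<void()>& acquire, std::function<void()> release)
        : release_(std::move(release)) {
        assert(acquire && release_);
        acquire();
    }

    ~ScopedAcquire() { release_(); }

    // Copying would release the same objects twice, so it is disabled.
    ScopedAcquire(const ScopedAcquire&) = delete;
    ScopedAcquire& operator=(const ScopedAcquire&) = delete;

private:
    std::function<void()> release_;
};
```

With such a guard, every kernel enqueued inside the guarded scope runs between the acquire and the release, and an early return cannot skip the release call.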

4. Process D3D textures in OpenCL

The following host code modifies the shared texture from OpenCL:

cl_int computeTexture()
{
    cl_int errNum;

    // Per-frame sequence number that drives the animation
    static cl_int seq = 0;
    seq = (seq + 1) % (g_WindowWidth * 2);

    errNum = clSetKernelArg(tex_kernel, 0, sizeof(cl_mem), &g_clTexture2D);
    errNum = clSetKernelArg(tex_kernel, 1, sizeof(cl_int), &g_WindowWidth);
    errNum = clSetKernelArg(tex_kernel, 2, sizeof(cl_int), &g_WindowHeight);
    errNum = clSetKernelArg(tex_kernel, 3, sizeof(cl_int), &seq);

    size_t tex_globalWorkSize[2] = { g_WindowWidth, g_WindowHeight };
    size_t tex_localWorkSize[2] = { 32, 4 };

    // Acquire the shared texture before OpenCL touches it
    errNum = clEnqueueAcquireD3D10ObjectsKHR(commandQueue, 1, &g_clTexture2D, 0, NULL, NULL);

    errNum = clEnqueueNDRangeKernel(commandQueue, tex_kernel, 2, NULL,
                                    tex_globalWorkSize, tex_localWorkSize,
                                    0, NULL, NULL);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error queuing kernel for execution." << std::endl;
    }

    // Hand the texture back to D3D and wait for the queue to drain
    errNum = clEnqueueReleaseD3D10ObjectsKHR(commandQueue, 1, &g_clTexture2D, 0, NULL, NULL);
    clFinish(commandQueue);
    return 0;
}
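The host code above passes an explicit local work size of 32 by 4. In OpenCL 1.x, when a local work size is given, each global work size must be an exact multiple of it, otherwise clEnqueueNDRangeKernel fails with CL_INVALID_WORK_GROUP_SIZE. If the window dimensions are not guaranteed to be such multiples, the global size can be padded. The helper below is a hypothetical sketch of that rounding and is not part of the original sample.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds `value` up to the next multiple of `multiple`.
std::size_t roundUpToMultiple(std::size_t value, std::size_t multiple) {
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}
```

For example, the global sizes could be computed as roundUpToMultiple(g_WindowWidth, 32) and roundUpToMultiple(g_WindowHeight, 4). A kernel launched over a padded range then needs a bounds check against the real width and height so that the extra work-items do not write outside the image.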

The OpenCL kernel computes the content of the D3D texture object:

__kernel void xyz_init_texture_kernel(__write_only image2d_t im, int w, int h, int seq)
{
    int2 coord = { get_global_id(0), get_global_id(1) };
    float4 color = {
        (float)coord.x/(float)w,
        (float)coord.y/(float)h,
        (float)abs(seq-w)/(float)w,
        1.0f };
    write_imagef( im, coord, color );
}

The kernel writes to the texture with the write_imagef() function. seq is a sequence number: the host increments it cyclically on every frame and passes it to the kernel. In the kernel, seq is used to generate the texture color values, so the colors change as seq increments, which animates the texture.
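To see how seq animates the colors, the per-pixel expression and the per-frame update can be mirrored on the host. This is only a reference sketch for reasoning about the values: Color, texelColor, and nextSeq are hypothetical names, and the arithmetic is copied from the kernel and from computeTexture().

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical host-side color type with four float channels.
struct Color {
    float r, g, b, a;
};

// Host-side mirror of the kernel's per-pixel color expression.
Color texelColor(int x, int y, int w, int h, int seq) {
    assert(w > 0 && h > 0);
    return Color{
        static_cast<float>(x) / static_cast<float>(w),
        static_cast<float>(y) / static_cast<float>(h),
        static_cast<float>(std::abs(seq - w)) / static_cast<float>(w),
        1.0f};
}

// Host-side mirror of the per-frame sequence update in computeTexture().
int nextSeq(int seq, int w) {
    return (seq + 1) % (w * 2);
}
```

Red grows from left to right and green grows from top to bottom. Because seq runs from 0 to 2w - 1 and then wraps, the blue channel abs(seq - w) / w falls from 1 to 0 and rises back toward 1, which gives a triangle wave over time.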

The code also uses a rendering technique, g_pTechnique. It is a basic processing pipeline in which a simple vertex shader passes vertices and texture coordinates to a pixel shader:

//
// Vertex Shader
//
PS_INPUT VS( VS_INPUT input )
{
    PS_INPUT output = (PS_INPUT)0;
    output.Pos = input.Pos;
    output.Tex = input.Tex;
    return output;
}

technique10 Render
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( NULL );
        SetPixelShader( CompileShader( ps_4_0, PS() ) );
    }
}

The technique is loaded with the usual D3D10 calls. The pixel shader then performs a texture lookup on the texture modified by the OpenCL kernel and outputs the sampled color:

SamplerState samLinear
{
    Filter = MIN_MAG_MIP_LINEAR;
    AddressU = Wrap;
    AddressV = Wrap;
};

float4 PS( PS_INPUT input) : SV_Target
{
    return txDiffuse.Sample( samLinear, input.Tex );
}

samLinear is a linear sampler for the input texture in the pixel shader. On each iteration of the rendering loop, OpenCL updates the texture content in computeTexture(), and D3D10 displays the updated texture.

5. Process D3D vertex data in OpenCL

A D3D buffer containing vertex data is used to draw a sine curve on the screen. First, define a simple structure for the vertex buffer in D3D:

struct SimpleSineVertex
{
    D3DXVECTOR4 Pos;
};

You can create a D3D10 buffer for this structure. The buffer contains 256 elements:

bd.Usage = D3D10_USAGE_DEFAULT;
bd.ByteWidth = sizeof( SimpleSineVertex ) * 256;
bd.BindFlags = D3D10_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = 0;
bd.MiscFlags = 0;
hr = g_pD3DDevice->CreateBuffer( &bd, NULL, &g_pSineVertexBuffer );
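Because an OpenCL kernel will later fill this buffer with one float4 per vertex, the host-side vertex structure needs to occupy exactly four floats. The sketch below checks that assumption at compile time. SineVertexModel is a hypothetical stand-in for SimpleSineVertex (D3DXVECTOR4 holds four floats: x, y, z, w), and bufferByteWidth is a hypothetical helper that mirrors the ByteWidth expression above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for SimpleSineVertex: four floats, like D3DXVECTOR4.
struct SineVertexModel {
    float x, y, z, w;
};

// The kernel writes one float4 per vertex, so the host-side vertex must have
// the same size for the shared buffer to line up element by element.
static_assert(sizeof(SineVertexModel) == 4 * sizeof(float),
              "vertex layout must match an OpenCL float4");

// Mirrors the ByteWidth expression used when the D3D10 buffer is created.
std::size_t bufferByteWidth(std::size_t vertexCount) {
    assert(vertexCount > 0);
    return sizeof(SineVertexModel) * vertexCount;
}
```

If more fields were added to the vertex structure, the kernel's indexing would no longer match the D3D input layout, and the static_assert would flag the mismatch at compile time.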

Because we need to use OpenCL to set data in the buffer, we pass in NULL for the second parameter pInitialData and only allocate space. Once the D3D buffer g_pSineVertexBuffer is created, you can use the clCreateFromD3D10BufferKHR () function to create an OpenCL buffer from g_pSineVertexBuffer:

g_clBuffer = clCreateFromD3D10BufferKHR( context, CL_MEM_READ_WRITE, g_pSineVertexBuffer, &errNum );
if( errNum != CL_SUCCESS)
{
    std::cerr << "Error creating buffer from D3D10" << std::endl;
    return E_FAIL;
}

As with the texture, g_clBuffer can be passed as a kernel argument to an OpenCL kernel that generates the data. In the sample code, the kernel generates the vertex positions of the sine curve:

__kernel void init_vbo_kernel(__global float4 *vbo, int w, int h, int seq)
{
    int gid = get_global_id(0);
    float4 linepts;
    float f = 1.0f;   // frequency
    float a = 0.4f;   // amplitude
    float b = 0.0f;   // vertical offset
    linepts.x = gid/(w/2.0f)-1.0f;
    // Float literals keep the computation in single precision
    linepts.y = b + a*sin(3.14f*2.0f*((float)gid/(float)w*f + (float)seq/(float)w));
    linepts.z = 0.5f;
    linepts.w = 0.0f;
    vbo[gid] = linepts;
}
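The vertex formula can be checked on the host with a plain C++ mirror of the kernel body. This is a reference sketch only: Vertex4 and sineVertex are hypothetical names, and the unused kernel parameter h is left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side stand-in for an OpenCL float4.
struct Vertex4 {
    float x, y, z, w;
};

// Host-side mirror of the kernel body for work-item `gid`.
Vertex4 sineVertex(int gid, int w, int seq) {
    assert(w > 0);
    const float f = 1.0f;  // frequency
    const float a = 0.4f;  // amplitude
    const float b = 0.0f;  // vertical offset
    Vertex4 v;
    v.x = gid / (w / 2.0f) - 1.0f;
    v.y = b + a * std::sin(3.14f * 2.0f *
                           (static_cast<float>(gid) / static_cast<float>(w) * f +
                            static_cast<float>(seq) / static_cast<float>(w)));
    v.z = 0.5f;
    v.w = 0.0f;
    return v;
}
```

The x coordinate maps gid values from 0 to w onto the range -1 to 1, so the curve spans the full width of the viewport in clip space. The y coordinate stays within the amplitude of 0.4, and seq shifts the phase so that the curve scrolls from frame to frame.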

During rendering, set the input layout and the vertex buffer, and specify a line strip as the primitive topology. Next, computeBuffer() calls the kernel above to update the buffer. Finally, a simple rendering pipeline is activated and the 256 data points are drawn:

// Set the input layout
g_pD3DDevice->IASetInputLayout( g_pSineVertexLayout );

// Set vertex buffer
stride = sizeof( SimpleSineVertex );
offset = 0;
g_pD3DDevice->IASetVertexBuffers( 0, 1, &g_pSineVertexBuffer, &stride, &offset );

// Set primitive topology
g_pD3DDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_LINESTRIP );

computeBuffer();
g_pTechnique->GetPassByIndex( 1 )->Apply( 0 );
g_pD3DDevice->Draw( 256, 0 );

At run time, the program applies the texture kernel to generate the texture content, then runs the D3D pipeline to sample the texture and display it on the screen. The vertex buffer is then drawn, which produces a sine curve on the screen.

