OpenCL Learning Step by Step (7): Grayscale Image Histogram Computation (1)

Source: Internet
Author: User

A histogram counts how often each value occurs in a set of data. In computer image processing and computer vision, histograms are commonly used for image matching and object tracking. The mean-shift tracking algorithm, for example, makes frequent use of image histograms.

To compute the histogram of a grayscale image, we first choose the number of bins (also called slots). Grayscale pixel values usually lie in the range [0, 255], so we use 256 bins. We then loop over the entire image, count how many times each pixel value occurs, and store each count in the corresponding bin: bin[0] holds the number of pixels in the image with gray value 0, bin[1] the number with gray value 1, and so on.

The following figure shows the histogram of the grayscale Lenna image.

Computing a grayscale histogram on the CPU is straightforward: define an array hostbin[256], initialize all of its elements to 0, and then loop over the entire image, incrementing the bin that matches each pixel value. The code is as follows:

// Histogram on the CPU
// data, width, height and hostbin[256] are globals; hostbin must be zeroed beforehand.
void cpu_histgo(void)
{
    int i, j;
    for (i = 0; i < height; ++i)
    {
        for (j = 0; j < width; ++j)
        {
            // printf("data: %d\n", data[i * width + j]);
            hostbin[data[i * width + j]]++;
            // printf("hostbin %d = %d\n", data[i * width + j], hostbin[data[i * width + j]]);
        }
    }
}
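For experimenting outside the original project, here is a self-contained sketch of the same loop that takes the image and the bin array as parameters instead of globals and clears the bins itself. The name cpu_histogram and its parameter list are our own choices, not the project's.

```c
#include <stddef.h>
#include <string.h>

#define BIN_SIZE 256  /* one bin per gray level 0..255 */

/* Count how often each gray value occurs in a width*height 8-bit image
 * stored row by row. hostbin is cleared before counting. */
void cpu_histogram(const unsigned char *data, size_t width, size_t height,
                   unsigned int hostbin[BIN_SIZE])
{
    memset(hostbin, 0, BIN_SIZE * sizeof hostbin[0]);
    for (size_t i = 0; i < height; ++i)
        for (size_t j = 0; j < width; ++j)
            hostbin[data[i * width + j]]++;  /* pixel value selects the bin */
}
```

For a 3*2 image with rows {0, 0, 255} and {7, 7, 7}, bin 0 ends up with 2, bin 7 with 3 and bin 255 with 1.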

Computing a grayscale histogram with OpenCL is less straightforward. The GPU's strength is parallel computation, so the key question is how to partition the image so that its histogram can be computed in parallel. The figure below shows how the threads for a 512*512 image are divided into workgroups:

We require the image width to be an integer multiple of the number of bins (256) and the height to be an integer multiple of the workgroup size (128 in this program). If the image width and height are not such multiples, we adjust them with the following formulas:

// width becomes an integer multiple of binsize and height an integer multiple of groupsize.
width = (width / binsize ? width / binsize : 1) * binsize;
height = (height / groupsize ? height / groupsize : 1) * groupsize;
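The formulas above round a dimension down to a multiple of the divisor when it is at least one multiple, and up to exactly one multiple when it is smaller, so the image buffer presumably has to be cropped or padded to match. A minimal sketch of that rule as a C helper (round_to_multiple is a hypothetical name):

```c
#include <stddef.h>

/* Round n down to a multiple of m, but never below one multiple of m.
 * Same rule as (n / m ? n / m : 1) * m in the host code. */
size_t round_to_multiple(size_t n, size_t m)
{
    size_t q = n / m;       /* whole multiples of m contained in n */
    return (q ? q : 1) * m; /* at least one multiple */
}
```

With binsize = 256, a width of 600 becomes 512 and a width of 100 becomes 256; with groupsize = 128, a height of 500 becomes 384.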

With these settings, a 512*512 image is divided into 8 workgroups. Each workgroup contains 128 threads, and each thread computes the histogram of 256 pixels and stores the result in its own region of local memory. Before the kernel finishes, the histograms of all threads in a workgroup are combined into the histogram of that workgroup's pixel block. Finally, on the host side, the histograms of the eight workgroup blocks are merged into the final histogram.
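The count of 8 workgroups follows from giving each thread 256 pixels and each workgroup 128 threads. Writing W and H for the (adjusted) image width and height, notation introduced here only for the arithmetic:

```latex
N_{\text{threads}} = \frac{W \times H}{\mathrm{BIN\_SIZE}} = \frac{512 \times 512}{256} = 1024,
\qquad
N_{\text{groups}} = \frac{N_{\text{threads}}}{\mathit{groupsize}} = \frac{1024}{128} = 8
```

N_groups is the number of sub-histograms the host merges, which corresponds to subhistgcnt in the buffer size used below.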

The program uses three OpenCL memory objects. databuf holds the input image data. middevicebinbuf has size (number of workgroups) * 256, i.e. one 256-bin histogram per workgroup. The third is the kernel's second argument, a local-memory buffer of size (workgroup size) * 256, in which each thread of the workgroup stores the histogram of its own 256 pixels.

// Create the two global OpenCL memory objects
// Input image: one byte per pixel
databuf = clCreateBuffer(
    context,
    CL_MEM_READ_ONLY,
    sizeof(cl_uchar) * width * height,
    NULL,
    NULL);

// This object stores the histogram result of each workgroup block.
middevicebinbuf = clCreateBuffer(
    context,
    CL_MEM_WRITE_ONLY,
    sizeof(cl_uint) * binsize * subhistgcnt,
    NULL,
    NULL);

...

status = clSetKernelArg(kernel, 1, groupsize * binsize * sizeof(cl_uchar), NULL); // local memory size (LDS on AMD)
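As a quick check of the size expressions in the host code, this sketch evaluates them for the 512*512 example. uint8_t and uint32_t stand in for cl_uchar and cl_uint (OpenCL defines these as 8-bit and 32-bit unsigned types); buffer_sizes and histogram_buffer_sizes are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte sizes of the three memory objects used by the histogram kernel. */
typedef struct {
    size_t data_bytes;  /* databuf: one byte per pixel */
    size_t mid_bytes;   /* middevicebinbuf: binsize uint counts per workgroup */
    size_t local_bytes; /* kernel argument 1: binsize uchar counts per thread */
} histogram_buffer_sizes;

histogram_buffer_sizes buffer_sizes(size_t width, size_t height,
                                    size_t groupsize, size_t binsize)
{
    histogram_buffer_sizes s;
    /* each workgroup covers groupsize * binsize pixels */
    size_t subhistgcnt = (width * height) / (groupsize * binsize);

    s.data_bytes  = sizeof(uint8_t)  * width * height;
    s.mid_bytes   = sizeof(uint32_t) * binsize * subhistgcnt;
    s.local_bytes = sizeof(uint8_t)  * groupsize * binsize;
    return s;
}
```

For 512*512 pixels this gives 262144 bytes of image data, 8192 bytes of per-workgroup histograms and 32768 bytes (32 KB) of local memory per workgroup. The local allocation has to fit in the device's local memory, which clGetDeviceInfo reports as CL_DEVICE_LOCAL_MEM_SIZE; storing the per-thread counts as uint instead of uchar would need 128 KB.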

The following shows how the kernel code calculates the histogram of the workgroup block.

#define BIN_SIZE 256 // number of bins (gray levels 0..255)

__kernel
void histogram256(__global const uchar *data,
                  __local uchar *sharedarray,
                  __global uint *binresult)
{
    size_t localid = get_local_id(0);
    size_t globalid = get_global_id(0);
    size_t groupid = get_group_id(0);
    size_t groupsize = get_local_size(0);

The following code initializes each thread's region of local memory, i.e. it clears the counts in that thread's 256 bins. sharedarray holds workgroup size * 256 = 128 * 256 elements.

    // Clear this thread's 256 bins in local memory
    for (int i = 0; i < BIN_SIZE; ++i)
        sharedarray[localid * BIN_SIZE + i] = 0;

A barrier sets a synchronization point for all threads in the workgroup, ensuring that every thread has finished initialization before any thread starts counting.

    barrier(CLK_LOCAL_MEM_FENCE);

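The following code computes each thread's histogram of 256 pixels. Note that the pixels handled by a thread are not consecutive: each thread reads every groupsize-th pixel of its workgroup's block, starting at offset localid. For example, thread 0 in workgroup 0 handles the pixels shown in green in the figure, i.e. pixels 0, 128, 256 and so on up to 32640.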


    // Compute this thread's histogram of 256 strided pixels
    for (int i = 0; i < BIN_SIZE; ++i)
    {
        uint value = (uint)data[groupid * groupsize * BIN_SIZE + i * groupsize + localid];
        sharedarray[localid * BIN_SIZE + value]++;
    }
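The indexing in the loop above can be checked on the host with a small C sketch: it reproduces the kernel's index expression and verifies that, for a 512*512 image with 8 workgroups of 128 threads, every pixel is read by exactly one thread. pixel_index and covers_each_pixel_once are hypothetical helper names.

```c
#include <stddef.h>

#define BIN_SIZE 256

/* Pixel read by thread localid of workgroup groupid in iteration i
 * (same expression as in the kernel). */
size_t pixel_index(size_t groupid, size_t groupsize, size_t localid, size_t i)
{
    return groupid * groupsize * BIN_SIZE + i * groupsize + localid;
}

/* Returns 1 if groups * groupsize threads together read each of the
 * npixels pixels exactly once, 0 otherwise. seen must hold npixels
 * zeroed bytes. */
int covers_each_pixel_once(size_t groups, size_t groupsize, size_t npixels,
                           unsigned char *seen)
{
    for (size_t g = 0; g < groups; ++g)
        for (size_t l = 0; l < groupsize; ++l)
            for (size_t i = 0; i < BIN_SIZE; ++i) {
                size_t idx = pixel_index(g, groupsize, l, i);
                if (idx >= npixels || seen[idx])
                    return 0;   /* out of range or read twice */
                seen[idx] = 1;
            }
    for (size_t k = 0; k < npixels; ++k)
        if (!seen[k])
            return 0;           /* pixel never read */
    return 1;
}
```

In any one iteration i, threads 0 to 127 of a workgroup read 128 consecutive bytes, which is why a single thread's pixels are spread out with a stride of groupsize.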
The barrier ensures that every thread has finished computing its histogram before the merge begins.
    barrier(CLK_LOCAL_MEM_FENCE);
Next, the per-thread histograms are merged into the histogram of the whole workgroup pixel block. With 256 bins and 128 threads, each thread merges two bins: thread 0 handles bin 0 and bin 128, thread 1 handles bin 1 and bin 129, and so on.


    // Merge the histograms of all threads in the workgroup into the workgroup histogram.
    // Thread localid sums bins localid, localid + groupsize, ... over all per-thread histograms.
    for (int i = 0; i < BIN_SIZE / groupsize; ++i)
    {
        uint bincount = 0;
        for (int j = 0; j < groupsize; ++j)
            bincount += sharedarray[j * BIN_SIZE + i * groupsize + localid];

        binresult[groupid * BIN_SIZE + i * groupsize + localid] = bincount;
    }
}
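To see that the three kernel phases plus the host-side merge reproduce the plain CPU histogram, here is a sequential C emulation, a sketch rather than the project's code. Each workgroup's 128 threads are run in a loop inside each phase, and running the phases one after another stands in for the two barrier(CLK_LOCAL_MEM_FENCE) calls. emulate_workgroup, emulate_histogram and GROUP_SIZE are hypothetical names, with the group size fixed at 128 as in this program. Because the per-thread bins are uchar, as in the kernel, a thread whose 256 pixels all share one gray value would wrap that bin from 255 back to 0; the emulation keeps uchar to mirror the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIN_SIZE 256
#define GROUP_SIZE 128

/* Emulate histogram256 for one workgroup. */
static void emulate_workgroup(const uint8_t *data, size_t groupid,
                              uint32_t *binresult)
{
    /* uchar counts, as in the kernel: a count of 256 would wrap to 0 */
    static uint8_t sharedarray[GROUP_SIZE * BIN_SIZE];

    /* Phase 1: every thread clears its own 256 bins. */
    memset(sharedarray, 0, sizeof sharedarray);

    /* Phase 2: every thread counts its 256 strided pixels. */
    for (size_t localid = 0; localid < GROUP_SIZE; ++localid)
        for (size_t i = 0; i < BIN_SIZE; ++i) {
            uint8_t value = data[groupid * GROUP_SIZE * BIN_SIZE + i * GROUP_SIZE + localid];
            sharedarray[localid * BIN_SIZE + value]++;
        }

    /* Phase 3: thread localid merges bins localid and localid + 128. */
    for (size_t localid = 0; localid < GROUP_SIZE; ++localid)
        for (size_t i = 0; i < BIN_SIZE / GROUP_SIZE; ++i) {
            uint32_t bincount = 0;
            for (size_t j = 0; j < GROUP_SIZE; ++j)
                bincount += sharedarray[j * BIN_SIZE + i * GROUP_SIZE + localid];
            binresult[groupid * BIN_SIZE + i * GROUP_SIZE + localid] = bincount;
        }
}

/* Run all workgroups, then merge the sub-histograms as the host does.
 * npixels must be a multiple of GROUP_SIZE * BIN_SIZE, and middevicebin
 * must hold npixels / GROUP_SIZE entries (BIN_SIZE per workgroup). */
void emulate_histogram(const uint8_t *data, size_t npixels,
                       uint32_t *middevicebin, uint32_t devicebin[BIN_SIZE])
{
    size_t subhistgcnt = npixels / (GROUP_SIZE * BIN_SIZE);
    for (size_t g = 0; g < subhistgcnt; ++g)
        emulate_workgroup(data, g, middevicebin);

    memset(devicebin, 0, BIN_SIZE * sizeof devicebin[0]);
    for (size_t i = 0; i < subhistgcnt; ++i)
        for (size_t j = 0; j < BIN_SIZE; ++j)
            devicebin[j] += middevicebin[i * BIN_SIZE + j];
}
```

For a 512*512 image, middevicebin needs 8 * 256 entries, and devicebin should then match a direct count of the pixel values.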

On the host side, the histograms of the workgroup blocks are combined into the histogram of the entire image. The main code is as follows:

// Merge the sub-block histogram values (devicebin must be zeroed first)

for (i = 0; i < subhistgcnt; ++i)
{
    for (j = 0; j < binsize; ++j)
    {
        devicebin[j] += middevicebin[i * binsize + j];
    }
}

The complete code is in the project gclTutorial7, which can be downloaded from:

http://files.cnblogs.com/mikewolf2002/gclTutorial.zip
