CPU and GPU Implementations of the Julia Set
The main objective is to learn how to write CUDA programs by contrasting a CPU version and a GPU version; the Julia set algorithm itself has a certain difficulty, but it is not the focus. Since the GPU program is also an image program, OpenCV is used by default to display the results. First, CPU implementation (julia_cpu.cpp). julia_cpu uses the CPU to compute the Julia set:
#include"StdAfx.h"
#include<iostream>
#include"OPENCV2/CORE/CORE.HPP"
#include"OPENCV2/HIGHGUI/HIGHGUI.HPP"
#include"OPENCV2/IMGPROC/IMGPROC.HPP"
using namespaceStd
usingnamespaceCv
#DefineDIM 512
structCucomplex
{
floatR
floatI
Cucomplex (floatAfloatb): R (a), I (b) {}
floatMagnitude2 (void){returnR*r+i*i;}
Cucomplexoperator*(Constcucomplex& a)
{
returnCucomplex (R*A.R-I*A.I,I*A.R+R*A.I);
}
Cucomplexoperator+(Constcucomplex& a)
{
returnCucomplex (R+A.R,I+A.I);
}
};
int julia(int x, int y)
{
    const float scale = 1.5;
    float jx = scale * (float)(DIM / 2 - x) / (DIM / 2);
    float jy = scale * (float)(DIM / 2 - y) / (DIM / 2);
    cuComplex c(-0.8, 0.156);
    cuComplex a(jx, jy);
    for (int i = 0; i < 200; i++)
    {
        a = a * a + c;                 // iterate z = z^2 + c
        if (a.magnitude2() > 1000)
        {
            return 0;                  // diverged: not in the set
        }
    }
    return 1;                          // stayed bounded: in the set
}
int _tmain(int argc, _TCHAR* argv[])
{
    Mat src = Mat(DIM, DIM, CV_8UC3); // create the canvas
    for (int x = 0; x < src.rows; x++)
    {
        for (int y = 0; y < src.cols; y++)
        {
            for (int c = 0; c < 3; c++)
            {
                src.at<Vec3b>(x, y)[c] = julia(x, y) * 255;
            }
        }
    }
    imshow("src", src);
    waitKey();
    return 0;
}
The implementation here mainly illustrates the Julia algorithm, which is itself an iterated recurrence with a certain computational cost per pixel.
Second, GPU implementation. To gain a deeper understanding of the technique, I ran a series of experiments. It is important to note that GPU compilation is very slow and I found no way to speed it up. The other troublesome part is reading matrix data in and out; because material on this is still scarce, many things remain unclear.
The first experiment links CUDA and OpenCV together (test1.cu below). CUDA mainly does mathematical operations; it has no inherent connection to OpenCV. In general the computation itself runs in CUDA, while OpenCV performs the related conversions and displays the results. The function here reads a monochrome image and inverts all of its pixels. The code was written by adapting an existing template and adjusting parameters step by step against known data, which is the fastest approach and also keeps errors under control. Note that inside a CUDA kernel you cannot call any OpenCV functions. For now I can only achieve this much; how to introduce multidimensional arrays still needs more research. The main difficulty is array handling: at the moment I only manage one-dimensional arrays, and multidimensional ones overflowed as soon as I tried them, so everything is flattened as in the sketch below.
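As a minimal sketch of that flattening workaround (my own, not from the original template): a cv::Mat already stores pixels row-major, so a continuous single-channel Mat can go to the device in a single cudaMemcpy without any element-by-element loop. The name dev_p is mine, and this assumes src.isContinuous() holds:
unsigned char* dev_p;
checkCudaErrors(cudaMalloc((void**)&dev_p, N * N));
// Mat is row-major, so one flat copy replaces the per-element loop below:
checkCudaErrors(cudaMemcpy(dev_p, src.data, N * N, cudaMemcpyHostToDevice));
// Inside a kernel launched with dim3 grid(N, N), the same convention recovers
// the flat index of pixel (x, y):
//   int offset = x + y * gridDim.x;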
1) CUDA and OpenCV linked together (test1.cu)
#include"StdAfx.h"
#include<iostream>
#include"OPENCV2/CORE/CORE.HPP"
#include"OPENCV2/HIGHGUI/HIGHGUI.HPP"
#include"OPENCV2/IMGPROC/IMGPROC.HPP"
#include<stdio.h>
#include<assert.h>
#include<cuda_runtime.h>
#include#includeusingnamespaceStd
usingnamespaceCv
#DefineN 250
// test1's kernel
__global__ void test1Kernel(int* t)
{
    int x = blockIdx.x;
    int y = blockIdx.y;
    int offset = x + y * gridDim.x;
    t[offset] = 255 - t[offset];   // invert the pixel
}
int main(void)
{
    // Step 0. Data and memory initialization
    Mat src = imread("opencv-logo.png", 0);   // read as grayscale
    resize(src, src, Size(N, N));
    int* dev_t;
    int t[N * N];
    Mat dst = Mat(N, N, CV_8UC3);
    for (int i = 0; i < N * N; i++)
    {
        t[i] = (int)src.at<uchar>(i / N, i % N);   // flatten row by row (uchar, not char, for CV_8U)
    }
    checkCudaErrors(cudaMalloc((void**)&dev_t, sizeof(int) * N * N));
    // Step 1. Copy data from the CPU to the GPU
    checkCudaErrors(cudaMemcpy(dev_t, t, sizeof(int) * N * N, cudaMemcpyHostToDevice));
    // Step 2. GPU computation
    dim3 grid(N, N);
    test1Kernel<<<grid, 1>>>(dev_t);
    // Step 3. Copy data from the GPU back to the CPU
    checkCudaErrors(cudaMemcpy(t, dev_t, sizeof(int) * N * N, cudaMemcpyDeviceToHost));
    // Step 4. Display the result
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            int offset = i * N + j;
            for (int c = 0; c < 3; c++)
            {
                dst.at<Vec3b>(i, j)[c] = t[offset];
            }
        }
    }
    // Step 5. Free resources
    checkCudaErrors(cudaFree(dev_t));
    imshow("dst", dst);
    waitKey();
    return 0;
}
2) CUDA computes Fibonacci numbers, with some thoughts on implementing a CNN. Is CUDA suitable for Fibonacci? As with Julia, where each point is independent, a problem fits if it can be split into independent blocks. A single Fibonacci computation therefore does not parallelize, but computing a whole array of terms does, and practicing that parallel way of thinking is valuable; a sketch follows below. The experiment also showed that recursion did not work on the device here, which is worth keeping in mind when designing later computations. Parallel design is never a simple problem: it has a very steep learning curve, demands real experience, and has a very large market. A CNN, however, really is a typical fit. It needs no serial dependency: after a large number of parallel evaluations you simply choose the best parameters, so a CNN can serve as a typical combination of the image domain and CUDA.
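A minimal sketch of the "array of Fibonacci terms" idea: each thread computes its own term iteratively instead of recursively, so every element is independent. The kernel name and launch shape here are my own, not from the original experiment:
// Sketch only: one block per term, iterative to avoid device recursion.
__global__ void fibKernel(unsigned long long* out, int n)
{
    int idx = blockIdx.x;
    if (idx >= n) return;
    unsigned long long a = 0, b = 1;
    for (int i = 0; i < idx; i++)   // after idx steps, a == fib(idx)
    {
        unsigned long long next = a + b;
        a = b;
        b = next;
    }
    out[idx] = a;
}
// Launch, e.g.: fibKernel<<<n, 1>>>(dev_out, n);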
3) CUDA implements Julia (test3.cu). Building on the previous experiments, this went very smoothly.
#include"StdAfx.h"
#include<iostream>
#include"OPENCV2/CORE/CORE.HPP"
#include"OPENCV2/HIGHGUI/HIGHGUI.HPP"
#include"OPENCV2/IMGPROC/IMGPROC.HPP"
#include<stdio.h>
#include<assert.h>
#include<cuda_runtime.h>
#include#includeusingnamespaceStd
usingnamespaceCv
#DefineN 250
struct cuComplex
{
    float r;
    float i;
    __device__ cuComplex(float a, float b) : r(a), i(b) {}
    __device__ float magnitude2(void)
    {
        return r * r + i * i;
    }
    __device__ cuComplex operator*(const cuComplex& a)
    {
        return cuComplex(r * a.r - i * a.i, i * a.r + r * a.i);
    }
    __device__ cuComplex operator+(const cuComplex& a)
    {
        return cuComplex(r + a.r, i + a.i);
    }
};
__device__ int julia(int x, int y)
{
    const float scale = 1.5;
    float jx = scale * (float)(N / 2 - x) / (N / 2);
    float jy = scale * (float)(N / 2 - y) / (N / 2);
    cuComplex c(-0.8, 0.156);
    cuComplex a(jx, jy);
    for (int i = 0; i < 200; i++)
    {
        a = a * a + c;
        if (a.magnitude2() > 1000)
        {
            return 0;
        }
    }
    return 1;
}
// Leftover from the Fibonacci experiment: this recursive form did not work on the device.
__device__ int fblx(int offset)
{
    if (offset == 0 || offset == 1)
    {
        return offset;
    }
    else
    {
        return fblx(offset - 1) + fblx(offset - 2);
    }
}
// test3's kernel
__global__ void juliaKernel(int* t)
{
    int x = blockIdx.x;
    int y = blockIdx.y;
    int offset = x + y * gridDim.x;
    int juliaValue = julia(x, y);
    t[offset] = juliaValue * 255;
}
int main(void)
{
    // Step 0. Data and memory initialization
    int* dev_t;
    int t[N * N];
    Mat dst = Mat(N, N, CV_8UC3);
    for (int i = 0; i < N * N; i++)
    {
        t[i] = 0;
    }
    checkCudaErrors(cudaMalloc((void**)&dev_t, sizeof(int) * N * N));
    // Step 1. Copy data from the CPU to the GPU
    checkCudaErrors(cudaMemcpy(dev_t, t, sizeof(int) * N * N, cudaMemcpyHostToDevice));
    // Step 2. GPU computation
    dim3 grid(N, N);
    juliaKernel<<<grid, 1>>>(dev_t);
    // Step 3. Copy data from the GPU back to the CPU
    checkCudaErrors(cudaMemcpy(t, dev_t, sizeof(int) * N * N, cudaMemcpyDeviceToHost));
    // Step 4. Display the result
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            int offset = i * N + j;
            printf("%d is %d\n", offset, t[offset]);
            for (int c = 0; c < 3; c++)
            {
                dst.at<Vec3b>(i, j)[c] = t[offset];
            }
        }
    }
    // Step 5. Free resources
    checkCudaErrors(cudaFree(dev_t));
    imshow("dst", dst);
    waitKey();
    return 0;
}
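One natural next step when adjusting parameters on this template: the kernels above launch one thread per block (<<<grid, 1>>>), which leaves most of each warp idle. A hedged sketch of the same Julia computation with a 16x16 thread block, my own variation rather than anything from the original code, reusing the julia() device function and N above:
// Sketch: same per-pixel work, but many threads per block.
__global__ void juliaKernelTiled(int* t)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= N || y >= N) return;   // guard the edge when N is not a multiple of 16
    int offset = x + y * N;
    t[offset] = julia(x, y) * 255;
}
// Launch, e.g.:
//   dim3 threads(16, 16);
//   dim3 blocks((N + 15) / 16, (N + 15) / 16);
//   juliaKernelTiled<<<blocks, threads>>>(dev_t);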
Third, summary. CUDA programming is a new field. Although the documentation claims it is not complicated, applying it at scale cannot avoid complexity. So start from the existing examples and first get things running; then think about merging them and forming your own tools, which is where the productivity lies. I believe that before long I will be able to use CUDA's computing power to tackle and solve problems I could not handle before. I wish you success, and comments are welcome.