Practice: Using MATLAB for Advanced GPU Programming

Source: Internet
Author: User

Can running on a GPU speed up my application?

A GPU can accelerate an application that meets the following criteria:

Massively parallel: the computation can be divided into hundreds or thousands of independent units of work.

Computationally intensive: the time spent on computation is much greater than the time spent transferring data to and from GPU memory.

Applications that do not meet these criteria may run more slowly on a GPU than on a CPU.

GPU programming using MATLAB

FFT, IFFT, and linear algebra operations are among more than 100 built-in MATLAB functions that can run directly on the GPU when given an input argument of type GPUArray, a special array type provided by Parallel Computing Toolbox. These GPU-enabled functions are overloaded. In other words, they behave differently depending on the data type of the arguments passed to them.

For example, the following code uses an FFT algorithm to compute the discrete Fourier transform of a vector of pseudorandom numbers on the CPU:

A = rand(2^16, 1);

B = fft(A);

To perform the same operation on the GPU, we first use the gpuArray command to transfer the data from the MATLAB workspace to GPU device memory. We can then run the overloaded fft function:

A = gpuArray(rand(2^16, 1));

B = fft(A);

The fft operation runs on the GPU rather than on the CPU because its input argument (a GPUArray) resides in GPU memory.

The result, B, is stored on the GPU, but it is still visible in the MATLAB workspace. Running class(B) shows that B is a GPUArray:

class(B)

ans =

parallel.gpu.GPUArray

We can continue to operate on B using GPU-enabled functions. For example, to visualize the result, the plot command works directly on GPUArrays:

plot(B);

To return the data to the local MATLAB workspace, use the gather command. For example:

C = gather(B);

C is now a double in MATLAB and can be processed by any MATLAB function that operates on doubles.
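The round trip described above can be put together in one short script. This is a minimal sketch, assuming Parallel Computing Toolbox is installed and a supported GPU is present. The names Agpu, Bcpu, Bgpu, and relErr are illustrative and do not come from the original example. Note that recent MATLAB releases report the class of GPU data as gpuArray rather than parallel.gpu.GPUArray, so the printed class name depends on the release.

```matlab
% Sketch: CPU vs. GPU FFT round trip.
% Assumes Parallel Computing Toolbox and a supported GPU.
% Variable names are illustrative.
A = rand(2^16, 1);      % data in the MATLAB workspace (host memory)
Bcpu = fft(A);          % FFT computed on the CPU

Agpu = gpuArray(A);     % copy the same data to GPU memory
Bgpu = fft(Agpu);       % overloaded fft runs on the GPU
C = gather(Bgpu);       % copy the result back to the workspace

% The two results should agree to within floating-point round-off.
relErr = norm(C - Bcpu) / norm(Bcpu);
fprintf('class(Bgpu) = %s, class(C) = %s, relative error = %.2e\n', ...
        class(Bgpu), class(C), relErr);
```

Comparing against the CPU result is a cheap sanity check when porting code, since the GPU and CPU FFT implementations differ and will not match bit for bit.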

In this simple example, the time saved by running a single FFT on the GPU is often less than the time spent transferring the vector from the MATLAB workspace to device memory. This is generally the case, although it depends on the hardware and the size of the array. Data transfer overhead can become significant enough to degrade the application's overall performance, especially when you repeatedly exchange data between the CPU and GPU while performing relatively few compute-intensive operations. It is more efficient to perform several operations on the data while it is on the GPU and to return the data to the CPU only when necessary.
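One way to see the transfer overhead on a particular machine is to time the computation with and without the transfers. The following is a rough sketch, assuming Parallel Computing Toolbox, a supported GPU, and a MATLAB release that provides timeit and gputimeit. The variable names are illustrative, and the measured times depend entirely on the hardware.

```matlab
% Sketch: separate GPU compute time from host<->device transfer time.
% Assumes Parallel Computing Toolbox, a supported GPU, and a release
% that includes timeit and gputimeit. Names are illustrative.
A = rand(2^16, 1);
Agpu = gpuArray(A);     % copy already resident on the GPU

% CPU baseline.
tCpu = timeit(@() fft(A));

% GPU compute only: the input is already in GPU memory and the
% result stays there.
tGpuCompute = gputimeit(@() fft(Agpu));

% GPU including both transfers: host -> device, compute, device -> host.
tGpuTotal = gputimeit(@() gather(fft(gpuArray(A))));

fprintf('CPU: %.3g s | GPU compute: %.3g s | GPU with transfers: %.3g s\n', ...
        tCpu, tGpuCompute, tGpuTotal);
```

If the time with transfers is much larger than the compute-only time, the code will benefit from keeping data on the GPU across several operations.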

Note that GPUs, like CPUs, have finite memory. Unlike CPUs, however, GPUs cannot swap memory to and from disk. You must therefore verify that the data you want to keep on the GPU does not exceed its memory limit, particularly when working with large matrices. You can run the gpuDevice command to query your GPU card and obtain information such as its name, total memory, and available memory.
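A memory check of this kind might look like the sketch below. It assumes Parallel Computing Toolbox, a supported GPU, and a release in which the device object exposes the Name, TotalMemory, and AvailableMemory properties. The grid size and the number of arrays (numArrays) are hypothetical values chosen for illustration.

```matlab
% Sketch: check whether a set of double-precision matrices fits on the GPU.
% Assumes Parallel Computing Toolbox and a supported GPU.
N = 2048;                           % illustrative grid size
numArrays = 6;                      % hypothetical number of resident arrays
bytesPerArray = (N + 1)^2 * 8;      % a double takes 8 bytes per element
needed = numArrays * bytesPerArray;

dev = gpuDevice;                    % query the currently selected GPU
fprintf('%s: %.2f GiB total, %.2f GiB available, %.2f GiB needed\n', ...
        dev.Name, dev.TotalMemory / 2^30, ...
        dev.AvailableMemory / 2^30, needed / 2^30);

if needed > dev.AvailableMemory
    warning('The arrays may not fit in GPU memory.');
end
```

The estimate ignores temporary arrays created during computation, so it is sensible to leave some headroom.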

Wave Equation Solution Using MATLAB

To put the example above into context, we apply GPU computing to a practical problem. The goal is to solve the second-order wave equation in two spatial dimensions, x and y, with the boundary condition u = 0.

We use a spectral method to solve the equation in space and a second-order central finite difference method to solve it in time.
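For reference, a standard form of the two-dimensional second-order wave equation is given below, followed by its second-order central difference discretization in time. Unit wave speed, the domain symbol $\Omega$, the time step $\Delta t$, and the superscript $n$ for the time level are assumptions made for this sketch.

```latex
\frac{\partial^2 u}{\partial t^2}
  = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2},
\qquad u = 0 \ \text{on } \partial\Omega

\frac{u^{n+1} - 2u^{n} + u^{n-1}}{\Delta t^{2}} = u_{xx}^{n} + u_{yy}^{n}
\quad\Longrightarrow\quad
u^{n+1} = 2u^{n} - u^{n-1} + \Delta t^{2}\left(u_{xx}^{n} + u_{yy}^{n}\right)
```

Here $u_{xx}^{n}$ and $u_{yy}^{n}$ are the spatial second derivatives of the solution at time level $n$, which the spectral method supplies.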

Spectral methods are commonly used to solve partial differential equations. With a spectral method, the solution is approximated as a linear combination of continuous basis functions, such as sines and cosines. In this example, we apply the Chebyshev spectral method, which uses Chebyshev polynomials as the basis functions.

At each time step, we use the Chebyshev spectral method to calculate the second derivative of the current solution in both the x and y dimensions. Using these intermediate values together with the old solution and the current solution, we apply a second-order central finite difference method (also known as the leapfrog method) to calculate the new solution. We choose a time step that keeps the leapfrog method stable.
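The time-stepping scheme described above can be sketched on the CPU as follows. This is not the article's implementation: it computes the derivatives with a Chebyshev differentiation matrix instead of an FFT-based approach, which is a swapped-in simplification. The grid size N, the Gaussian initial pulse, the time step dt = 5/N^2, and all variable names are assumptions chosen for illustration.

```matlab
% Sketch: leapfrog time stepping with Chebyshev spectral derivatives (CPU).
% Uses a differentiation matrix rather than FFTs. All parameter values
% and variable names are illustrative assumptions.
N = 24;
k = (0:N)';
x = cos(pi * k / N);                    % Chebyshev points on [-1, 1]

% Chebyshev differentiation matrix D and second-derivative matrix D2.
c = [2; ones(N-1, 1); 2] .* (-1).^k;
X = repmat(x, 1, N+1);
dX = X - X';
D = (c * (1 ./ c)') ./ (dX + eye(N+1)); % off-diagonal entries
D = D - diag(sum(D, 2));                % diagonal entries (negative row sums)
D2 = D^2;

[xx, yy] = meshgrid(x, x);              % columns vary in x, rows vary in y
vv = exp(-40 * ((xx - 0.4).^2 + yy.^2));% assumed initial pulse
vv([1 end], :) = 0;                     % enforce u = 0 on the boundary
vv(:, [1 end]) = 0;
vvold = vv;                             % zero initial velocity

dt = 5 / N^2;                           % assumed small enough for stability
interior = 2:N;
for n = 1:50
    uxx = zeros(N+1);
    uyy = zeros(N+1);
    % Second derivatives at interior points; boundary entries stay zero.
    uxx(interior, interior) = vv(interior, :) * D2(interior, :)';
    uyy(interior, interior) = D2(interior, :) * vv(:, interior);
    vvnew = 2*vv - vvold + dt^2 * (uxx + uyy);  % leapfrog update
    vvold = vv;
    vv = vvnew;
end
```

Because the derivative terms are zero on the boundary and the solution starts at zero there, the boundary values remain zero throughout the time stepping.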

The MATLAB algorithm is computationally intensive, and its execution time increases dramatically as the number of elements in the grid over which we compute the solution grows. On a single CPU with a 2048 x 2048 grid, it takes more than one minute to complete just 50 time steps. Note that this time already includes the performance benefit of the inherent multithreading in MATLAB. Since R2007a, a number of MATLAB functions have supported multithreaded computation. These functions run automatically on multiple threads, with no need to specify commands in your code to create the threads.

When considering how to accelerate this computation with Parallel Computing Toolbox, we focus on the code that performs the computations for each time step. Figure 3 illustrates the changes required to get the algorithm running on the GPU. Note that the computations involve MATLAB operations for which GPU-enabled overloaded functions are available through Parallel Computing Toolbox. These operations include FFT, IFFT, matrix multiplication, and element-wise operations. As a result, we do not need to change the algorithm to run it on the GPU. We only need to use gpuArray to transfer the data to the GPU before entering the loop that computes the result at each time step.
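The pattern of transferring once, looping on the GPU, and gathering once can be sketched as follows. The operator M and the update rule are hypothetical stand-ins for the real per-step computation. The sketch assumes Parallel Computing Toolbox and a supported GPU.

```matlab
% Sketch: transfer once, compute every step on the GPU, gather once.
% Assumes Parallel Computing Toolbox and a supported GPU. M and the
% update rule are hypothetical placeholders, not the wave solver.
N = 512;
nSteps = 50;
M = rand(N) / N;        % hypothetical operator, scaled to keep values bounded
v = rand(N);            % hypothetical state

% Single transfer to the GPU before entering the time-step loop.
M = gpuArray(M);
v = gpuArray(v);

for n = 1:nSteps
    % Same code as a CPU version: matrix multiplication and element-wise
    % operations run on the GPU because their inputs are GPU arrays.
    v = 0.5 * v + 0.5 * (M * v);
end

% Single transfer back to the MATLAB workspace after the loop.
vHost = gather(v);
```

The loop body is identical to what a CPU version would contain. Only the two gpuArray calls and the final gather differ.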

 
Figure 3. The Code Comparison tool shows the differences between the CPU and GPU versions.

The CPU and GPU versions share more than 84% of their code (94 of 111 lines).

After the computations have run on the GPU, we transfer the results from the GPU back to the CPU. Every variable referenced by a GPU-enabled function must be created on the GPU or transferred to the GPU before it is used.

To convert one of the weights used for spectral differentiation to a GPUArray variable, we use

W1T = gpuArray(W1T);

Some types of arrays can be constructed directly on the GPU without being transferred from the MATLAB workspace. For example, to create a matrix of zeros directly on the GPU, we use

uxx = parallel.gpu.GPUArray.zeros(N+1, N+1);
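In more recent releases of Parallel Computing Toolbox, arrays are commonly created on the GPU by passing 'gpuArray' as the last argument to a constructor function such as zeros, ones, or rand. The sketch below assumes such a release and a supported GPU. The names Z and R and the grid size are illustrative.

```matlab
% Sketch: create arrays directly in GPU memory (no host copy, no transfer).
% Assumes a recent Parallel Computing Toolbox release and a supported GPU.
N = 256;                            % illustrative grid size
Z = zeros(N + 1, N + 1, 'gpuArray');% zeros allocated on the GPU
R = rand(N + 1, 1, 'gpuArray');     % random numbers generated on the GPU

fprintf('Z is a %s of size %d x %d\n', class(Z), size(Z, 1), size(Z, 2));
```

Creating data on the device avoids the host-to-device copy that gpuArray(zeros(N+1)) would perform.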

We use the gather function to bring data from the GPU back to the MATLAB workspace. For example:

vvg = gather(vv);

Note that this involves only a single transfer of data to the GPU, followed by a single transfer of data from the GPU back to the MATLAB workspace. All computations for each time step are performed on the GPU.
