The characteristics of heterogeneous programs mean that developing them differs from traditional development. For this project, this chapter lists several important points worth noting and walks through the whole development process, with the goal of ensuring program quality and optimization. It gives a brief description of that process, both for heterogeneous programs in general and for your own business development.
The process is described as follows:
Process 1: Data Preparation
Prepare the raw business data to be processed. Your data source might be MySQL, an app, MongoDB, or something else. For testing, I usually write a function that generates random floating-point numbers to simulate my project's data.
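As a minimal sketch of such a test-data function, assuming the simulated data is just a flat array of floats, something like the following would do. The name make_test_data, its parameters, and the fixed default seed are my own illustrative choices, not part of the original project.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns `count` pseudo-random floats between lo and hi.
// The fixed default seed is an assumption made so that test runs are repeatable.
std::vector<float> make_test_data(std::size_t count, float lo, float hi,
                                  unsigned seed = 42) {
    std::mt19937 gen(seed);                              // deterministic generator
    std::uniform_real_distribution<float> dist(lo, hi);  // uniform between lo and hi
    std::vector<float> data(count);
    for (float& x : data) {
        x = dist(gen);
    }
    return data;
}
```

Calling make_test_data(1000, 0.0f, 1.0f) twice with the same seed gives the same sequence, which makes it easier to compare later versions of the program on identical input.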
Process 2: Business Logic Design
Design the business-layer classes and composite classes according to the functions the business requires. There are generally four functions, with direct dependencies between them. The artifact produced in this process is a class diagram.
Process 3: Business Logic Implementation
This is the interface that is implemented on the CPU and can be called by other applications. I suggest encapsulating both the parallel and the non-parallel business logic in this service class. If there is a parallel processing module, leave it to be handled in a later process. The artifacts produced in this process are the class's .h and .cpp files. I always remind myself not to rush into writing the kernel program for the parallel module.
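A service class of this kind might look like the sketch below. The class name PricingService and both methods are hypothetical. For brevity the declaration and definition are shown together rather than split into .h and .cpp. The point is only that the parallel candidate starts life as an ordinary CPU method, so nothing forces me to write a kernel yet.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical service class; all names here are illustrative assumptions.
class PricingService {
public:
    // Non-parallel business logic: a simple sum that stays on the CPU.
    float total(const std::vector<float>& values) const {
        return std::accumulate(values.begin(), values.end(), 0.0f);
    }

    // Candidate for parallelization: every element is independent.
    // For now this is a plain CPU loop. A GPU-backed implementation could
    // replace the body later without changing the signature callers see.
    std::vector<float> scale_all(const std::vector<float>& values,
                                 float factor) const {
        std::vector<float> out(values.size());
        for (std::size_t i = 0; i < values.size(); ++i) {
            out[i] = values[i] * factor;
        }
        return out;
    }
};
```

The CPU version of scale_all also doubles as a reference result that a later GPU version can be checked against.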
Process 4: Data Dictionary Design
Strictly speaking, it is wrong to put this process here: the data is read from the database at the start, and the computed results are finally stored back in the database. The data dictionary is involved in that entire flow, so it should not be confined to this one step. As shown in the figure, the data dictionary runs through the whole process.
However, placing it here does make some sense. A block of data is copied to the GPU for parallel computing and the results are copied back from the device, so well-chosen data types matter a great deal for the bandwidth and memory used by both device and host. Simply put, no one wants to copy a batch of strings that mean nothing to the GPU and serve only as IDs labeling a computed result, right? Data dictionary design is therefore also an iterative process: whenever you find something in the data dictionary during development that can be optimized, optimize it as far as you can!
The principle of data dictionary design is that the device, the GPU, is the party being served, so design decisions should favor the device.
The key point: data dictionary design is very important in heterogeneous development. We do not aim to get it right in one step, but to keep improving it.
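One way to keep meaningless strings off the GPU, sketched here under my own assumptions, is to translate string keys into small integer IDs on the host and send only fixed-size records to the device. IdDictionary, intern, lookup, and Record are hypothetical names introduced for illustration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side dictionary: maps string keys to dense 32-bit IDs.
class IdDictionary {
public:
    // Returns the existing ID for `key`, or assigns the next unused one.
    std::uint32_t intern(const std::string& key) {
        auto it = ids_.find(key);
        if (it != ids_.end()) {
            return it->second;
        }
        std::uint32_t id = static_cast<std::uint32_t>(keys_.size());
        ids_.emplace(key, id);
        keys_.push_back(key);
        return id;
    }

    // Translates an ID back into its string key; this happens on the host only.
    // Throws std::out_of_range if the ID was never assigned.
    const std::string& lookup(std::uint32_t id) const { return keys_.at(id); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;  // key -> ID
    std::vector<std::string> keys_;                       // ID -> key
};

// Device-friendly record layout: fixed size, no pointers, no strings.
struct Record {
    std::uint32_t id;  // dense ID obtained from IdDictionary
    float value;       // the number the GPU actually computes on
};
```

With this layout, each record is typically 8 bytes regardless of how long the original string key was, and the strings never leave host memory. The results that come back from the device carry the same IDs, which lookup turns back into readable keys.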
Process 5: Kernel Program Design
The kernel program is the parallel computing program developed for the GPU.
If, in Process 3, a functional module turns out to have a large parallel granularity, this is where we start doing the really meaningful work.
To keep the program architecture clear:
First, create a .cuh file that declares the functional modules for parallel computing. Note that the business functions from Process 3 only need to include this .cuh file to call the encapsulated parallel computing module.
Next, create the .cu file. Note that all kernel-specific syntax, such as the kernel launch operator, must be implemented in the .cu file. We implement the kernel functions in the .cu file to process the parallel data.
And yes, we should avoid including too many header files in the kernel program. This is very helpful for the program architecture and for the engineering of the project!
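The split between declaration and implementation can be sketched as follows. To keep the example compilable with an ordinary C++ compiler, a CPU loop stands in for the real kernel, which is my simplification and not how the GPU version would be written. The file names in the comments and the functions scale_on_device and scale_prices are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// ---- vector_scale.cuh (hypothetical file): declaration only ----
// This is all the business layer needs to include. It exposes an ordinary
// function signature and no kernel-specific syntax.
void scale_on_device(const float* in, float* out, std::size_t n, float factor);

// ---- vector_scale.cu (hypothetical file): implementation ----
// In a real project this body would copy data to the GPU and launch a kernel.
// Here a CPU loop stands in; each iteration plays the role of one GPU thread.
void scale_on_device(const float* in, float* out, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

// ---- service code from Process 3: sees only the declaration ----
std::vector<float> scale_prices(const std::vector<float>& values, float factor) {
    std::vector<float> out(values.size());
    scale_on_device(values.data(), out.data(), values.size(), factor);
    return out;
}
```

Because the service code depends only on the plain signature, swapping the stand-in body for a real kernel launch would not require touching the business layer.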
Process 6: Iterative Optimization
There are two kinds of optimization:
First, never forget to ask whether the business logic itself can be optimized further;
Second, and the one we look forward to most, is optimizing the algorithms in the kernel program.
The second is probably where we will face more challenges. To take very simple examples: my sorting algorithm is more efficient and faster than yours; or my program uses less memory than yours and achieves higher instruction throughput. When developing a kernel program, we should try not to waste kernel resources, and we must eliminate any possibility of out-of-bounds memory access and memory leaks. The consequence may not be just a software crash, but a blue screen for the whole system!
One last point, which is not included in the figure: for every version, we keep a record and analyze its efficiency, treating it as a milestone in our own optimization work. I try to finalize each version on the way from CPU to GPU, which makes the whole process interesting. Seeing each version improve on the last makes me very proud.
Yes, originally I just wanted to remind myself that heterogeneous programs are built step by step, one version at a time, comparing each against the last, steadily improving efficiency and quality!
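Keeping a record per version could be as simple as the sketch below, which times one run of a version and checks its output against a known-good result. VersionRecord, Version, and measure are hypothetical names, and a single wall-clock measurement is a simplifying assumption. A real comparison would likely average several runs.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// One row in a hypothetical version log.
struct VersionRecord {
    std::string name;       // label for the version, e.g. "v1-cpu"
    double milliseconds;    // wall-clock time of a single run
    bool matches_baseline;  // true if the output equals the expected output
};

// Any version of the computation: takes the input, returns the result.
using Version = std::function<std::vector<float>(const std::vector<float>&)>;

// Runs `fn` once on `input`, times it, and compares its output to `expected`.
// The comparison is exact; GPU results may need a tolerance instead.
VersionRecord measure(const std::string& name, const Version& fn,
                      const std::vector<float>& input,
                      const std::vector<float>& expected) {
    auto start = std::chrono::steady_clock::now();
    std::vector<float> result = fn(input);
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return VersionRecord{name, ms, result == expected};
}
```

Collecting one VersionRecord per version gives a small table to compare: a version that is faster but no longer matches the baseline is not yet an improvement.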