CUDA Thread Execution Model Analysis (I): Recruiting for the GPU Revolution

Last Update: 2017-02-27 Source: Internet

Author: User

Tags: execution, thread


Preface: When you read what follows, you may feel that it is not quite the same as the traditional treatment of threads in computer textbooks. I don't think computing and programming have to be abstruse and hard to understand; in fact they are quite simple. I have always thought that programming is like playing with building blocks as a child: once you know the rules of the game, how you play is up to you. Perhaps it goes back to primary school, when I liked to solve math problems with simple methods and got into the habit of attacking complex problems in the simplest way I could. I do not like making simple problems complicated. Enough said, some friends are already getting impatient...

PS: One more thing. If you can't follow what comes next, read it as a novel; if it doesn't feel like a novel, read it as a story; if the story seems incomplete and too messily written, treat it as a joke. If it wins a smile from you during your study or work, I will be honored.

PS2: It just occurred to me (in fact it occurred to me again after a while) that since this is called the GPU Revolution, I have to gather a team. So I am starting to recruit.

Down to business:

To get into CUDA parallel development, we must first understand CUDA's execution model; only then can we develop parallel programs on top of it.

In CUDA, the host hands a kernel to the graphics hardware (GPU), which executes it as a grid of threads. Each grid contains multiple thread blocks, and each block contains multiple threads.
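To make the grid, block and thread hierarchy concrete, here is a minimal sketch of a launch. The kernel name `roll_call`, the helper `run_roll_call` and the sizes 4 and 8 are made up for illustration, and the sketch assumes a GPU and toolkit that support managed memory (`cudaMallocManaged`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread adds itself to the head count of the block it belongs to.
__global__ void roll_call(int *head_count)
{
    atomicAdd(&head_count[blockIdx.x], 1);
}

// Launches one grid of `blocks` blocks with `threads` threads each.
// Returns the number of blocks with a wrong head count, or -1 on a CUDA error.
int run_roll_call(int blocks, int threads)
{
    int *head_count = nullptr;
    if (cudaMallocManaged(&head_count, blocks * sizeof(int)) != cudaSuccess)
        return -1;
    for (int b = 0; b < blocks; ++b) head_count[b] = 0;

    // <<<blocks per grid, threads per block>>>
    roll_call<<<blocks, threads>>>(head_count);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;

    int wrong = ok ? 0 : -1;
    for (int b = 0; ok && b < blocks; ++b)
        if (head_count[b] != threads) ++wrong;

    cudaFree(head_count);
    return wrong;
}

int main()
{
    int wrong = run_roll_call(4, 8);   // 4 blocks of 8 threads, chosen arbitrarily
    printf("blocks with a wrong head count: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

The host only chooses how many blocks and how many threads per block to launch; each thread then finds out where it sits through the built-in variables such as `blockIdx`.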

We can use an ancient army as an example to understand this execution model. Every thread is like one of our soldiers, who does not know what he is supposed to do until he has joined the army. When a large military operation is to be carried out, the general issues an order: destroy the N enemy units facing us. The army is then divided into M parts, and each part does its own job: some do reconnaissance, some act as decoys, some lie in ambush, some provide backup, some handle logistics... In short, one big task is split by category and by process into different pieces, and the M parts complete it together.

Here the general is the host. It splits the military operation into kernels (kernel_1, kernel_2, ..., kernel_M) and hands each kernel to a grid to complete. (Is a grid a lieutenant? A commander of a thousand households? It depends on how many men are in its charge. If the GPU hardware supports only a few, it is a commander of a thousand households; if the GPU hardware is more capable and manages more men, it is a lieutenant. Even Qi Jiguang's troops numbered only four or five thousand, so I shouldn't be too greedy and want to command a million men all at once. Besides, who would dare claim he could command that many?) When carrying out its task, each grid divides the task into parts. After all, there are too many men; the grid's commander is no petty officer, and he only has to manage a few teams through his senior officers. So the grid divides the task among blocks (commanders of a hundred households?). The number of blocks each grid can manage is also limited (there are only so many men... how many depends on what the hardware supports). After all, the GPU hardware on a graphics card is still small, and compared with a real army there are far fewer threads than men. So at the block level, each block directly manages every thread (soldier).

Communication in ancient times was not very convenient (judging by the history of GPU development, if we map it onto Chinese history, today's GPU is still in the Warring States era...). So the threads (soldiers) inside one block (hundred households) can communicate conveniently and synchronize according to established rules, while communication between blocks is not so convenient, and blocks cannot communicate with each other. However, the blocks managed by the same grid (thousand households) share the resources allocated to the same task. Each grid is assigned a task by the general along with some provisions, and the blocks of that grid can draw on the provisions allocated to the grid. The task of each grid (thousand households) is different, so a grid knows what it has to do itself but knows nothing about the other grids. That is roughly the execution model. Now let's look at the figure that illustrates it on the GPU:
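Before turning to the figure, the point about soldiers of one block communicating and synchronizing can be sketched in CUDA with `__shared__` memory and `__syncthreads()`. The kernel `block_sum`, the helper `run_block_sum` and the block size of 64 are illustrative assumptions, not something taken from the original article, and managed memory support is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 64   // threads per block; an arbitrary power of two

// Each block sums its own slice of `in` and writes one partial result.
__global__ void block_sum(const int *in, int *partial)
{
    __shared__ int cache[THREADS];   // visible only to threads of this block
    int tid = threadIdx.x;
    cache[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                 // wait until the whole block has written

    // Tree reduction inside the block; every thread reaches each barrier.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = cache[0];   // one report per block
}

// Sums blocks * THREADS ones on the GPU. Returns the total, or -1 on a CUDA error.
int run_block_sum(int blocks)
{
    int n = blocks * THREADS;
    int *in = nullptr, *partial = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&partial, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(in);
        return -1;
    }
    for (int i = 0; i < n; ++i) in[i] = 1;
    for (int b = 0; b < blocks; ++b) partial[b] = 0;

    block_sum<<<blocks, THREADS>>>(in, partial);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;

    // Blocks do not talk to each other here; the host combines their reports.
    int total = ok ? 0 : -1;
    for (int b = 0; ok && b < blocks; ++b) total += partial[b];

    cudaFree(in);
    cudaFree(partial);
    return total;
}

int main()
{
    printf("total = %d\n", run_block_sum(8));
    return 0;
}
```

Inside a block the threads cooperate through `cache`; the blocks themselves never read each other's `cache`, and each one only hands its partial result back to the host.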

Looking at this figure, we can match it to our thread units. The general (host) assigns two tasks (Kernel1, Kernel2) to the thousand-household commanders (Grid1, Grid2). Grid1 divides its own troops into 6 hundred-household blocks, and each hundred-household block assigns the task to its own soldiers (threads) to carry out. Note that because the task (kernel) a thousand-household commander receives is fixed, every soldier (thread) under him simply buries himself in doing the same thing. (It is like Qi Jiguang recruiting soldiers. Zheng Ruozeng, on Hu Zongxian's staff, recorded the detailed recruitment rules in his "Jiangnan Jinglue"; if you are not convinced, go and compare. Of those selected for the army, the following kinds of men could not be used: men who had hung around the marketplace, men who liked flashy but impractical martial arts, men over 40, and men who had worked in government offices. Those were only the secondary rules; the stranger requirements follow: men who liked to brag and talk big could not be used, timid men could not be used, fair-skinned men could not be used, and, to keep the troops psychologically healthy, men of extreme (prejudiced and stubborn) character could not be used. ... To sum up, Qi Jiguang was looking for this kind of man: strong-limbed, simple-minded, honest, law-abiding and obedient to the government, daring to fight hard and unafraid to die, a muscle man with the temperament of a reckless blockhead. Source: "The Things of the Ming Dynasty".)
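The assignment described here, a host handing two kernels to two grids, might look roughly like the sketch below. Only the 6 blocks of Grid1 come from the text; the 3 x 2 layout of those blocks, the size of Grid2, the 5 x 3 threads per block, and the names `kernel_1`, `kernel_2` and `run_campaign` are assumptions made for illustration. Managed memory support is assumed as well.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical task 1: every thread of the grid does the same thing, adding 1.
__global__ void kernel_1(int *count) { atomicAdd(count, 1); }

// Hypothetical task 2: every thread of the grid does the same thing, adding 2.
__global__ void kernel_2(int *count) { atomicAdd(count, 2); }

// Launches both kernels and stores their totals in result[0] and result[1].
// Returns 0 on success, -1 on a CUDA error.
int run_campaign(int result[2])
{
    int *count = nullptr;
    if (cudaMallocManaged(&count, 2 * sizeof(int)) != cudaSuccess) return -1;
    count[0] = 0;
    count[1] = 0;

    dim3 grid1(3, 2);   // Grid1: 6 blocks, as in the text (3 x 2 layout assumed)
    dim3 grid2(2, 2);   // Grid2: size assumed
    dim3 block(5, 3);   // 15 threads per block, assumed

    kernel_1<<<grid1, block>>>(&count[0]);   // Kernel1 runs on Grid1
    kernel_2<<<grid2, block>>>(&count[1]);   // Kernel2 runs on Grid2
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;

    result[0] = count[0];
    result[1] = count[1];
    cudaFree(count);
    return ok ? 0 : -1;
}

int main()
{
    int result[2] = {0, 0};
    if (run_campaign(result) != 0) return 1;
    printf("kernel_1 total: %d, kernel_2 total: %d\n", result[0], result[1]);
    return 0;
}
```

Each launch creates its own grid, and within one launch every thread runs the same kernel body; only the built-in indices differ from thread to thread.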

To make unified management easier, everyone gives up his name, and every thread soldier is called by a number of the form Grid1, block (x,y), thread (x,y). If you want to find a thread, you run to the barracks and shout: "Hey, Grid1, Block1, third row, second one, thread (1,2)!" Each soldier must know his own position, and he must know who his officers are. Thread (1,2) must also know which number soldier he is within the whole grid (like "the Nth soldier of the Steel Seventh Company"...). When Grid1 calls his number, he has to answer right away: I am in block (1,1), and my number is:

unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;

unsigned int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
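As a rough sketch of how the two index expressions above are used inside a kernel (the CUDA built-ins are spelled `blockDim`, `blockIdx` and `threadIdx`, and case matters): each thread computes its own position and stamps one cell of a 2D array. The kernel name `number_soldiers`, the helper `run_numbering`, the 10 x 7 array and the 4 x 4 block are assumptions chosen for illustration, and managed memory support is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes its (xIndex, yIndex) within the whole grid and stamps
// its cell with 1 + its row-major number, so the host can see every cell was reached.
__global__ void number_soldiers(unsigned int *ids, unsigned int width, unsigned int height)
{
    unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
    unsigned int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
    if (xIndex < width && yIndex < height)   // the grid may overhang the array
        ids[yIndex * width + xIndex] = 1 + yIndex * width + xIndex;
}

// Returns the number of cells with an unexpected value, or -1 on a CUDA error.
int run_numbering(unsigned int width, unsigned int height)
{
    unsigned int n = width * height;
    unsigned int *ids = nullptr;
    if (cudaMallocManaged(&ids, n * sizeof(unsigned int)) != cudaSuccess) return -1;
    for (unsigned int i = 0; i < n; ++i) ids[i] = 0;

    dim3 block(4, 4);
    // Round up so that the grid covers the whole array.
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    number_soldiers<<<grid, block>>>(ids, width, height);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;

    int wrong = ok ? 0 : -1;
    for (unsigned int i = 0; ok && i < n; ++i)
        if (ids[i] != i + 1) ++wrong;

    cudaFree(ids);
    return wrong;
}

int main()
{
    int wrong = run_numbering(10, 7);
    printf("cells with an unexpected value: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads at the edges have no cell to fill.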

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
