Analysis of CUDA Hardware Implementation (I): Setting Up Camp, the GPU Revolution

Last Update:2017-02-27 Source: Internet

Author: User

Preface: A friend who can't program read this blog and asked me: can the GPU really be written up as a story too? I think the GPU may well be a revolution. Its development is perhaps still brewing, but by the end of '08 or the start of '09 there will be fierce competition, and by then the changes may reach the OS level and give people a shock. If a multi-core CPU is a handful of special forces soldiers, each holding an eight-shot gun (SSE), then the GPU is a peasant uprising ... hundreds of people charging at once. Individually, each one fights worse than a single CPU core, but there are a great many of them. Today, in parallel workloads and leaving aside the hardware cost of double precision, GPU performance already exceeds the parallel computing capacity of the CPU. This may be a revolution, and not simply a contest between the GPU and the CPU but a contest between parallel algorithms and serial algorithms. Parallel algorithms have been studied for many years, yet real applications have stayed far from the general public. With GPU parallel computing, parallel computation has suddenly come much closer to all of us. In school, computing is still taught starting from serial algorithms, which builds a lot of fixed serial habits of thought. When a problem then has to be divided up for parallel execution, that serial thinking gets in the way.

Text: We have already talked about some thread concepts, but those concepts all belong to the software side. We often hear organizations boast about how good their hardware and software are. The software may be good and every soldier capable, but if the hardware can't keep up, the heroes have nowhere to use their skills. It is like some of the reasons for job-hopping in China: many people change jobs for higher pay, but the actual statistics show that many others leave because they felt they learned nothing at their old company, or had nowhere to put their strengths to use. It comes down to whether the company gives you a chance to use your abilities. But I've wandered off topic ~ and many readers are getting impatient ... Picking up from last time: "CUDA Threading Execution Model Analysis (II): Before the Army Moves, the Fodder Goes First, the GPU Revolution" already covered what the CUDA thread model looks like, and after a few days of digestion it should have left some impression. But you will ask: can we launch an unlimited number of threads? A recruiter may want as many people as possible, but you have to consider how much food you have and how big the barracks are. So now we need to discuss the hardware of the NVIDIA graphics cards that support CUDA.

An explanation in the abstract may not be very persuasive, so let's take the G80 as an example.

1. The G80 contains 16 multiprocessors.

2. Each multiprocessor has a group of 32-bit processors (8 of them in the G80), and each multiprocessor has a SIMD architecture. What is SIMD? During military training, going to the canteen is not like at school, where everyone picks up their own chopsticks and eats in twos and threes. Here there is discipline. What is discipline? A group of people stands in front of the table, and until the company commander speaks, nobody dares to sit down ... The commander gives the order: "Sit down." Only then does everyone sit, and all at the same moment--! If anyone is out of sync, that's trouble--! Do it again. Everyone must sit down in unison so that you hear a single sound: bang! You would think the benches would break, but army equipment is sturdy; by the end of training the brothers had only worn out their backsides, and nobody ever heard of a broken stool--! So take good care of yourself. Remember, this is SIMD: Single Instruction, Multiple Data. Each multiprocessor also has a shared instruction unit (I couldn't find a good translation for this one ~ look at SIMD and you'll understand). In the G80, each multiprocessor also has two SFU (special function unit) modules.

3. In each clock cycle, execution proceeds warp by warp (how should "warp" be translated?). The way to understand it: the threads of a block run together, but if, for example, a block contains 512 threads, only 32 of them are actually running at any one time, and those 32 threads form one running warp--! Fortunately it's not "rap"--! I just can't explain it any better.

4. Each warp contains a fixed number of threads, currently set at 32. Will that change in the future? I don't know; only the CUDA developers know.
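To make the numbers in the list above a little more concrete, here is a small sketch in plain Python (not CUDA code, and not anything from NVIDIA's toolkit). It only does the arithmetic: how many 32-bit processors the G80 has in total, how a 512-thread block breaks up into warps of 32, and what "one instruction, many data" looks like for a single warp. The function names `total_processors`, `split_into_warps` and `run_warp_simd` are made up for this illustration, and grouping threads into warps in consecutive index order is an assumption of the sketch rather than something stated above.

```python
# Figures taken from the list above (G80); all names here are illustrative only.
MULTIPROCESSORS = 16      # multiprocessors on the G80
PROCESSORS_PER_MP = 8     # 32-bit processors per multiprocessor
WARP_SIZE = 32            # threads per warp


def total_processors():
    """Number of 32-bit processors on the whole chip."""
    return MULTIPROCESSORS * PROCESSORS_PER_MP


def split_into_warps(block_size, warp_size=WARP_SIZE):
    """Group thread indices 0..block_size-1 into consecutive warps.

    Assumption of this sketch: threads are grouped in index order.
    The last warp is only partly filled when block_size is not a
    multiple of warp_size.
    """
    return [list(range(start, min(start + warp_size, block_size)))
            for start in range(0, block_size, warp_size)]


def run_warp_simd(instruction, warp, data):
    """Apply ONE instruction to every thread of a warp.

    Each thread reads its own element of `data`, but all of them
    execute the same `instruction`, like the company sitting down
    on a single command.
    """
    return [instruction(data[tid]) for tid in warp]


if __name__ == "__main__":
    warps = split_into_warps(512)
    print("processors on the chip:", total_processors())
    print("warps in a 512-thread block:", len(warps))

    data = list(range(512))
    doubled = run_warp_simd(lambda x: x * 2, warps[0], data)
    print("first warp, first four results:", doubled[:4])
```

So a 512-thread block is really 16 squads of 32, and each squad gets one command at a time. Real hardware scheduling is more involved than a list comprehension, of course; this is only the head count.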

We'll keep to our established way of learning ~ look at the picture and talk it through--! And then there's another picture:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.