Learn CUDA together (0)

Source: Internet
Author: User

1. Why Nvidia introduced CUDA
Recently a lot of people in our lab have started buying new laptops. Because most of us are "geniuses" who do not play large games, we usually say that we do not need much from the graphics card, as long as the CPU is not weak: an Intel i7, a 3 GHz clock frequency ...

In fact, a CPU clock frequency of 4 GHz is already close to the practical limit. At that point the CPU generates a great deal of heat, because power consumption rises as the clock frequency rises. At constant voltage, a CPU's power consumption is roughly proportional to the cube of its clock frequency. Worse, as the CPU gets hotter, its power consumption increases even at a fixed clock frequency because of the properties of silicon. This growing wasted power means that you either cannot supply the processor with enough power or cannot cool it effectively: the thermal limit of the electronic device or chip package has been reached. This is known as the power wall.
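To get a feel for what this relationship implies, here is a minimal sketch. It reads the paragraph above as power growing with roughly the cube of the clock frequency, with everything else held fixed. That cubic model is an assumption of the sketch, and `relative_power` is a hypothetical helper name, not part of any library.

```python
def relative_power(freq_ratio: float, exponent: float = 3.0) -> float:
    """Return power relative to a baseline for a given clock-frequency ratio.

    Assumption: power ~ frequency ** exponent, all else held fixed.
    """
    if freq_ratio <= 0:
        raise ValueError("freq_ratio must be positive")
    return freq_ratio ** exponent


if __name__ == "__main__":
    # Compare a few clock-frequency steps under the assumed cubic model.
    for base_ghz, target_ghz in [(3.0, 4.0), (2.0, 4.0)]:
        ratio = target_ghz / base_ghz
        print(f"{base_ghz} GHz -> {target_ghz} GHz: "
              f"about {relative_power(ratio):.2f}x the power")
```

Under this model, doubling the clock frequency costs about eight times the power, which is why simply raising the clock stopped being a viable path.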

At the same time, the market keeps demanding faster processors. The two major PC CPU manufacturers, Intel and AMD, therefore turned to multi-core designs, moving from continually raising the clock frequency to adding more cores to the processor.

However, in any industry a change of direction is hard. It is like being used to travelling by road and suddenly having to go by water: some people drown, and the rest learn to swim. The difficulty with multi-core is that serial, single-threaded problem solving has to become multi-threaded parallel execution. That brings in thread allocation, memory sharing, and so on, and a program that runs fine on a dual-core computer may hang on a quad-core machine, so the shift has been slow. In practice, many people run single-threaded applications on a quad-core machine, where at most one core is doing work. Some devices dynamically raise the clock frequency to improve performance, but many quad-core machines are quad-core in name only, and a lot of hardware resources are wasted.

Figure 1: CPU and GPU peak performance (unit: gigaflops, billions of floating-point operations per second)

Multi-core CPU development still has a long way to go. Meanwhile, if you compare the computing power of GPUs and CPUs, as Figure 1 shows, the GPU has already left the CPU far behind. A current CPU can hardly reach a 4 GHz clock frequency or exceed 16 cores, which gives a computing power of about 64 gigaflops (64 billion floating-point operations per second). As Figure 2 shows, GPU computing power far exceeds that of the CPU, so if we turn our attention to the GPU, it may offer another way forward.
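The 64 gigaflops figure matches a simple back-of-the-envelope estimate: clock frequency times core count times floating-point operations per cycle. The sketch below assumes one floating-point operation per core per cycle. That value is an assumption chosen because it reproduces the number quoted above; real CPUs with vector units can do more per cycle. `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(clock_ghz: float, cores: int, flops_per_cycle: int = 1) -> float:
    """Rough peak throughput in gigaflops.

    clock_ghz       -- clock frequency in GHz (billions of cycles per second)
    cores           -- number of cores
    flops_per_cycle -- floating-point operations per core per cycle (assumed)
    """
    return clock_ghz * cores * flops_per_cycle


if __name__ == "__main__":
    # The CPU described in the text: about 4 GHz and 16 cores.
    print(peak_gflops(4.0, 16), "gigaflops")
```

This is only an upper bound on paper: it ignores memory bandwidth and whether the software actually keeps every core busy.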

Figure 2: Performance parameters of current Nvidia GPU cards

In 2007, Nvidia saw an opportunity to bring the GPU into the mainstream and introduced CUDA (Compute Unified Device Architecture), which added an easy-to-use programming interface to the GPU. CUDA is an extension of the C language that allows GPU programs to be written in standard C. Thanks to its well-defined and general-purpose design, CUDA has developed rapidly in recent years and has become the first programming language with a real chance of becoming the mainstream choice for GPU development.
2. In-depth understanding of the GPU
The GPU's strong computing power naturally follows from how it works internally. The hardware structure of a GPU differs fundamentally from that of a CPU. Figure 4 shows a GPU system on the other side of the PCI-E bus.

Figure 3: Structure diagram of a Core 2 series CPU

Figure 4: Module diagram of a GPU card
The hardware portion of the GPU consists of the following key modules:
1) Memory (global, constant, shared)
2) Streaming multiprocessor (SM), a cluster of stream processors
3) Stream processor (SP)
Regarding GPU concurrency: the essence of concurrency is that, for a particular problem, you do not need to consider which parallel implementation will solve it, only which operations in the solution can be executed in parallel. "Embarrassingly parallel" problems need little or no communication between threads or between thread blocks, so CUDA is an ideal platform for them. Where threads do need to communicate, CUDA supports it through explicit communication primitives that use on-chip resources. CUDA decomposes a problem into a grid of thread blocks, each containing multiple threads. Blocks may execute in any order, but at any given moment only a subset of the blocks is executing. Once a block is dispatched to one of the N streaming multiprocessors (the "stream processor clusters") in the GPU, it must run from start to finish. Figure 5 shows the GPU thread view.

Figure 5: GPU thread view
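The decomposition into blocks and threads described above can be emulated on the CPU to show why block order does not matter. This is a rough sketch in plain Python, not CUDA code: `launch`, `vector_add`, and `run_vector_add` are hypothetical names, and the "threads" here run one after another rather than in parallel. The global index formula mirrors the conventional one-dimensional CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`.

```python
import random


def launch(kernel, grid_dim, block_dim, *args, shuffle_blocks=True, seed=None):
    """Emulate a 1-D grid launch on the CPU, one thread at a time."""
    block_ids = list(range(grid_dim))
    if shuffle_blocks:
        # Blocks may be scheduled in any order; shuffle to imitate that.
        random.Random(seed).shuffle(block_ids)
    for block_idx in block_ids:
        # Once a block starts, it runs all of its threads to completion.
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(block_idx, block_dim, thread_idx, a, b, out):
    """Each emulated thread handles exactly one output element."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(out):  # the last block may have spare threads
        out[i] = a[i] + b[i]


def run_vector_add(a, b, block_dim=4, seed=None):
    """Add two equal-length lists using the emulated grid of blocks."""
    n = len(a)
    out = [None] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
    launch(vector_add, grid_dim, block_dim, a, b, out, seed=seed)
    return out


if __name__ == "__main__":
    a = list(range(10))
    b = [10] * 10
    print(run_vector_add(a, b, seed=1))
    print(run_vector_add(a, b, seed=2))  # different block order, same result
```

Because each thread writes only its own element and no thread depends on another, the result is the same whatever order the blocks run in. That independence is what makes a problem easy to parallelize.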
