The approximate implementation principle of the GPU

Source: Internet
Author: User



The meaning of matrix-matrix multiplication

Matrix-matrix multiplication can be derived from matrix-vector multiplication: a matrix is multiplied by several vectors, and those vectors are then stacked side by side into a matrix (subject to dimension constraints). Put another way, a matrix is itself an ordered collection of vectors, so a matrix product can be viewed as a combination of matrix-times-column-vector products (or row-vector-times-matrix products). For example, a matrix A can be multiplied by two column vectors C and D separately, giving the products AC and AD.

If the column vectors C and D are combined into a matrix B = (C, D), then the two products above can be written as a single product of two matrices:

AB = A(C, D) = (AC, AD)

Of course, if more column vectors are combined, the same rule gives a wider matrix product: each column of the result is the matrix multiplied by the corresponding column of the other factor.

Similarly, starting from the definition of a vector multiplied by a matrix, we can define the product in the opposite order, where the vectors are row vectors: each row of the left matrix multiplies the right matrix, and the results are stacked as the rows of the product.

Therefore, the geometric meaning of matrix-matrix multiplication follows from the geometric meaning of a matrix multiplied by several vectors; the only difference is that those vectors are combined, in order, into another matrix.
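
As a minimal worked sketch (the particular matrix and vectors below are chosen only for illustration), multiplying A by the combined matrix B = (C, D) yields exactly the two matrix-vector products as its columns:

\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad
C = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad
D = \begin{pmatrix} 3 \\ 0 \end{pmatrix}, \qquad
AB = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
     \begin{pmatrix} 1 & 3 \\ 2 & 0 \end{pmatrix}
   = \begin{pmatrix} 3 & 3 \\ 2 & 0 \end{pmatrix}
   = (AC,\; AD).
\]

Computing AC = (3, 2)^T and AD = (3, 0)^T separately gives exactly the first and second columns of AB.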

For a matrix product such as AB = C, the general geometric meaning is this: one matrix, B, is treated as a geometric figure whose row vectors or column vectors are the points of the figure; the other matrix, A, rotates, scales, mirrors, or otherwise transforms those vectors into a set of new vectors; the new vectors, taken as rows or columns, form the new matrix C, and C describes a new geometric figure.


We give a specific example with concrete data and draw the graph of the transformation:




A small green maple leaf, after a summer's baptism of wind and rain under this transformation, finally becomes a red maple leaf.
  In fact, the multiplication of two matrices is mainly about how one matrix transforms the other. The matrix that performs the transformation can be regarded as an action matrix, and the matrix being acted on can be regarded as a geometric figure made of row or column vectors. This understanding is the geometric explanation given above.
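
As a small worked sketch of this action-matrix reading (the rotation angle and the points below are chosen only for illustration), let A be a 90-degree rotation and let the columns of B be two points of a figure:

\[
A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad
C = AB = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}.
\]

The point (1, 0) is carried to (0, 1) and the point (0, 2) to (-2, 0): every column of B is rotated by the same 90 degrees, so the figure formed by the columns is rotated as a whole.
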
———————————————————— I'm a split line ——————————————————————————————————

Take a cube. Displaying a cube on screen takes quite a few steps, so let's simplify: imagine it as a wireframe. Simplify once more, drop the edges, and it is just eight points (a cube has eight vertices). The question then reduces to: how do we make these eight points rotate? First, when the cube is created, there must be eight vertex coordinates, each represented by a vector of at least three dimensions. The transformation "rotation" is, in linear algebra, represented by a matrix; a vertex is rotated by multiplying its vector by this matrix. After the matrix multiplication above, this should not be hard to understand. Rotating the eight points is therefore eight vector-matrix multiplications. The calculation itself is not complicated, just a handful of multiplications summed up, but the volume of computation is large: eight points take eight such calculations, 2000 points take 2000. This is part of the GPU's work, vertex transformation, and it is also the simplest part; there is far more troublesome work beyond it.
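
A minimal CUDA sketch of this vertex transformation (the kernel name transformVertices, the row-major 3x3 matrix M, and the float3 vertex layout are assumptions made here for illustration, not any particular engine's API): one thread transforms one vertex, so eight vertices or 2000 vertices use the same kernel, only with more threads.

    #include <cuda_runtime.h>

    // Rotate every vertex by the same 3x3 matrix: out[i] = M * in[i].
    // One thread handles one vertex; M is 9 floats in row-major order.
    __global__ void transformVertices(const float3 *in, float3 *out,
                                      const float *M, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;                      // guard for the last partial block
        float3 v = in[i];
        out[i].x = M[0] * v.x + M[1] * v.y + M[2] * v.z;
        out[i].y = M[3] * v.x + M[4] * v.y + M[5] * v.z;
        out[i].z = M[6] * v.x + M[7] * v.y + M[8] * v.z;
    }

    // Launch with enough 256-thread blocks to cover all n vertices:
    // transformVertices<<<(n + 255) / 256, 256>>>(d_in, d_out, d_M, n);

Whether n is 8 or 2000, each thread's work is independent of every other thread's, which is exactly the kind of job described next.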

Most of the GPU's work is like this: computationally heavy, with little technical content, and repeated many times. It is as if you had a job that requires hundreds of millions of additions and subtractions within 100; the best approach is to hire a few dozen primary-school pupils and give each a share of the work. These calculations have no technical content; it is pure manual labor. The CPU, by contrast, is like an old professor: he can do integrals and derivatives, but he commands a high salary, and one old professor costs as much as twenty pupils. If you were Foxconn, which would you hire? The GPU works this way: a great many simple computational units completing a huge amount of computational work, pure human-wave tactics. This strategy rests on the premise that the work of pupil A and the work of pupil B do not depend on each other; the tasks are mutually independent. Many computation-heavy problems really are like this, such as password cracking, mining, and many graphics calculations: they can be decomposed into many identical, simple small tasks, and each task can be handed to any one pupil. But some tasks involve a "flow" of dependent steps. For example, on a blind date, only if both sides find each other agreeable does the relationship continue; you cannot have the certificate issued before the two of you have even met. This kind of more complicated problem is handled by the CPU.
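
A small hedged sketch of that distinction (both functions and their names are made up for illustration): element-wise work where every item stands alone parallelizes trivially on the GPU, while a recurrence in which every step needs the previous result is the kind of "flow" the CPU handles.

    #include <cuda_runtime.h>

    // GPU-friendly: every element is independent, like handing one
    // small subtraction to each pupil.
    __global__ void independentWork(const float *a, const float *b,
                                    float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] - b[i];
    }

    // CPU-friendly: each step depends on the one before it (a "flow"),
    // so the loop cannot simply be handed out to thousands of threads.
    void dependentWork(float *x, int n)
    {
        for (int i = 1; i < n; ++i)
            x[i] = x[i] + 0.5f * x[i - 1];   // needs the updated x[i-1]
    }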


Picture from the NVIDIA CUDA documentation. Green marks the computation units, orange-red the storage units, and orange-yellow the control units.

The GPU employs a huge number of computational units and an ultra-long pipeline, but it has only very simple control logic and dispenses with the cache. The CPU, in contrast, not only has a large share of its area taken up by cache, but also has complex control logic and many optimization circuits; by comparison, the computational units make up only a small part of the CPU.

So unlike the CPU, which excels at logic control and general-purpose data operations, the GPU excels at massively parallel computation, which is exactly what tasks like password cracking need. That is why the GPU, beyond image processing, is taking part in more and more general computation.

—————————————————————————————— I'm a split line ————————————————————————————————————

The GPU spends its transistors on processor arrays, multi-thread management hardware, shared memory, and memory controllers;

These designs are not meant to increase the execution speed of a single thread, but to let the GPU execute tens of thousands of threads at the same time (whereas, leaving time-slice scheduling aside, a CPU with n cores can execute only n threads at the same moment; whether those are sub-threads of the same process or of different processes depends on OS scheduling, and n CPUs can execute only n processes at a moment ... PS: there will be a chance to pull on this in detail in a future chapter on operating systems);

The GPU implements inter-thread communication and provides extremely high memory bandwidth;

The GPU uses the cache to amplify memory bandwidth;

The GPU hides latency by running thousands of threads at the same time: a thread that is waiting on a memory access is switched out, and switching threads on the GPU costs essentially no time (see the sketch after this list);

For CUDA-enabled GPUs, each stream processor can handle 1024 threads at a time;

The cost of switching threads on the GPU is 0; in fact, the GPU usually switches threads every clock cycle;

The GPU uses SIMT (single instruction, multiple threads); the benefit of SIMT is that it does not require the developer to pack data into vectors of a suitable length, and it allows each thread to take a different branch;

A CUDA-enabled GPU integrates 8 memory controllers, and the GPU's memory bandwidth is typically 10 times that of the CPU.

......
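
Here is the sketch promised above of how "many threads hide latency" looks in practice (a hedged illustration; the kernel scaleArray and the launch numbers are arbitrary): the launch deliberately creates far more threads than there are physical cores, so whenever one group of threads stalls on a memory access, the scheduler issues another on the next cycle.

    #include <cuda_runtime.h>

    // Grid-stride loop: the grid can be launched with tens of thousands of
    // threads regardless of n; warps stalled on memory are scheduled around.
    __global__ void scaleArray(float *data, float factor, int n)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += gridDim.x * blockDim.x)
        {
            data[i] *= factor;               // each element is independent work
        }
    }

    int main()
    {
        const int n = 1 << 20;               // about one million elements
        float *d_data = nullptr;
        cudaMalloc((void **)&d_data, n * sizeof(float));
        cudaMemset(d_data, 0, n * sizeof(float));

        // 4096 blocks x 256 threads: far more threads than physical cores,
        // which is exactly what hides the memory-access latency.
        scaleArray<<<4096, 256>>>(d_data, 2.0f, n);
        cudaDeviceSynchronize();

        cudaFree(d_data);
        return 0;
    }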


This article is my summary based on the blogs of some enthusiastic netizens online and some related material of my own. If there is any infringement, please tell me immediately; if anything is said wrong, please also point it out. Thank you!

