Understanding GPU Computing in One Article


Many people have heard of Alex Krizhevsky, who designed AlexNet during his Ph.D. AlexNet is often described as the first truly deep neural network: it has eight learned layers and 60 million parameters. His advisor, Geoffrey Hinton (often called the "father of neural networks"), did not at first support this research as a doctoral thesis, because computation at that time ran on CPUs and training such a model took several months. After each run the parameters had to be adjusted by hand and the model retrained. Getting a reliable model takes dozens of such runs, or a dozen or more with luck, so the whole effort could have taken years or even decades. But Alex, a typical geek, did not give up. In addition to studying mathematics, he had also learned a great deal about programming, including CUDA.

CUDA is a parallel computing platform and programming model created by NVIDIA. It uses the graphics processing unit (GPU) to achieve significant performance improvements. NVIDIA launched CUDA in 2006. Since then, its stock has climbed from about $7 to more than $260.

Alex reimplemented his model in CUDA, bought two powerful graphics cards (the GTX 580), and spent six days training AlexNet, tuning and refining it continually. He then entered the 2012 ImageNet competition led by Fei-Fei Li and won that year's championship: AlexNet's image recognition accuracy was far ahead of the second-place entry. After the contest, Alex and his advisor Hinton founded a company, which Google acquired a few months later, for $400 million by this account. It is a story of getting rich through the GPU, and it shows that the first combination of GPUs and deep neural networks created, by the same account, $400 million in value.

After that, we saw a Cambrian explosion of neural network models. Before 2012, people had been studying these algorithms, but there was not enough computing power to support them. The emergence of a new way of computing, GPU Computing, made it practical to train this kind of neural network model, which fueled explosive growth in all kinds of models and brought us into the era of artificial intelligence.

Today you can implement your own algorithms on open source deep learning frameworks such as Caffe, TensorFlow and Theano, or program directly in CUDA. At the leading companies in artificial intelligence research, the models now being released have reached a considerable level of complexity: a single model can reach 1 TB or even several TB in size and contain billions or even tens of billions of parameters, and the volume of data is even harder to imagine. Such models are harder to train. The three elements, algorithms, computing power and data, are thus entangled and drive one another forward.

Everyone knows the famous Moore's Law: at constant price, the number of components that fit on an integrated circuit doubles roughly every 18 to 24 months, and performance doubles with it. In other words, the computing performance that one dollar buys more than doubles every 18 to 24 months. This law describes the pace of progress in information technology. However, according to an OpenAI analysis published earlier this year, in the roughly five years from the appearance of AlexNet to the end of last year, the computing power demanded by training artificial intelligence models grew 300,000-fold.

In the first 25 years of Moore's Law, performance improved 10-fold every 5 years and 100,000-fold over 25 years. That is the growth in computing power that Moore's Law brought us in the CPU era, but it is not enough for what artificial intelligence models demand. To meet this demand, we keep refining our technology at the GPU level to improve performance in every respect. On this basis, more and more people are training their own models on CUDA, and Google, Facebook and others have built their open source deep learning platforms on CUDA as well.
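The figures above can be checked with a few lines of arithmetic. The sketch below assumes the 18-month end of the Moore's Law range and takes the 300,000-fold, five-year figure at face value; the function names are made up for this illustration.

```python
import math


def moore_growth(years, doubling_months=18.0):
    """Growth factor after `years` if performance doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def implied_doubling_months(growth, years):
    """Doubling period in months that would produce `growth` over `years`."""
    return years * 12.0 / math.log2(growth)


five_year = moore_growth(5)            # roughly 10-fold
twenty_five_year = moore_growth(25)    # roughly 100,000-fold
ai_doubling = implied_doubling_months(300_000, 5)  # a little over 3 months

print(f"Moore's Law, 5 years:  {five_year:,.1f}x")
print(f"Moore's Law, 25 years: {twenty_five_year:,.0f}x")
print(f"AI training demand doubles every {ai_doubling:.1f} months")
```

Under these assumptions, demand for training compute doubles about five times faster than an 18-month Moore's Law cycle delivers, which is the gap the text describes.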

NVIDIA launched the HGX-2 platform and the HGX-2-based DGX-2 server at the GPU Technology Conference in March 2018. The DGX-2 is a high-density, high-performance system with excellent thermal performance. At the heart of its architecture is the NVSwitch memory fabric. In essence, NVSwitch gives the GPUs a single 512 GB shared memory space, and the system delivers nearly 2 petaflops on its Tensor Cores at 10 kW of power.
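As a rough sanity check on these numbers, the sketch below derives performance per watt and memory per GPU. The throughput, power and memory figures come from the text; the GPU count of 16 is an assumption added here (this article does not state it), and the variable names are illustrative only.

```python
# Figures quoted in the text.
PEAK_TENSOR_FLOPS = 2e15   # "nearly 2 petaflops" on Tensor Cores
POWER_WATTS = 10_000       # 10 kW
SHARED_MEMORY_GB = 512     # shared memory space across the GPUs

# Assumption, not stated in this article: the DGX-2 holds 16 GPUs.
ASSUMED_GPU_COUNT = 16

gflops_per_watt = PEAK_TENSOR_FLOPS / POWER_WATTS / 1e9
memory_per_gpu_gb = SHARED_MEMORY_GB / ASSUMED_GPU_COUNT

print(f"{gflops_per_watt:.0f} GFLOPS per watt")
print(f"{memory_per_gpu_gb:.0f} GB of memory per GPU")
```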

GPU Computing is not just a piece of hardware. What most people care about is how to apply this computing power to artificial intelligence algorithms and real application scenarios. When NVIDIA is mentioned, people may think of a chip company, but in fact the company has about 12,000 employees worldwide, 11,000 of whom are engineers, and 7,000 of those engineers are software engineers. Together they build and improve the artificial intelligence ecosystem based on GPU Computing.

At present, the application scenarios of artificial intelligence are concentrated in the consumer Internet, represented in China by BATJ and TMD, and in the United States mainly by the FANG companies plus Apple and Microsoft. These companies are the pioneers of artificial intelligence. They have invested heavily in the field, accumulated a great deal of computing power and recruited the industry's best-known PhDs. Each of their services has an enormous number of daily active users (DAU), so they collect a huge amount of data. At the 2018 Baidu Create developer conference, Robin Li (Li Yanhong) introduced the concept of an "Intelligent Chasm". The idea is that the computing power and data of all the other companies in the world combined may only match what these leading companies have accumulated, or may even fall short. This gap in computing power and data is like a chasm.

So how do we make these seemingly lofty artificial intelligence algorithms, the expensive computing power and the hard-to-obtain data more accessible? That is what we have been working on and what we will continue to work on.

Take TensorRT as an example. NVIDIA TensorRT is a high-performance neural network inference engine for deploying deep learning applications, such as image classification, segmentation and object detection, in production environments with maximum inference throughput and efficiency. TensorRT is the first programmable inference accelerator, and it can accelerate both existing and future network architectures. With the speedup TensorRT provides, service providers can deploy these compute-intensive artificial intelligence workloads at an affordable cost.

AI industry case studies

Beyond the Internet, artificial intelligence is also widely applied in other industries, such as autonomous driving, healthcare and telecommunications.

1. Recommendation engine

In the past, people looked for information; now information looks for people. Many of you have used short-video apps such as Kuaishou or Douyin. Behind these short videos are neural network algorithms. While you are using a recommendation engine, dozens of models may be evaluating you. Five years ago, such a system might only have tried to sense your needs. Now it evaluates you along many dimensions and balances several goals: it is not enough to attract a click, you also have to stay long enough, and the algorithms that attract clicks are very different from the ones that keep people watching.
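The trade-off between attracting clicks and keeping people watching can be illustrated with a toy ranking function. This is only a sketch: the two model outputs, the weighting scheme and every name below are hypothetical and do not describe any real product's recommendation system.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    p_click: float           # hypothetical output of a click-prediction model
    expected_watch_s: float  # hypothetical output of a watch-time model


def blended_score(c, watch_weight, max_watch_s=60.0):
    """Mix click probability with normalized watch time (both in [0, 1])."""
    watch = min(c.expected_watch_s, max_watch_s) / max_watch_s
    return (1.0 - watch_weight) * c.p_click + watch_weight * watch


def rank(candidates, watch_weight):
    """Return video ids ordered from highest to lowest blended score."""
    ordered = sorted(candidates,
                     key=lambda c: blended_score(c, watch_weight),
                     reverse=True)
    return [c.video_id for c in ordered]


candidates = [
    Candidate("clickbait", p_click=0.9, expected_watch_s=5.0),
    Candidate("solid", p_click=0.5, expected_watch_s=50.0),
]

print(rank(candidates, watch_weight=0.0))  # clicks only
print(rank(candidates, watch_weight=0.5))  # clicks and watch time
```

With clicks alone, the video that draws the click ranks first; once watch time carries weight, the video that holds attention overtakes it.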

Almost all the big Internet companies in China train their own recommendation models so that every user sees a personalized product. Recommendation matters a great deal to these companies because monetization on the Internet is almost always tied to it: e-commerce, needless to say; video, such as Kuaishou and Douyin in China and Netflix and Hulu abroad; news, such as Google News and Toutiao; as well as music, social networking and more. Users' activity in turn gives the company new data for training more effective models. On the one hand this improves the user experience; on the other hand it may leave users unable to put these products down.

2. Medical

A large proportion of the members of NVIDIA's startup acceleration program are artificial intelligence plus healthcare projects. A major challenge in medical projects is diagnosis. It is still difficult to make diagnoses through deep learning, but the market is very large. According to some related reports, for certain chronic diseases, diagnosis assisted by deep learning algorithms can improve accuracy by 30% to 40% while cutting the cost in half.

Take retinal scanning as an example. It is often said that the eyes are the windows to the soul; in fact, they are also windows to the body. The retina is rich in capillaries, and a retinal scan can reveal problems elsewhere in the body, such as complications of diabetes, one of which is retinopathy, as well as cardiovascular disease.

Few doctors are able to diagnose from a retinal scan, and in China some doctors cannot do so at all. Deep learning techniques can capture the experience of the doctors who can and use it to aid diagnosis. At present this technology is still difficult to deploy in hospitals, but some insurance companies are very willing to use it to estimate the probability that a customer will fall ill, which helps them set the policy amount.

3. Self-driving

To develop autonomous driving, NVIDIA runs its own server farm. The farm has 1,000 DGX-1 systems with a combined one exaflop (1 E = 1024 P = 1024 × 1024 T) of floating-point computing capability, used for training autonomous driving models. A car driving on the road for one day generates data on the terabyte scale, and over a year that may reach the petabyte scale. Even so, collecting data simply by driving on the road is not enough. By some estimates, an autonomous vehicle must drive at least 100,000 miles to barely meet the standard for going on the road. For now, the autonomy rate of self-driving vehicles is not high: Google's self-driving vehicles manage a few thousand miles or so and still need a hand on the steering wheel, and the others are much the same.
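The unit arithmetic in this paragraph can be worked through in a short sketch. It follows the text's own convention of 1 E = 1024 P and 1 P = 1024 T; the daily data volumes are illustrative inputs rather than figures from the talk, and the function names are made up for this example.

```python
P_PER_E = 1024  # the text's convention: 1 E = 1024 P
T_PER_P = 1024  # and 1 P = 1024 T


def per_node_pflops(total_eflops, nodes):
    """Average petaflops per node when `total_eflops` is spread over `nodes`."""
    return total_eflops * P_PER_E / nodes


def yearly_petabytes(daily_terabytes, days=365):
    """Petabytes collected in `days` by one car producing `daily_terabytes` a day."""
    return daily_terabytes * days / T_PER_P


node_pflops = per_node_pflops(1, 1000)  # one exaflop across 1,000 DGX-1 systems
one_tb_a_day = yearly_petabytes(1.0)    # illustrative: 1 TB per day
three_tb_a_day = yearly_petabytes(3.0)  # illustrative: 3 TB per day

print(f"{node_pflops:.3f} PFLOPS per DGX-1")
print(f"1 TB/day -> {one_tb_a_day:.2f} PB/year")
print(f"3 TB/day -> {three_tb_a_day:.2f} PB/year")
```

On these assumptions each DGX-1 contributes about one petaflop, and a single car needs to produce roughly 3 TB a day to pass one petabyte in a year.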

Our current practice is to take the model from the real car to the server farm and let it train in a high-fidelity simulation environment on the servers. The training process generates new data, which is then used to train a new model. In this way we try to accelerate the training of self-driving models.

After sharing these AI application scenarios, the presenter, Zhao Liwei, also introduced NVIDIA's new Quadro RTX, which helps the game and film industries achieve real-time ray tracing and rendering. He closed with NVIDIA's new office buildings in Silicon Valley, "Endeavor" and "Voyager", which express NVIDIA's ongoing effort in artificial intelligence and its hope that artificial intelligence technology will lead humanity into the unknown.
