Deep learning has swept the world over the past two years. Big data and high-performance computing platforms have played a critical driving role, serving as deep learning's "fuel" and "engine" respectively, and the GPU is the core of that engine: virtually all deep learning computing platforms use GPU acceleration. At the same time, deep learning has become a new strategic direction for GPU maker Nvidia, and it was the absolute protagonist of GTC 2015 in March.
So, what are the latest GPU developments for deep learning? What do these developments mean for deep learning frameworks? How should deep learning developers tap the GPU's potential? What does the future hold for GPUs and deep learning, and what are the technology trends? At the recent Nvidia Deep Learning China Strategy Conference, Ashok Pandey, Nvidia global vice president and general manager of the PSG and cloud computing business in China, led his deep learning management team in an interview with reporters, explaining Nvidia's deep learning strategy, technology, ecosystem, and market in detail.
Nvidia believes that data, models, and GPUs are now driving deep learning. Users can choose among different computing platforms, but developers need an easy-to-deploy platform and a healthy ecosystem, including hardware-optimized open source tools. A strong deep learning computing ecosystem is the GPU's existing advantage, and building one has long been Nvidia's goal.
Ashok Pandey, Nvidia global vice president and general manager of the PSG and cloud computing business in China

Why are GPUs a good fit for deep learning?
With the growth of data volume and computing power, the large-scale neural networks that Hinton and LeCun worked on for many years have seen great improvements in performance and learning accuracy. Deep learning is now widely used in text processing, speech, and image recognition, adopted not only by giants such as Google, Facebook, Baidu, and Microsoft, but also as the core competency of startups such as Yuantiku.
So why the GPU? Most importantly, the GPU's outstanding floating-point performance specifically improves the two key operations of deep learning, classification and convolution, while still achieving the required accuracy. Nvidia says deep learning requires a high degree of inherent parallelism, large amounts of floating-point compute, and heavy matrix operations, all of which the GPU provides; at the same precision, it offers faster processing, fewer servers, and lower power consumption than traditional CPUs.
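The "matrix operations" point is the crux: frameworks such as Caffe lower a convolution layer to one large matrix multiplication (the im2col trick), which is exactly the workload GPUs parallelize best. Below is a minimal NumPy sketch of the idea; the function names are illustrative, not from any particular library.

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k patch of a 2-D input into a column (stride 1, no padding)."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_gemm(x, kernel):
    """2-D convolution (cross-correlation) expressed as a single matrix multiply."""
    k = kernel.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    cols = im2col(x, k)
    # One dense GEMM does all the patch dot-products at once -- the GPU-friendly part.
    return (kernel.ravel() @ cols).reshape(out_h, out_w)
```

On a GPU the single GEMM call maps onto thousands of cores at once, which is why the classification (fully connected) and convolution layers both benefit.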
Performance comparison of CNN training with GPU acceleration versus CPU only
Take the ImageNet competition as an example: with GPU-accelerated deep learning algorithms, the computer vision systems of Baidu, Microsoft, and Google reached error rates of 5.98% (January 2015), 4.94% (February 2015), and 4.8% (February 2015) respectively in the ImageNet image classification and recognition test, approaching or exceeding human-level recognition. Although this record-chasing is suspected of over-optimizing for a known dataset, the optimized results still have reference value for the industry.
"Artificial intelligence has been transformed from a model-based approach to a data-driven, statistics-based approach, which relies heavily on the GPU's high-speed, highly parallel architecture. It turns out that GPUs are well suited to deep learning," said Prof. Depei Qian of Beihang University, overall team leader of the national 863 Program major project on high-efficiency computers and application service environments.

Four new solutions
At GTC, Nvidia reviewed four new products and solutions that help drive deep learning:

1. GeForce GTX TITAN X, a GPU developed for training deep neural networks.
Built on the NVIDIA Maxwell GPU architecture, the TITAN X combines 3,072 processing cores with single-precision peak performance of 7 teraFLOPS, plus 12GB of onboard memory and 336.5 GB/s of bandwidth, enough to handle the millions of training examples used to train deep neural networks.
Nvidia reports that TITAN X took less than three days to train the industry-standard AlexNet model on the 1.2-million-image ImageNet dataset, while a 16-core CPU took more than 40 days.

2. DIGITS DevBox, a desk-side deep learning appliance for researchers.
The DIGITS DevBox uses four TITAN X GPUs, with every component from memory to I/O optimized, and comes pre-installed with the software needed to develop deep neural networks: the DIGITS software package, the three popular deep learning frameworks Caffe, Theano, and Torch, and NVIDIA's GPU-accelerated deep learning library cuDNN 2.0. Like the other giants, Nvidia spares no effort in supporting open source.
Nvidia says that in key deep learning tests, the DIGITS DevBox delivers 4 times the performance of a single TITAN X. Training AlexNet on the DIGITS DevBox takes 13 hours, versus two days on one of the best single-GPU PCs and more than a month on a CPU-only system.

3. The next-generation GPU architecture, Pascal, will accelerate deep learning applications tenfold over Maxwell.
Pascal introduces three designs that significantly speed up training: up to 32GB of memory (2.7 times that of the GeForce GTX TITAN X) with mixed-precision computing, which performs 16-bit floating-point calculations at twice the rate of 32-bit; 3D stacked memory, which lets developers build larger neural networks and boosts the speed of deep learning applications by up to 5 times; and NVIDIA's high-speed interconnect, NVLink, which connects two or more GPUs and can increase deep learning speed by up to 10 times.
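What "half precision" buys and costs can be seen without any GPU at all. This NumPy illustration is purely schematic (real mixed-precision training typically keeps an FP32 master copy of the weights); it shows the two sides of the FP16 trade: half the storage, but coarser rounding.

```python
import numpy as np

# FP16 storage is half the size of FP32: 2 bytes per value instead of 4,
# so the same weight matrix fits in half the memory (and half the bandwidth).
w32 = np.random.randn(1024, 1024).astype(np.float32)
w16 = w32.astype(np.float16)
print(w32.nbytes, w16.nbytes)  # 4194304 2097152

# The cost: FP16 has a 10-bit mantissa, so the gap between representable
# values near 1.0 is 2**-10 (about 0.001), versus 2**-23 (about 1.2e-7) for FP32.
print(np.finfo(np.float16).eps)
print(np.finfo(np.float32).eps)
```

This is why the article's "semi-precision, or even 1/4 precision" trend hinges on whether training remains stable at those coarser step sizes.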
Nvidia noted that the deep learning field currently uses single precision almost universally; in the future some may use half precision, or even quarter precision, so Nvidia's GPU architecture needs to adapt to users' needs. Pascal supports both FP16 and FP32, which can improve machine learning performance.

4. DRIVE PX, a deep learning platform for autonomous vehicles.
Based on the NVIDIA Tegra X1 and combined with the latest PX platform, cars can make a qualitative leap in instrument display and autonomous driving.

The notable NVLink and DIGITS
When it comes to the tenfold performance of the next-generation Pascal architecture, NVLink deserves mention. It speeds data transfer between GPU and GPU, and between GPU and CPU, 5 to 12 times faster than the existing PCI-Express standard, a great boon for applications such as deep learning that demand higher inter-GPU transfer speeds. Developers should be pleased that NVLink uses point-to-point transmission, with the same programming model as PCI-Express.
Nvidia says NVLink can double the number of GPUs in a system working together on deep learning tasks, and can also connect CPUs and GPUs in new ways, offering more flexibility and power efficiency in server design than PCI-E.
In fact, whether you want data parallelism or model parallelism, NVLink gives deep learning developers more room for imagination. Domestic speech recognition leader iFlytek built a ring-shaped parallel learning architecture on multiple GPGPUs and InfiniBand for training DNN, RNN, and CNN models, with good results, though its use of InfiniBand left other practitioners envying such deep pockets. With NVLink, there is clearly another good option.
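The data-parallel scheme behind architectures like iFlytek's is conceptually simple: each GPU computes gradients on its own shard of the batch, and the gradients are averaged across devices, which is exactly the communication step NVLink or InfiniBand accelerates. A toy NumPy simulation of the idea (a linear model stands in for the network; the device split is simulated with array shards):

```python
import numpy as np

# Toy model y = X @ w with squared-error loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = rng.normal(size=64)
w = np.zeros(8)

def local_grad(Xs, ys, w):
    """Gradient of mean squared error on one data shard."""
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

# Split the batch across 4 simulated GPUs; each computes its local gradient.
shards = np.array_split(np.arange(64), 4)
grads = [local_grad(X[idx], y[idx], w) for idx in shards]

# The all-reduce step: average the per-device gradients before updating w.
# This exchange is the bandwidth-bound part that fast interconnects speed up.
avg = np.mean(grads, axis=0)
```

With equal shard sizes, the averaged gradient equals the full-batch gradient, so faster interconnects change only the wall-clock time, not the result.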
Of course, adopting NVLink also means new investment, while Nvidia's existing product line already supports deep learning, so users can choose as appropriate. For a deeper look at hardware selection, see the full deep learning hardware guide written by Kaggle competitor Tim Dettmers.
The other is DIGITS, an all-in-one graphical system for designing, training, and validating deep neural networks for image classification. DIGITS guides users through installing, configuring, and training deep neural networks, with a user interface and workflow management for loading training datasets locally or over the network, plus real-time monitoring and visualization. It currently supports the GPU-accelerated version of Caffe; see the Parallel Forall blog post "DIGITS: Deep Learning Training System".
Nvidia chose Caffe as the first framework DIGITS supports because its customer surveys showed that the framework is currently the most popular (including domestic BAT and some overseas users), and cuDNN was first integrated into the open source Caffe. Nvidia is committed to supporting mainstream open source tools, chiefly the aforementioned Theano and Torch, even if it cannot cover them all. Nvidia's global DIGITS and cuDNN teams have invested more than 30 people in open source work, and these developers maintain close communication with deep learning developers in the community.

The Chinese ecosystem
In Nvidia's view, domestic deep learning research is basically on par with foreign institutions. On the university side, the Chinese University of Hong Kong and the CAS Institute of Automation have achieved good ImageNet placements; in industry, BAT, LeTV, and iFlytek all have many young engineers and good research results in deep learning. Nvidia hopes to strengthen China's ecosystem and promote deep learning applications, mainly through investment in the open source community, university research cooperation, and partnerships with server makers and enterprise users.
In January 2015, Nvidia and iQiyi signed a framework agreement for deep cooperation. The two sides will work closely in video deep learning and media cloud computing, using state-of-the-art GPUs and deep learning architectures to build iQiyi's video creation, sharing, and service platforms. Nvidia said it will continue to establish joint labs with key customers in the future.
Deep learning enterprises using GPU acceleration

GPU or a dedicated chip?
Although deep learning and artificial intelligence are hot in the press, from a bionics or statistics perspective the industrial application of deep learning is still taking its first steps, and its theoretical foundation has yet to be established and perfected. To some practitioners, relying on accumulated compute power and datasets to get results seems too brute-force: for machines to better understand human intent, more data and stronger computing platforms are needed, usually with supervised learning, and at this stage there is certainly no shortage of data to worry about. Will the future no longer depend on data, no longer depend on data labeling (unsupervised learning), and no longer need to trade compute power for performance and precision?
Taking a step back, even if compute power remains the necessary engine, must it be GPU-based? We know that CPUs and FPGAs have already demonstrated their ability to handle deep learning workloads, while the IBM-led SyNAPSE giant neural network chip (a brain-inspired chip) provides 1 million "neurons", 256 million "synapses", and 4,096 neurosynaptic cores on about 70mW of power, even allowing neural network and machine learning workloads to move beyond the von Neumann architecture; in both energy and performance, it could become a potential challenger to the GPU. For example, to build its "super brain", iFlytek is considering, in addition to GPUs, deeply customized artificial neural network chips to create an even larger supercomputing cluster.
Today, however, Nvidia is not worried that GPUs will fall out of favor in deep learning. First, Nvidia believes the GPU, as an underlying platform, plays an accelerating role, helping deep learning developers train larger models faster, regardless of how the deep learning model is implemented. Second, Nvidia says users can choose different platforms according to their needs, but deep learning developers, who must excel at algorithms and statistics, need a supporting ecosystem: the GPU already has the CUDA, cuDNN, and DIGITS tools, supports a variety of mainstream open source frameworks, provides friendly interfaces and visualization, and is backed by partners. Inspur, for example, has developed a multi-GPU-enabled Caffe, and Sugon has developed a PCI-bus-based multi-GPU technology that is friendlier to developers accustomed to serial programming. By contrast, FPGA programmable chips and dedicated artificial neural network chips place higher demands on server integration, programming environments, and programming skill, lack general-purpose potential, and are not suited for widespread adoption.
Article Source: http://www.csdn.net/article/2015-05-06/2824630
The fight over deep learning's "engine": GPU acceleration or a dedicated neural network chip?