Growing Pains for Deep Learning

"Editor 's note" Although deep learning has a great effect on the current development of AI, deep learning workers are not smooth sailing. Chris Edwards, published in the Communications of the ACM article, has identified some of the challenges and current solutions that deep learning faces in different scenarios, through the experience of different deep learning researchers. CSDN translation of this article, hope to the domestic deep learning practitioners have reference significance.

Advances in theory and computer hardware have made neural networks a core part of online services such as Microsoft's Bing, which uses them to drive its image search and speech recognition systems. These companies expect that, as they scale their networks up to handle more complex problems, neural networks will be able to drive still more advanced services.

It took a long time for neural networks, first proposed more than 50 years ago, to become an accepted part of information technology applications. After a flurry of interest in the 1990s, supported in part by the development of highly specialized integrated circuits designed to overcome the poor performance of conventional computers, neural networks were overtaken by other algorithms, such as support vector machines in image processing and Gaussian models in speech recognition.

Early, simple neural networks used at most three layers: an input layer, a middle "hidden" layer, and an output layer. The neurons are highly interconnected across layers: each neuron feeds its output to every neuron in the next layer. The network is trained by iteratively adjusting the weights each neuron applies to its inputs, so as to minimize the error between the output of the whole network and the desired result.
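
To make the mechanics concrete, here is a minimal sketch of such a three-layer, fully connected network, trained by backpropagation on the toy XOR problem (the sizes, learning rate, and data here are illustrative choices, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn XOR with a 2-4-1 fully connected network
# (input layer, one hidden layer, output layer).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for step in range(20000):
    h = sigmoid(X @ W1 + b1)       # every input neuron feeds every hidden neuron
    y = sigmoid(h @ W2 + b2)
    g_out = (y - t) * y * (1 - y)  # gradient of the squared error at the output
    g_hid = (g_out @ W2.T) * h * (1 - h)
    # Iteratively adjust the weights to shrink the output error.
    W2 -= lr * h.T @ g_out;  b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * X.T @ g_hid;  b1 -= lr * g_hid.sum(axis=0)

print(y.round(2).ravel())          # should approach [0, 1, 1, 0]
```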

Although neuroscience suggests that the human brain has a deeper structure involving multiple hidden layers, early experimental results from such systems were worse than those of shallow networks. Work by Geoffrey Hinton and Ruslan Salakhutdinov at the University of Toronto in 2006 led to significant improvements in research on deep architectures. The training techniques they developed work effectively in networks containing multiple hidden layers. One of these techniques is "pre-training", which adjusts the output of each layer independently before moving on to optimizing the output of the network as a whole. This approach allows the upper layers to extract high-level features that can effectively classify the data presented by the hidden layers below.
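
As an illustration of the layer-by-layer idea, the sketch below pre-trains each layer as a simple autoencoder on the output of the layer beneath it, before any end-to-end fine-tuning would take place. This is only a loose stand-in: Hinton and Salakhutdinov's original pre-training used stacked restricted Boltzmann machines, and every size and name below is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def pretrain_layer(data, n_hidden, steps=2000, lr=0.1, seed=0):
    """Train one layer as an autoencoder on `data`; return its encoder weights."""
    rng = np.random.default_rng(seed)
    n_in = data.shape[1]
    W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
    W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
    for _ in range(steps):
        h = sigmoid(data @ W_enc)             # this layer's features
        r = sigmoid(h @ W_dec)                # reconstruction of the input
        g_r = (r - data) * r * (1 - r)        # reconstruction-error gradient
        g_h = (g_r @ W_dec.T) * h * (1 - h)
        W_dec -= lr * h.T @ g_r
        W_enc -= lr * data.T @ g_h
    return W_enc

# Stack the layers: each is optimized independently before the whole
# network would be fine-tuned end to end.
X = np.random.default_rng(1).random((256, 64))   # stand-in dataset
weights, layer_input = [], X
for width in (32, 16):
    W = pretrain_layer(layer_input, width)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)       # features feed the next layer
```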

Even with improved training, scale remains a problem for deep learning. Fully interconnected neurons, especially in the higher layers, demand enormous computational power. The first layer of an image-processing application may need to analyze a million pixels, and the number of connections across the many layers of a deep network is orders of magnitude larger. "Each image has billions or even hundreds of billions of connections" to process, says Dan Cireşan, a researcher at the Dalle Molle Institute for Artificial Intelligence Research (IDSIA) in Manno, Switzerland; training such a large network requires quadrillions of floating-point operations.
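
A back-of-the-envelope calculation (with hypothetical layer sizes, not figures from the article) shows how quickly fully connected layers over a million-pixel input reach billions of connections, and how a training run reaches quadrillions of floating-point operations:

```python
# Hypothetical network: a 1,000,000-pixel input feeding a fully
# connected layer of 10,000 neurons, followed by three more
# 10,000-neuron hidden layers.
pixels = 1_000 * 1_000
layers = [pixels, 10_000, 10_000, 10_000, 10_000]
connections = sum(a * b for a, b in zip(layers, layers[1:]))
print(f"{connections:,}")   # 10,300,000,000 -> about ten billion connections

# Each training example touches every connection a handful of times
# (forward pass, backward pass, weight update), so a run over a large
# dataset passes quadrillions (10^15) of floating-point operations.
examples, epochs, flops_per_conn = 1_000_000, 10, 6
print(f"{connections * examples * epochs * flops_per_conn:.1e} FLOPs")
```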

Researchers such as Cireşan have found that non-traditional computer architectures can be used to speed up processing at scale. Graphics processing units (GPUs) from companies such as AMD and Nvidia can perform hundreds of floating-point operations in parallel; earlier efforts to accelerate neural network training had centered on clusters of workstations, which are slower but easier to program. In an experiment in which a deep neural network was trained to look for the visual features of dividing cells, Cireşan found that the training phase would have taken five months on a traditional CPU; "it took only three days on a GPU."
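
The kind of speedup Cireşan describes is easy to glimpse with any GPU-backed linear algebra library; the sketch below uses PyTorch as one such stand-in (the article names no particular software), timing the same large matrix multiply on the CPU and on a CUDA GPU:

```python
import time
import torch  # assumes a PyTorch build with CUDA support is installed

x = torch.randn(2048, 2048)
w = torch.randn(2048, 2048)

start = time.time()
for _ in range(10):
    y = x @ w                     # matrix multiply on the CPU
print("CPU:", time.time() - start)

if torch.cuda.is_available():
    xg, wg = x.cuda(), w.cuda()   # move the data to GPU memory
    torch.cuda.synchronize()      # wait for the transfer to finish
    start = time.time()
    for _ in range(10):
        yg = xg @ wg              # the same multiply, across thousands of GPU cores
    torch.cuda.synchronize()      # GPU calls are asynchronous; sync before timing
    print("GPU:", time.time() - start)
```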

Yann LeCun, director of AI research at Facebook and founding director of the Center for Data Science at New York University, said: "Previously, neural networks didn't break records in recognizing continuous speech; they weren't big enough. When people switched from Gaussian models to deep neural networks, the error rate went way down."

In the reported results, deep neural networks delivered an improvement of more than one-third, cutting the error rate for recognizing speech with little background noise from 35% to under 25%, with further optimizations pushing it lower still.

This form of learning has its limitations. DeepMind, the London-based company Google acquired for $400 million in early 2014, uses computer games to evaluate how deep neural networks perform when faced with different kinds of problems. Google researcher Volodymyr Mnih says the systems cannot handle situations like traversing a maze, where the reward comes only after several stages have been completed successfully. In those cases, the network tries a variety of random initial actions, fails, and learns essentially nothing. Deep neural networks perform better in games such as Breakout and virtual pinball, where success may be delayed but can still be learned from random responses.
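
The sparse-reward failure Mnih describes can be reproduced with even the simplest reinforcement learner. The sketch below runs tabular Q-learning (a deliberately simplified stand-in for DeepMind's deep Q-networks; the corridor "maze" and all parameters are hypothetical) on a task whose only reward sits at the far end; random initial actions essentially never reach it, so nothing is learned:

```python
import random

N = 50                          # corridor length; reward only at the far end
Q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}
alpha, gamma, eps = 0.5, 0.99, 0.1

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s2 == N - 1 else 0.0          # sparse, end-of-maze reward
    return s2, r, s2 == N - 1

def policy(s):
    if random.random() < eps or Q[(s, 0)] == Q[(s, 1)]:
        return random.choice((0, 1))         # explore / break ties randomly
    return 0 if Q[(s, 0)] > Q[(s, 1)] else 1

successes = 0
for episode in range(500):
    s, done, t = 0, False, 0
    while not done and t < 100:              # give up after 100 moves
        a = policy(s)
        s2, r, done = step(s, a)
        # Standard Q-learning update: with r == 0 almost everywhere, no
        # signal propagates until the goal is stumbled upon by chance.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
        s, t = s2, t + 1
    successes += done

print(f"goal reached in {successes}/500 episodes")   # almost always 0
```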

To deploy deep networks in commercial applications, some teams have turned to custom computer designs based on field-programmable gate arrays (FPGAs). These implement custom electronic circuits using programmable logic lookup tables, hard-wired arithmetic logic units optimized for digital signal processing, and a matrix of memory cells that defines how all these components are connected.

Baidu, the Chinese search engine and web services company, uses deep neural networks to provide speech recognition, image search, and contextual advertising, and decided to use FPGAs rather than GPUs in production servers. Jian Ouyang, a senior architect at Baidu, said that while individual GPUs deliver peak floating-point performance, in the deep neural network applications Baidu uses, an FPGA consumes less power at the same performance level and can be mounted on a blade server powered entirely through its PCI Express bus connection to the motherboard. A key advantage of the FPGA is that, because the result of one computation can be fed directly to the next without being held temporarily in main memory, its memory bandwidth requirements are far lower than those of GPU or CPU implementations.

"With FPGAs, we don't need to modify the server design and environment, so it's easy to deploy on a large scale. We need many features to support those that cannot be deployed to FPGAs at the same time. However, we can use their reconfigurable on-demand move-in and move-out capabilities in the FPGA. The refactoring time is less than 10μs. "Ouyang said.

The Baidu team saved further space by using a simplified floating-point engine. "The standard floating-point implementations provided by processors can handle all possible exceptions. In our case, however, we do not need to handle all the exception cases of the IEEE [754] standard."
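
As a software illustration of that trade-off (a toy, not Baidu's actual engine), the sketch below narrows values to a smaller float format and flushes subnormals to zero, silently accepting overflow instead of handling every IEEE 754 corner case:

```python
import numpy as np

def simplified_float(x):
    """Toy reduced-precision format: float16 with subnormals flushed to zero."""
    y = np.asarray(x, dtype=np.float32).astype(np.float16)  # narrow the format
    tiny = np.finfo(np.float16).tiny                        # smallest normal value
    return np.where(np.abs(y) < tiny, np.float16(0.0), y)   # flush subnormals

a = np.array([1.0e-7, 0.333333, 70000.0], dtype=np.float32)
print(simplified_float(a))  # ~[0.0, 0.3333, inf]: underflow flushed, overflow untrapped
```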

Alongside the push for more efficient processors, researchers are trying to use distributed processing to build larger deep learning networks that can cope with bigger datasets. The latency of transfers across the network severely limits training speed. However, by reorganizing the training algorithm and switching from Ethernet to a lower-latency interconnect fabric, a team at Stanford University achieved near-linear speedup across multiple parallel GPUs in 2013. More recently, using clusters of CPUs rather than GPUs, Microsoft developed a training approach that relaxes synchronization requirements, allowing execution to scale across thousands of machines.
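
Here is a minimal sketch of relaxed synchronization in the lock-free "Hogwild" style: several threads update a shared parameter vector without any barrier, tolerating slightly stale gradients. It illustrates the general idea only; the toy least-squares problem and all names are hypothetical, not Microsoft's system:

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
w_true = rng.normal(size=20)
y = X @ w_true                   # noise-free targets for a least-squares fit

w = np.zeros(20)                 # shared parameters, updated lock-free
lr = 0.01

def worker(seed, steps=2_000):
    r = np.random.default_rng(seed)
    for _ in range(steps):
        i = r.integers(len(X))
        grad = (X[i] @ w - y[i]) * X[i]   # gradient from one example
        w[:] = w - lr * grad              # no synchronization barrier

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(np.abs(w - w_true).max())  # should be small: stale updates still converge
```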

More scalable networks made it possible for Baidu to implement an "end-to-end" speech recognition system called Deep Speech. The system does not rely on the output of traditional speech-processing algorithms, such as hidden Markov models, to boost its performance on noisy inputs. On a noisy dataset it reduced the word recognition error rate to just over 19%, compared with 30.5% for the best commercial systems available at the end of 2014.

However, preprocessing the data and merging the results of several smaller networks can be more effective than relying purely on a single neural network. Cireşan has combined image warping with "committees" of small networks to achieve lower error rates than larger single deep learning networks. In one test of traffic sign recognition, the combination of techniques performed better than human observers.
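
The sketch below shows the general shape of that warping-plus-committee recipe, with scikit-learn MLPs and a crude rotation standing in for Cireşan's networks and distortions (all sizes, angles, and the dataset are illustrative assumptions); each member trains on differently warped data and their predicted probabilities are averaged:

```python
import numpy as np
from scipy.ndimage import rotate
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
Xtr, Xte, ytr, yte = train_test_split(digits.images, digits.target, random_state=0)

def warp(imgs, angle):
    # Crude stand-in for elastic distortion: rotate each 8x8 digit slightly.
    return np.stack([rotate(im, angle, reshape=False) for im in imgs])

committee = []
for angle in (-8, 0, 8):                       # each member sees different warps
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    net.fit(warp(Xtr, angle).reshape(len(Xtr), -1), ytr)
    committee.append(net)

# Average the members' predicted probabilities, then take the argmax.
probs = np.mean([n.predict_proba(Xte.reshape(len(Xte), -1)) for n in committee], axis=0)
print(f"committee accuracy: {(probs.argmax(axis=1) == yte).mean():.3f}")
```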

Deciding which distortions to use for a given class of patterns requires manual intervention. Cireşan notes that it would be hard for a network to learn the best combination of distortions on its own, but that it is usually an easy decision for the people setting up the system.

Neil Lawrence, a professor of computer science at the University of Sheffield, argues that a potential problem for traditional deep learning is access to data. Deep models tend to perform well, he says, when datasets are well characterized and the model can be trained on a large amount of appropriately labeled data. "However, one of the areas that motivates me is clinical data, where that is not the case. With clinical data, most of the time, most people have not had a large number of clinical tests applied to them. Moreover, the set of tests keeps evolving, as do patients' conditions. This is an example of 'massively missing data.'"

Lawrence and others have suggested using layers of Gaussian processes, a probabilistic technique, instead of neural networks, to provide effective learning on smaller datasets and in applications where neural networks do not perform well, such as modeling a medical condition across many distinct, interconnected databases. Because the data for a given patient may be missing from some databases, probabilistic models handle this situation better than traditional machine learning techniques. The work lags behind that on neural networks, but researchers have begun developing effective training techniques and scaling the processing up to run on platforms such as multi-core GPU machines.
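
The small-data appeal is visible even in plain Gaussian process regression (a single GP, not the layered architectures Lawrence proposes), sketched here with scikit-learn: a dozen noisy points yield both predictions and uncertainties that grow away from the data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 12)).reshape(-1, 1)   # tiny stand-in dataset
y = np.sin(X).ravel() + rng.normal(0, 0.1, 12)      # noisy observations

# alpha encodes the assumed observation noise variance.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1**2)
gp.fit(X, y)

Xq = np.linspace(0, 5, 100).reshape(-1, 1)
mean, std = gp.predict(Xq, return_std=True)  # prediction plus uncertainty
print(std.min(), std.max())                  # uncertainty grows off the data
```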

"We have an additional algorithm burden, that is, to spread uncertainty around the network," Lawrence said. "This is the beginning of the algorithmic problem, but also here, we've had most of the breakthroughs. ”

Lawrence says that Gaussian process-based deep learning systems may demand more computation, but that they can automatically determine how many layers the network needs internally, something that neural network-based systems cannot currently do. "This kind of structural learning is very exciting, and was one of the original motivations for considering these models."

Looking at neural network systems more broadly, Cireşan says work continues on removing limitations so that larger, more effective models can be built, "but I would say that what we want most is a better understanding of why deep learning works."

Original link: Growing Pains for Deep Learning (translator: Wang Wei; editor: Zhou Jianding)

"Preview" The First China AI Congress (CCAI 2015) will be held in July 26-27th in Beijing Friendship Hotel. Machine learning and pattern recognition, big data opportunities and challenges, artificial intelligence and cognitive science, intelligent robotics four subject experts gathered. AI Product Library will be synchronized online, appointment consultation: qq:1192936057. Welcome attention.

This article for CSDN compilation, without permission not reproduced, if necessary reprint please contact market#csdn.net (#换成 @)

Deep learning and Growing pains

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.