Growing Pains for Deep Learning

Advances in theory and computer hardware have allowed neural networks to become a core part of online services such as Microsoft's Bing, driving their image-search and speech-recognition systems. The companies offering such capabilities are looking to the technology to drive more advanced services in the future, as they scale up the neural networks to deal with more sophisticated problems.

It has taken time for neural networks, initially conceived decades ago, to become accepted parts of information technology applications. After a flurry of interest in the 1990s, supported in part by the development of highly specialized integrated circuits designed to overcome their poor performance on conventional computers, neural networks were outperformed by other algorithms, such as support vector machines in image processing and Gaussian models in speech recognition.

Early neural networks were simple, with only up to three layers, split into an input layer, a middle 'hidden' layer, and an output layer. The neurons are highly interconnected across layers: each neuron feeds its output to each of the neurons in the following layer. The networks are trained by iteratively adjusting the weights each neuron applies to its input data, trying to minimize the error between the output of the entire network and the desired result.
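As a concrete illustration of that loop, the sketch below trains such a three-layer network on a toy task. The layer sizes, learning rate, and task are invented for illustration and are not drawn from any system described in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three layers: 4 inputs -> 8 hidden neurons -> 1 output.
# Every neuron feeds every neuron in the next layer (fully connected).
W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(x @ W1)            # hidden layer
    return h, sigmoid(h @ W2)      # output layer

# Toy task (invented): does the sum of the four inputs exceed 2?
X = rng.uniform(size=(256, 4))
T = (X.sum(axis=1, keepdims=True) > 2.0).astype(float)

lr = 0.5
for _ in range(2000):
    h, y = forward(X)
    err = y - T                    # error between output and desired result
    # Iteratively adjust the weights to shrink the squared error.
    delta_out = err * y * (1 - y)
    grad_W2 = h.T @ delta_out / len(X)
    grad_W1 = X.T @ ((delta_out @ W2.T) * h * (1 - h)) / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

_, y = forward(X)
print(f"training accuracy: {((y > 0.5) == (T > 0.5)).mean():.2f}")
```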

Although neuroscience suggested the human brain has a deeper architecture involving a number of hidden layers, the results from early experiments on these types of systems were worse than for shallow networks. In 2006, work on deep architectures received a significant boost from work by Geoffrey Hinton and Ruslan Salakhutdinov at the University of Toronto. They developed training techniques that were more effective for training networks with multiple hidden layers. One of the techniques is 'pre-training', which adjusts the output of each layer independently before moving on to trying to optimize the network's output as a whole. The approach made it possible for the upper layers to extract high-level features from the representations assembled by the lower, hidden layers, and so classify data more efficiently.
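Hinton and Salakhutdinov's pre-training was built on restricted Boltzmann machines; the sketch below substitutes a simpler autoencoder-style variant of the same greedy, layer-by-layer idea. All sizes and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(data, n_hidden, lr=0.1, steps=500):
    """Train one layer to reconstruct its own input (an autoencoder),
    ignoring the rest of the network -- the greedy, layer-local step."""
    n_in = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))   # encoder weights
    V = rng.normal(scale=0.1, size=(n_hidden, n_in))   # decoder weights
    for _ in range(steps):
        h = sigmoid(data @ W)            # encode
        recon = h @ V                    # linear decode
        err = (recon - data) / len(data)
        grad_V = h.T @ err
        grad_W = data.T @ ((err @ V.T) * h * (1 - h))
        V -= lr * grad_V
        W -= lr * grad_W
    return W, sigmoid(data @ W)          # weights, plus codes for the next layer

# Greedily pre-train a stack of hidden layers, one at a time.
X = rng.uniform(size=(512, 64))
weights, activations = [], X
for size in (32, 16, 8):
    W, activations = pretrain_layer(activations, size)
    weights.append(W)
# `weights` now initializes a deep network before fine-tuning it as a whole.
```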

Even with improvements in training, scale presents a problem for deep learning. The need to fully interconnect neurons, particularly in the upper layers, requires immense compute power. The first layer for an image-processing application could need to analyze a million pixels. The number of connections in the multiple layers of a deep network would be orders of magnitude greater. "There are billions and even hundreds of billions of connections that have to be processed for every image," says Dan Cireșan, a researcher at the Manno, Switzerland-based Dalle Molle Institute for Artificial Intelligence (IDSIA). Training such a large network requires quadrillions of floating-point operations, he adds.
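A back-of-the-envelope calculation shows how quickly the counts reach that scale; the layer widths below are invented purely to illustrate the arithmetic.

```python
# Invented layer widths, purely to illustrate the arithmetic.
pixels = 1_000_000                      # ~1-megapixel input layer
layers = [pixels, 20_000, 20_000, 10_000, 1_000]

connections = sum(a * b for a, b in zip(layers, layers[1:]))
print(f"{connections:.2e} connections")       # ~2.1e+10: tens of billions

# Roughly one multiply and one add per connection per image,
# repeated over a large training set:
images, passes = 1_000_000, 10
flops = 2 * connections * images * passes     # forward pass alone
print(f"{flops:.2e} training FLOPs")          # ~4.1e+17: quadrillions
```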

Researchers such as Cireșan found it possible to use alternative computer architectures to massively speed up processing. Graphics processing units (GPUs) made by companies such as AMD and NVIDIA provide the ability to perform hundreds of floating-point operations in parallel. Previous attempts to speed up neural-network training revolved around clusters of workstations, which are slower but easier to program. In one experiment in which a deep neural network was trained to look for characteristic visual features of biological cell division, Cireșan says the training phase could have taken five months on a conventional CPU; "it took three days on a GPU."

Yann LeCun, director of artificial intelligence at Facebook and founding director of New York University's Center for Data Science, says, "Before, neural networks were not breaking records for recognizing continuous speech; they were not big enough. When people replaced Gaussian models with deep neural nets, the error rates went down."

Deep neural nets showed an improvement of more than a third, cutting error rates on speech recognition with little background noise from 35% to less than 25%, with optimizations allowing further improvements since their introduction.

There are limitations to this form of learning. London-based DeepMind, which was bought by Google in early 2014, used computer games to test the performance of deep neural networks on different types of problems. Google researcher Volodymyr Mnih says the system cannot deal with situations such as traversing a maze, where the rewards only come after successfully completing a number of stages. In these cases, the network has very little to learn from when it tries various random initial maneuvers but fails. The deep neural network fares much better at games such as Breakout and Virtual Pinball, where success may be delayed, but it can learn from random responses.
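A tiny tabular Q-learning sketch (a stand-in of ours, not DeepMind's deep network) makes the problem concrete: with reward only at the end of a long corridor of states, random early moves almost never generate a learning signal. All constants are illustrative.

```python
import random

# A corridor of N states; the only reward sits at the far end.
# Tabular Q-learning stands in for the deep networks discussed above;
# the sparse-reward problem is the same.
N, EPISODES, MAX_STEPS = 20, 200, 50
q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.95, 0.2

rewarded_episodes = 0
for _ in range(EPISODES):
    s = 0
    for _ in range(MAX_STEPS):
        a = random.choice((-1, 1)) if random.random() < eps else \
            max((-1, 1), key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0          # reward only at the goal
        best_next = max(q[(s2, -1)], q[(s2, 1)])
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2
        if r > 0:
            rewarded_episodes += 1
            break

# Until the goal is first reached by chance, every update target is zero,
# so the agent learns nothing from its failed random wanderings.
print(f"episodes that ever saw a reward: {rewarded_episodes}/{EPISODES}")
```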

When it comes to deploying deep networks in commercial applications, some teams have turned to custom computer designs using field-programmable gate arrays (FPGAs). These implement custom electronic circuits using a combination of programmable logic lookup tables, hard-wired arithmetic logic units optimized for digital signal processing, and a matrix of memory cells that defines how all of these elements are connected.

Chinese search-engine and Web-services company Baidu, which uses deep neural networks to provide speech recognition, image searches, and contextual advertisements, decided to use FPGAs rather than GPUs in production servers. According to Jian Ouyang, senior architect at Baidu, although individual GPUs provide higher peak floating-point performance, in the deep neural network applications used by Baidu, the FPGAs consume less power for the same level of performance and can be mounted on a server blade, powered solely from the PCI Express bus connections available on the motherboard. A key advantage of the FPGA is that because the results from one calculation can be fed directly to the next without needing to be held temporarily in main memory, the memory bandwidth requirement is far lower than with GPU or CPU implementations.

"With the FPGA, we don ' t has to modify the server design and environment, so it's easy-to-deploy on a-large scale. We need many functions to is supported that is impossible to deploy at the same time in FPGA. But we can use their reconfigurability to move functions in and out of the FPGA as needed. The reconfiguration time is less than 10μs, "says Ouyang.

The Baidu team made further space savings by using a simplified floating-point engine. "Standard floating-point implementations provided in processors can handle all possible exception situations. In our situation, we don't need to handle all of the exceptions of the IEEE [754] standard," Ouyang says.
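The kind of saving involved can be sketched in software, though a real design would live in an FPGA's logic fabric, not Python: drop the IEEE 754 machinery (NaNs, infinities, denormals, rounding modes) in favor of a narrow, saturating fixed-point datapath. The bit widths below are invented, not Baidu's.

```python
# Illustrative fixed-point stand-in for a simplified arithmetic unit.
# 8 fractional bits is an invented width; Baidu's actual format is not
# described in the article.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    # Saturate instead of raising or propagating IEEE exceptions.
    return max(-32768, min(32767, round(x * SCALE)))

def fixed_mac(acc: int, a: int, b: int) -> int:
    # Multiply-accumulate with simple truncation; no NaN/inf/denormal logic.
    return acc + ((a * b) >> FRAC_BITS)

acc = 0
for w, x in [(0.5, 1.25), (-0.75, 2.0), (0.125, -4.0)]:
    acc = fixed_mac(acc, to_fixed(w), to_fixed(x))
print(acc / SCALE)   # -1.375, the dot product computed in fixed point
```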

As well as finding ways to use more effective processors, researchers are trying to use distributed processing to build more extensive deep-learning networks that can cope with much larger datasets. The latency of transfers over a network badly affects the speed of training. However, rearranging the training algorithms, together with a shift from Ethernet networking to InfiniBand, which offers lower latency, allowed a team from Stanford University to achieve almost linear speedups across multiple parallel GPUs. In more recent work using clusters of CPUs rather than GPUs, Microsoft developed a way to relax the synchronization requirements of training to allow execution across thousands of machines.
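Microsoft's exact scheme is not detailed here, but the general shape of relaxed synchronization is that workers read and write shared parameters without coordinating on every step, tolerating slightly stale values. A minimal threaded sketch of that idea, with an invented quadratic objective:

```python
import threading
import numpy as np

# Shared parameter vector, updated by several workers without
# per-step synchronization (the "relaxed" part).
w = np.zeros(4)
w_true = np.array([1.0, -2.0, 0.5, 3.0])   # invented target weights

def worker(seed, steps=2000, lr=0.01):
    global w
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        x = rng.normal(size=4)
        err = x @ w - x @ w_true   # reads of w may be slightly stale
        w = w - lr * err * x       # unsynchronized write to shared state

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(np.round(w, 2))   # close to w_true despite the missing locks
```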

More scalable networks have made it possible for Baidu to implement an "end-to-end" speech recognition system called Deep Speech. The system does not rely on the output of traditional speech-processing algorithms, such as the use of hidden Markov models, to boost its performance on noisy inputs. It reduced errors on word recognition to just over 19% on a noise-prone dataset, compared to 30.5% for the best commercial systems available at the end of 2014.

However, pre-processing data and combining results from multiple smaller networks can be more effective than relying purely on a single large neural network. Cireșan has used a combination of image distortions and "committees" of smaller networks to reduce error rates compared to larger single deep-learning networks. In one test of traffic-sign recognition, the combination of techniques resulted in better performance than that of human observers.
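A committee in this sense simply averages the predictions of several networks, each trained on a differently distorted copy of the data. The sketch below uses scikit-learn's small MLP as a stand-in for Cireșan's networks, with pixel jitter standing in for his image distortions; every size and constant is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Committee sketch: several small nets, each trained on a differently
# distorted copy of the data; predictions are averaged at the end.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
committee = []
for seed in range(5):
    # Simple pixel jitter stands in for Ciresan's image warps.
    distorted = X_train + rng.normal(scale=1.0, size=X_train.shape)
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=400,
                        random_state=seed).fit(distorted, y_train)
    committee.append(net)

# Average the members' class probabilities, then take the argmax.
avg = np.mean([net.predict_proba(X_test) for net in committee], axis=0)
accuracy = (avg.argmax(axis=1) == y_test).mean()
print(f"committee accuracy: {accuracy:.3f}")
```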

Deciding on the distortions to use for a given class of patterns takes human intervention. Cireșan says it would be very difficult to have networks self-learn the best combination of distortions, and that it is typically an easy decision for humans to make when setting up the system.

One potential issue with conventional deep learning is access to data, says Neil Lawrence, a professor of machine learning in the computer science department of the University of Sheffield. He says deep models tend to perform well in situations where the datasets are well characterized and the models can be trained on a large amount of appropriately labeled data. "However, one of the domains that inspires me is clinical data, where this isn't the case. In clinical data, most people haven't had most clinical tests applied to them most of the time. Also, clinical tests evolve, as do the diseases that affect patients. This is an example of 'massively missing data.'"

Lawrence and others have suggested the use of layers of Gaussian processes, which use probability theory, in place of neural networks, to provide effective learning on smaller datasets, and for applications in which neural networks do not perform well, such as where data is interconnected across many different databases, as is the case in healthcare. Because data may be present in only some databases for a given candidate, a probabilistic model can deal with the situation better than traditional machine-learning techniques. The work lags behind that on neural networks, but researchers have started work on effective training techniques, as well as on scaling up processing to run on platforms such as multi-GPU machines.
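Deep Gaussian processes stack GP layers much as neural networks stack weight layers. The single-layer sketch below, using scikit-learn (our library choice, not one named by Lawrence), shows the property the approach is prized for: a calibrated uncertainty estimate from a handful of noisy observations. The toy function and sizes are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# A single-layer GP on a deliberately tiny, noisy dataset: the kind of
# small, partially observed data Lawrence describes.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(15, 1))            # only 15 observations
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=15)

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01),
    normalize_y=True,
).fit(X, y)

# Unlike a plain neural network's point estimate, the GP returns a
# predictive mean *and* a standard deviation at every query point.
X_query = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)
for x, m, s in zip(X_query.ravel(), mean, std):
    print(f"f({x:4.1f}) ~ {m:+.2f} +/- {s:.2f}")
```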

"We carry an additional algorithmic burden, that's propagating the uncertainty around the network," Lawrence says. "This was where the algorithmic problems begin, but is also where we have had most of the breakthroughs."

According to Lawrence, deep-learning systems based on Gaussian processes are likely to demand greater compute performance, but the systems are able to automatically determine how many layers are needed within the network, something not currently possible with systems based on neural networks. "This type of structural learning is very exciting, and was one of the original motivations for considering these models."

In the currently more widespread neural-network systems, Cireșan says work is in progress to remove further limitations to building larger, more effective models, "but I would say what we would like most is to have a better understanding of why deep learning works."

Further Reading

Hinton, G.E., and Salakhutdinov, R.R.
Reducing the dimensionality of data with neural networks, Science (2006), Vol. 313, p. 504.

Schmidhuber, J.
Deep learning in neural networks: An overview, Neural Networks (2015), Vol. 61, pp. 85–117. (arXiv preprint: http://arxiv.org/pdf/1404.7828.pdf)

Mnih, V., et al.
Human-level control through deep reinforcement learning, Nature (2015), Vol. 518, pp. 529–533.

Damianou, A.C., and Lawrence, N.D.
Deep Gaussian processes, Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) (2013). (arXiv preprint: http://arxiv.org/pdf/1211.0358.pdf)

Author

Chris Edwards is a Surrey, U.K.-based writer who reports on electronics, IT, and synthetic biology.
