Nuts and bolts of applying deep learning


Sep 26, 2016

This weekend was very hectic (catching up on courses and studying for a statistics quiz), but I managed to squeeze in some time to watch the Bay Area Deep Learning School livestream on YouTube. For those of you wondering what it is, BADLS is a 2-day conference hosted at Stanford University, consisting of back-to-back presentations on a variety of topics ranging from NLP, computer vision, and unsupervised learning to reinforcement learning. Additionally, top DL software libraries such as Torch, Theano, and TensorFlow were presented.

There were some super interesting talks from leading experts in the field: Hugo Larochelle from Twitter, Andrej Karpathy from OpenAI, Yoshua Bengio from the Université de Montréal, and Andrew Ng from Baidu, to name a few. Of the plethora of presentations, one somewhat non-technical one, given by Andrew, really piqued my interest.

In this blog post, I'm going to try and give an overview of the main ideas outlined in his talk. The goal is to pause a bit and examine the ongoing trends in deep learning thus far, as well as gain some insight into applying DL in practice.

By the way, if you missed the livestreams, you can still view them at the following links: Day 1 and Day 2.

Table of Contents:

    • Major Deep Learning Trends
    • End-to-end Deep Learning
    • Bias-variance Tradeoff
    • Human-level Performance
    • Personal Advice

Major Deep Learning Trends

Why do DL algorithms work? According to Ng, with the rise of the Internet, mobile, and IoT era, the amount of data accessible to us has greatly increased. This correlates directly with a boost in the performance of neural network models, especially the larger ones, which have the capacity to absorb all this data.

However, in the small data regime (left-hand side of the x-axis), the relative ordering of the algorithms is not so well defined, and really depends on who is more motivated to engineer their features better, or to refine and tune the hyperparameters of their model.

Thus, this trend is more prevalent in the big data realm, where hand engineering effectively gets replaced by end-to-end approaches, and bigger neural nets combined with a lot of data tend to outperform all other models.

Machine learning and HPC teams. The rise of big data and the need for larger models have started to put pressure on companies to hire a computer systems team. This is because some HPC (high-performance computing) applications require highly specialized knowledge, and it is difficult to find researchers and engineers with sufficient knowledge in both fields. Thus, cooperation between both teams is key to boosting performance in AI companies.

Categorizing DL models. Work in DL can be categorized into the following 4 buckets:

Most of the value in the industry today is driven by the models in the orange blob (innovation and monetization mostly), but Andrew believes that unsupervised deep learning is a super-exciting field that has loads of potential for the future.

The rise of End-to-end DL

A major improvement in the end-to-end approach has been the fact that outputs are becoming more and more complicated. For example, rather than just outputting a simple class score such as 0 or 1, algorithms are starting to generate richer outputs: images as in the case of GANs, full captions with RNNs, and most recently, audio as in DeepMind's WaveNet.

So what exactly does end-to-end training mean? Essentially, it means that AI practitioners are shying away from intermediate representations and going directly from one end (raw input) to the other end (output). Here's an example from speech recognition.
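To make the contrast concrete, here's a minimal sketch in Python. The pipeline functions below (`extract_features`, `acoustic_model`, `language_model`) are toy stand-ins I made up for illustration, not code from the talk; a real system would use MFCC extraction, a trained acoustic model, and a language model, while the end-to-end version collapses all of that into one learned mapping from raw audio to text.

```python
import numpy as np

# --- Traditional pipeline: hand-engineered intermediate representations ---

def extract_features(audio):
    """Toy stand-in for MFCC extraction: log-energy per 10 ms frame (16 kHz)."""
    frames = audio[: len(audio) // 160 * 160].reshape(-1, 160)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-8)

def acoustic_model(features):
    """Toy stand-in for a phoneme recognizer."""
    return ["speech" if f > -5 else "sil" for f in features]

def language_model(phonemes):
    """Toy stand-in for decoding phonemes into a transcript."""
    return " ".join(p for p in phonemes if p != "sil")

def pipeline_asr(audio):
    return language_model(acoustic_model(extract_features(audio)))

# --- End-to-end: one learned function, raw waveform in, transcript out ---

def end_to_end_asr(audio, model):
    # `model` would be, e.g., an RNN trained with CTC on (audio, text) pairs.
    return model(audio)
```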

Are there any disadvantages to this approach? End-to-end approaches are data hungry, meaning they only perform well when provided with a huge dataset of labelled examples. In practice, not all applications have the luxury of large labelled datasets, so other approaches, which allow hand-engineered information and field expertise to be added into the model, have gained the upper hand. As an example, in a self-driving car setting, going directly from the raw image to the steering direction is pretty difficult. Rather, many features such as trajectory and pedestrian location are calculated first as intermediate steps.

The main take-away from this section is that we should always be cautious of end-to-end approaches in applications where huge data is hard to come by.

Bias-variance Tradeoff

Splitting your data. In many learning problems, train and test data come from different distributions. For example, suppose you are working on implementing an AI-powered rearview mirror and have gathered 2 chunks of data: the first, larger chunk comes from many places (could be partly bought, and partly crowdsourced) and the second, much smaller chunk is actual car data.

Splitting the data into train/dev/test can be tricky. One might be tempted to carve the dev set out of the training chunk, as in the first example of the diagram below. (Note that the chunk on the left corresponds to data mined from the first distribution, and the chunk on the right to data from the second distribution.)

This is bad because we usually want our dev and test sets to come from the same distribution. The reason for this is that a part of the team would be spending a lot of time tuning the model to work well on the dev set; if the test set were to turn out very different from the dev set, then pretty much all that work would have been wasted effort.

Hence, a smarter way of splitting the above dataset would be the one shown in the second line of the diagram. Now, in practice, Andrew recommends creating dev sets from both data distributions: a train-dev and a test-dev set. In this manner, any gap between the different errors can help you tackle the problem more clearly.
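As a concrete sketch of that smarter split (the proportions and the array-based setup are my own assumptions for illustration, not numbers from the talk):

```python
import numpy as np

def split_rearview_data(web_chunk, car_chunk, seed=0):
    """Split the two chunks as in the second line of the diagram: the large,
    easy-to-get chunk supplies train and train-dev (same distribution as
    train), while the scarce in-car chunk, which matches what we actually
    care about, supplies dev and test."""
    rng = np.random.default_rng(seed)
    web = rng.permutation(web_chunk)
    car = rng.permutation(car_chunk)

    n_td = int(0.05 * len(web))     # small same-distribution slice for train-dev
    train_dev, train = web[:n_td], web[n_td:]

    n_dev = len(car) // 2           # dev and test share a distribution
    dev, test = car[:n_dev], car[n_dev:]
    return train, train_dev, dev, test
```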

Flowchart for working with a model. Given what we described above, here's a simplified flowchart of the actions you should take when confronted with training/tuning a DL model.
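Rendered as code, the flowchart boils down to comparing consecutive error measurements and acting on the first gap that is too large. The 0.5% tolerance and the exact wording of the suggested fixes below are my paraphrase of the idea, not a quote from the slides:

```python
def diagnose(train_err, train_dev_err, dev_err, test_err, human_err, tol=0.005):
    """Walk the (paraphrased) flowchart and report the first problem found."""
    if train_err - human_err > tol:
        return "High bias: bigger model, train longer, or try a new architecture."
    if train_dev_err - train_err > tol:
        return "High variance: more data, regularization, or a new architecture."
    if dev_err - train_dev_err > tol:
        return "Train/test mismatch: get (or synthesize) data more like the test set."
    if test_err - dev_err > tol:
        return "Overfitting the dev set: gather a bigger dev set."
    return "Near human-level everywhere: hunt for subsets that still fail."

# Example: 8% train / 8.5% train-dev / 10% dev error, with ~1% human error.
print(diagnose(0.08, 0.085, 0.10, 0.10, 0.01))   # -> "High bias: ..."
```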

The importance of data synthesis. Andrew also stressed the importance of data synthesis as part of a workflow in deep learning. While it is painful to manually engineer training examples, the relative gain in performance you obtain once the parameters and the model fit well is huge and worth your while.
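A classic example of such synthesis (my illustration of the idea, not code from the talk) is manufacturing "in-car" speech by mixing clean utterances with background noise at a chosen signal-to-noise ratio:

```python
import numpy as np

def synthesize_noisy_speech(clean, noise_clips, snr_db=10.0, seed=0):
    """Mix a clean utterance with a random noise clip at the target SNR,
    turning one labelled example into many synthetic ones.
    Assumes each noise clip is at least as long as `clean`."""
    rng = np.random.default_rng(seed)
    noise = noise_clips[rng.integers(len(noise_clips))][: len(clean)]
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Scale noise so that 10 * log10(p_signal / p_noise_scaled) == snr_db.
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```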

Human-level Performance

One of the very important concepts underlined in this lecture is that of human-level performance. In the basic setting, DL models tend to plateau once they have reached or surpassed human-level accuracy. While it's important to note that human-level performance doesn't necessarily coincide with the Bayes optimal error rate, it can serve as a very reliable proxy which can be leveraged to determine your next move when training your model.

Reasons for the plateau. There could be a theoretical limit on the dataset which makes further improvement futile (e.g. a noisy subset of the data). Humans are also very good at these tasks, so trying to make progress beyond human-level accuracy suffers from diminishing returns.

Here's an example that can help illustrate the usefulness of human-level accuracy. Suppose you are working on an image recognition task and measure the following:

    • Train error: 8%
    • Dev error: 10%

If I were to tell you that human accuracy for such a task was on the order of 1%, then this would be a blatant bias problem, and you could subsequently try increasing the size of your model, training longer, etc. However, if I told you that human-level accuracy is on the order of 7.5%, then this would be more of a variance problem, and you'd focus your efforts on methods such as data synthesis or gathering data more similar to the test set.
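The reasoning is just two subtractions; here it is worked through for the numbers above (the helper is mine, and the "avoidable bias" vs. variance terminology follows Ng's usage):

```python
def gaps(train_err, dev_err, human_err):
    """Avoidable bias = train error minus the human-level proxy for Bayes
    error; variance = dev error minus train error."""
    return train_err - human_err, dev_err - train_err

print(gaps(0.08, 0.10, 0.01))    # ~(0.07, 0.02): bias dominates, grow the model
print(gaps(0.08, 0.10, 0.075))   # ~(0.005, 0.02): variance dominates, get data
```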

By the way, there's always room for improvement. Even if you are close to human-level accuracy overall, there could be subsets of the data where you perform poorly, and working on those can boost production performance greatly.

Finally, one might ask what a good way of defining human-level accuracy is. For example, in the following image diagnosis setting, ignoring the cost of obtaining data, how should one pick the criterion for human-level accuracy?

    • Typical human: 5%
    • General Doctor: 1%
    • Specialized Doctor: 0.8%
    • Group of specialized doctors: 0.5%

The answer: always pick the best accuracy possible. This is because, as we mentioned earlier, human-level performance is a proxy for the Bayes optimal error rate, so providing a more accurate estimate of that limit can help you strategize your next move.

Personal Advice

Andrew ended the presentation with 2 ways one can improve his/her skills in the field of deep learning.

    • Practice, practice, practice: compete in Kaggle competitions and read the associated blog posts and forum discussions.
    • Do the dirty work: read a lot of papers and try to replicate the results. Soon enough, you'll get your own ideas and build your own models.