The Major Advancements in Deep Learning in 2016 (translated)

Tags: generative adversarial networks

From: The Major Advancements in Deep Learning in 2016 (suggested reading time: 10 minutes)

https://tryolabs.com/blog/2016/12/06/majoradvancementsdeeplearning2016/

For more than a decade, deep learning has been a core topic of research, and 2016 was no exception. This article reviews the techniques that, in the author's view, are driving the field forward or have made significant contributions to it. (1) Unsupervised learning has historically been one of the major challenges facing researchers; thanks to a wave of new generative models, 2016 was a great year for it. (2) Enabling machines to communicate naturally with people is a long-standing goal, and industry giants such as Google and Facebook have proposed solutions toward it. Accordingly, the key innovations in natural language processing in 2016 revolve around this goal.

(i) Unsupervised learning

Unsupervised learning refers to extracting patterns and structure from raw data without any additional information, in contrast to supervised learning, which requires labels. The classical neural-network approach to unsupervised learning is the autoencoder. In its most basic form it is a multilayer perceptron whose input and output layers have the same number of neurons, with a smaller hidden layer in between; the network is trained to reconstruct its own input. Once training is complete, the output of the hidden layer serves as a compact representation of the data, which can be used for clustering, dimensionality reduction, improving supervised classification, and even data compression.
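The bottleneck idea above can be sketched in a few lines. This is a minimal, illustrative numpy toy (random weights, no training loop); the function names and sizes are my own choices, not from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy autoencoder: 8-dimensional input squeezed through a 3-neuron hidden layer.
n_in, n_hidden = 8, 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))
b_dec = np.zeros(n_in)

def encode(x):
    # The hidden activations are the learned representation
    # (usable for clustering, dimensionality reduction, etc.).
    return np.tanh(x @ W_enc + b_enc)

def decode(h):
    return h @ W_dec + b_dec

def reconstruction_loss(x):
    # Training would minimize this with gradient descent.
    return np.mean((decode(encode(x)) - x) ** 2)

x = rng.normal(size=(5, n_in))   # a batch of 5 samples
code = encode(x)                 # compressed 3-d representation per sample
loss = reconstruction_loss(x)
```

Because the hidden layer has fewer neurons than the input, the network cannot simply copy its input and is forced to learn a compressed code.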

In 2014, Ian Goodfellow proposed generative adversarial networks (GANs) for unsupervised learning, but it was not until 2016 that the potential of the technique became fully apparent. GANs are a real change, and their influence is elaborated below. Yann LeCun, one of the founders of deep learning, has said that GANs are the most interesting idea in machine learning in the last 10 years. The deep convolutional GAN (DCGAN) improvements introduced for 2016 corrected some shortcomings of earlier architectures and training algorithms, and the new applications that emerged revealed the power and flexibility of the GAN model and its variants.

The intuition behind GANs is simple. Suppose there is a forger G who fakes works of art, and a detective D who makes a living authenticating paintings. At first, D is shown some Picasso paintings, while G forges paintings trying to fool D into believing they are genuine Picassos. Sometimes G succeeds, but as D studies more paintings and learns Picasso's style, it becomes harder and harder for G to cheat, so G must forge better. As learning progresses, D becomes more adept at distinguishing genuine Picassos from fakes, and G becomes more adept at counterfeiting. This is the idea behind GANs.

Technically, a GAN consists of two networks in constant opposition (hence "adversarial"): a generator G and a discriminator D. Given a set of training samples, say images x drawn from some underlying data distribution, G is responsible for generating outputs and D for judging whether a given output comes from the same distribution as the training samples. G starts from random noise z and generates an image G(z). D receives both real images and forged images from G and learns to distinguish them, producing scores D(x) and D(G(z)). D and G are trained simultaneously; once G is well trained, it has absorbed enough knowledge about the training distribution to produce new samples with similar properties. These new samples need not correspond one-to-one with actual training samples, but they capture conceptual information that is truly present in the training set. Taking CIFAR-10 as an example, from a distance the generated images bear a clear resemblance to real ones.
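The roles of G, D, and the adversarial losses can be sketched with tiny one-layer networks. This is a forward-pass-only numpy illustration on a 2-D toy data space (all sizes and weights are hypothetical; a real GAN would alternate gradient updates on the two losses):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Tiny networks: noise z is 4-d, "images" x are 2-d points.
z_dim, x_dim, h = 4, 2, 8
G_W1 = rng.normal(scale=0.1, size=(z_dim, h))
G_W2 = rng.normal(scale=0.1, size=(h, x_dim))
D_W1 = rng.normal(scale=0.1, size=(x_dim, h))
D_W2 = rng.normal(scale=0.1, size=(h, 1))

def G(z):
    # Generator: maps noise z to a fake sample G(z).
    return np.tanh(z @ G_W1) @ G_W2

def D(x):
    # Discriminator: probability that x came from the real distribution.
    return sigmoid(np.tanh(x @ D_W1) @ D_W2)

x_real = rng.normal(loc=2.0, size=(16, x_dim))  # stand-in "real" samples
z = rng.normal(size=(16, z_dim))
x_fake = G(z)

# The adversarial objectives: D maximizes its ability to tell real from
# fake; G minimizes it (here in the common non-saturating form).
d_loss = -np.mean(np.log(D(x_real)) + np.log(1.0 - D(x_fake)))
g_loss = -np.mean(np.log(D(x_fake)))
```

Training would alternate a gradient step on `d_loss` (updating D's weights) with a step on `g_loss` (updating G's weights), which is the "constant opposition" described above.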

Several variants based on GANs are described below:

(1) InfoGAN. GANs can not only approximate the data distribution, but also learn interpretable, useful vector representations of data. (Aside: an interpretable vector representation is indeed easier to understand than an opaque one.) The ideal vector representation not only captures rich information, like an autoencoder's, but is also interpretable: one can tell which part of the vector corresponds to which kind of variation in the output. OpenAI researchers proposed the InfoGAN model in August to address this. InfoGAN learns representations of data in an unsupervised way; on MNIST, for example, it can infer the shape, rotation, and width of digits without any manually labeled data.
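The core InfoGAN mechanism is splitting the generator input into unstructured noise z and an interpretable code c, and adding an auxiliary network that must recover c from the generated sample. A minimal numpy sketch of that structure (linear stand-ins for the networks, all sizes hypothetical; the real model maximizes a variational lower bound on mutual information):

```python
import numpy as np

rng = np.random.default_rng(2)

# Latent input split: unstructured noise z plus interpretable code c
# (c might control, e.g., the rotation and width of a generated digit).
z_dim, c_dim, x_dim = 6, 2, 4
z = rng.normal(size=(8, z_dim))
c = rng.normal(size=(8, c_dim))
latent = np.concatenate([z, c], axis=1)  # generator input is (z, c)

W_g = rng.normal(scale=0.1, size=(z_dim + c_dim, x_dim))
x_fake = np.tanh(latent @ W_g)           # stand-in for G(z, c)

# Auxiliary network Q tries to recover c from the generated sample.
# Minimizing this loss ties c to visible factors of variation in the
# output; for a Gaussian code it reduces to a squared error.
W_q = rng.normal(scale=0.1, size=(x_dim, c_dim))
c_hat = x_fake @ W_q
mi_loss = np.mean((c_hat - c) ** 2)
```

If c had no effect on G's output, Q could never recover it, so this auxiliary loss is what forces the code to be meaningful.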

(2) Conditional GANs. These models take additional information as input, such as class labels, text, or other images, forcing G to produce specific kinds of output. Several applications have emerged: a. Text-to-image: the text, encoded as a vector by a character-level CNN or LSTM, is used as an additional input, and the image is generated from it. b. Image-to-image: maps an input image to an output image. c. Super-resolution: downsampled images are used as the conditioning input, and the generator G tries to produce natural, sharper high-resolution versions of them.
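A common way to condition both networks is simply to concatenate the condition (here a one-hot class label) to their usual inputs. A minimal numpy sketch with hypothetical sizes and linear stand-ins for G and D:

```python
import numpy as np

rng = np.random.default_rng(3)

n_classes, z_dim, x_dim = 10, 5, 4
z = rng.normal(size=(8, z_dim))
labels = rng.integers(0, n_classes, size=8)
y = np.eye(n_classes)[labels]            # one-hot class condition

# The generator sees (z, y): it must produce a sample of class y.
W_g = rng.normal(scale=0.1, size=(z_dim + n_classes, x_dim))
x_fake = np.tanh(np.concatenate([z, y], axis=1) @ W_g)    # G(z | y)

# The discriminator also sees the condition: it judges whether x is a
# realistic sample *of that class*, not just realistic in general.
W_d = rng.normal(scale=0.1, size=(x_dim + n_classes, 1))
d_logit = np.concatenate([x_fake, y], axis=1) @ W_d       # D(x | y)
```

The text-to-image case works the same way, with the one-hot label replaced by the text's encoded vector.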

(ii) Natural language processing

To hold a smooth dialogue with a machine, several problems must be solved first: text understanding, question answering, and translation.

(1) Text understanding. Salesforce MetaMind built a new model called the Joint Many-Task (JMT) model, with the goal of a single model that can handle five common NLP tasks simultaneously: part-of-speech tagging, chunking, dependency parsing, semantic relatedness, and textual entailment. The remarkable property of this model is that it is end-to-end trainable: its layers are trained jointly, so results from higher levels (complex tasks) can improve results at lower levels (simple tasks). Previously it was assumed that improvement could only flow from lower levels upward; the JMT idea reverses this, which is what makes it novel. As a result, the model's POS tagging results are state-of-the-art.

(2) Question answering. MetaMind also proposed a new model for question answering, the Dynamic Coattention Network (DCN). The idea behind it is intuitive. Suppose you are given a lengthy text and some questions (as in reading comprehension): would you rather read the whole text first and only then see the questions, or read the questions first? Naturally, knowing the questions beforehand lets us focus on the relevant passages while reading; otherwise, we would have to pay equal attention to every detail and dependency in the entire text in order to handle any possible future question. The DCN does exactly this: it first produces an internal representation of the text conditioned on the question, then iterates over a list of possible answers, finally converging to the answer.
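The "representation of the text conditioned on the question" can be illustrated with a bare-bones attention computation. This is a simplified numpy sketch (random vectors, one-directional attention only; the actual DCN uses a richer coattention in both directions):

```python
import numpy as np

rng = np.random.default_rng(4)

d_model, doc_len, q_len = 8, 12, 3
doc = rng.normal(size=(doc_len, d_model))  # document token vectors
qry = rng.normal(size=(q_len, d_model))    # question token vectors

# Affinity between every document position and every question position.
aff = doc @ qry.T                                # shape (doc_len, q_len)

# Softmax over the document axis: for each question token, a distribution
# over the document -- i.e., "where to look while reading".
attn = np.exp(aff) / np.exp(aff).sum(axis=0)

# Question-conditioned summary of the text: one weighted average of
# document vectors per question token.
doc_summary = attn.T @ doc                       # shape (q_len, d_model)
```

Each row of `doc_summary` is the text "read through the lens of" one question token, which is the conditioning idea the paragraph describes.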

(3) Machine translation. In September, Google announced a new model for its translation system, Google Neural Machine Translation (GNMT). The initial version trained a separate model for each language pair, such as English-Chinese. The latest version, released in November, goes one step further: translation for multiple language pairs is trained with a single unified model. The only difference from the previous model is an additional input specifying the target language. The new GNMT is capable of zero-shot translation, meaning it can translate between language pairs it was never explicitly trained on. The results show that training multiple language pairs jointly works better than training each pair alone, indicating that translation knowledge can transfer from one language to another.
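The "additional input specifying the target language" is an artificial token prepended to the source sentence; the rest of the model is unchanged. A sketch of that preprocessing step (the function name and the exact `<2xx>` token format are illustrative):

```python
def prepare_input(tokens, target_lang):
    # Multilingual-NMT trick: prepend an artificial token naming the
    # desired output language. The model learns to condition on it,
    # which is what makes zero-shot pairs possible.
    return ["<2{}>".format(target_lang)] + tokens

src = ["Hello", ",", "world", "!"]
inp = prepare_input(src, "es")   # ask the model for Spanish output
# inp == ["<2es>", "Hello", ",", "world", "!"]
```

Because the source text itself is untouched, the same trained model can be pointed at any target language it has ever seen paired with any source, including combinations absent from training.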

References

Generative Adversarial Text to Image Synthesis, June 2016

Image-to-Image Translation with Conditional Adversarial Nets, November 2016

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, November 2016

