Deep Learning, NLP, and Representations (translation: Wizards)

Last Update:2018-08-20 Source: Internet

Author: User

Contents: Introduction, Single Hidden Layer Neural Networks, Word Embeddings, Shared Representations, Recursive Neural Networks, Criticism, Conclusion, Acknowledgments

From: https://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
Posted on July 7, 2014
Neural networks, deep learning, representations, NLP, recursive neural networks

Introduction

In the past few years, deep neural networks have dominated pattern recognition. They have surpassed the previous state of the art on many computer vision tasks. Speech recognition is moving in the same direction.

But, despite the results, we have to wonder why they work so well.

This article reviews some of the most significant results of applying deep neural networks to natural language processing (NLP).

In doing so, I hope to offer one promising answer to why deep neural networks work. I think it is a very elegant perspective.

Single Hidden Layer Neural Networks

A neural network with one hidden layer is universal: given enough hidden units, it can approximate any function. This theorem is frequently cited, and even more frequently misunderstood and misapplied.

It is true, essentially, because the hidden layer can be used as a lookup table.

For simplicity's sake, let's consider a perceptron network. A perceptron is a very simple neuron that fires if its input exceeds a threshold and does not fire otherwise. A perceptron network takes binary (0 and 1) inputs and gives binary outputs.

Note that the number of possible inputs is finite. For each possible input, we can construct a neuron in the hidden layer that fires for that input, and only for that input. We can then use the connections between that neuron and the output neurons to control the output in that specific case.
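The construction above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the helper names (`perceptron`, `build_lookup_network`, `run_network`), the particular weights and biases, and the three-bit parity target are not from the original article. Each hidden perceptron is wired to fire on exactly one input pattern, and the output perceptron reads off the stored answer.

```python
from itertools import product

def perceptron(weights, bias, inputs):
    """Fire (return 1) if the weighted sum plus bias is positive, else return 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def build_lookup_network(target, n_inputs):
    """Build one hidden perceptron per possible binary input pattern.

    `target` is any function from a tuple of bits to 0 or 1 (hypothetical helper).
    """
    hidden = []
    output_weights = []
    for pattern in product([0, 1], repeat=n_inputs):
        # Weight +1 where the pattern has a 1, and -1 where it has a 0.
        weights = [1 if bit else -1 for bit in pattern]
        # The weighted sum reaches sum(pattern) only on an exact match,
        # so this bias makes the neuron fire on that single input.
        bias = 0.5 - sum(pattern)
        hidden.append((weights, bias))
        # The connection to the output neuron stores the table entry.
        output_weights.append(target(pattern))
    return hidden, output_weights

def run_network(network, inputs):
    hidden, output_weights = network
    activations = [perceptron(w, b, inputs) for w, b in hidden]
    # Exactly one hidden neuron fires, so the output reads that entry.
    return perceptron(output_weights, -0.5, activations)

def xor3(bits):
    """Example target: parity of three bits."""
    return sum(bits) % 2

network = build_lookup_network(xor3, 3)
```

Note that the hidden layer needs one neuron per input pattern (8 here, and 2^n in general), which is exactly why this kind of universality is closer to memorization than to learning.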

So a neural network with one hidden layer is indeed universal. But there is nothing particularly impressive or exciting about that. Saying that your model can do the same thing as a lookup table is not a very strong argument for it. It only means that it is not impossible for your model to do the task.

Universality means that a network can fit any training data you give it. It does not mean that it will interpolate to new data points in a reasonable way.

No, universality does not explain why neural networks work so well. The real reason seems to be something more subtle. To understand it, we first need to understand some concrete results.

Word Embeddings

I want to start with a particularly interesting strand of deep learning research: word embeddings. In my personal opinion, although they were originally introduced by Bengio and others more than 10 years ago, word embeddings are still one of the most exciting areas of research in deep learning. Beyond that, I think they are one of the best places to gain intuition about why deep learning is so effective.

A word embedding $W: \mathrm{words} \to \mathbb{R}^n$ is a parameterized function that maps words in some language to high-dimensional vectors (perhaps 200 to 500 dimensions). For example, we might find that:

$W(\text{``cat''}) = (0.2,\ -0.4,\ 0.7,\ \ldots)$
$W(\text{``mat''}) = (0.0,\ 0.6,\ -0.1,\ \ldots)$

(Typically, the function is a lookup table, parameterized by a matrix $\theta$, with one row for each word: $W_\theta(w_n) = \theta_n$.)
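As a rough illustration of the lookup-table view, the sketch below stores one row of $\theta$ per word and returns that row for a given word. The toy vocabulary, the 4-dimensional vectors, and the names `make_embedding` and `word_to_index` are assumptions made for this example only. The rows are filled with random placeholder values, so they carry no meaning until some training procedure adjusts them.

```python
import random

def make_embedding(vocab, n_dims, seed=0):
    """Create a word index and a matrix theta with one random row per word."""
    rng = random.Random(seed)
    word_to_index = {word: i for i, word in enumerate(vocab)}
    theta = [[rng.uniform(-1.0, 1.0) for _ in range(n_dims)] for _ in vocab]
    return word_to_index, theta

def W(word, word_to_index, theta):
    """Look up the embedding: W_theta(w_n) = theta_n."""
    return theta[word_to_index[word]]

# Hypothetical toy vocabulary; real embeddings cover far more words
# and use hundreds of dimensions.
vocab = ["cat", "sat", "on", "the", "mat", "song"]
word_to_index, theta = make_embedding(vocab, n_dims=4)
cat_vector = W("cat", word_to_index, theta)
```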

$W$ is initialized with a random vector for each word. It learns meaningful vectors in order to perform some task.

For example, one task we can train a network for is to predict whether a 5-gram (a sequence of five words) is "valid." We can easily get a lot of 5-grams from Wikipedia (for example, "cat sat on the mat") and then "break" half of them by replacing one word with a random word (for example, "cat sat song the mat"), since that will almost certainly make the 5-gram nonsensical.
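One way this data preparation might look is sketched below. The function name `make_examples`, the alternating valid/broken split, and the toy 5-grams are assumptions for illustration, not the procedure used in the original work. A replacement word is always different from the word it replaces, although the result could occasionally still be a sensible phrase.

```python
import random

def make_examples(five_grams, vocab, seed=0):
    """Label every other 5-gram as valid (1) and break the rest (0)."""
    rng = random.Random(seed)
    examples = []
    for i, gram in enumerate(five_grams):
        words = list(gram)
        if i % 2 == 0:
            # Keep this 5-gram intact as a "valid" example.
            examples.append((words, 1))
        else:
            # Replace one word with a different random word.
            position = rng.randrange(len(words))
            candidates = [w for w in vocab if w != words[position]]
            words[position] = rng.choice(candidates)
            examples.append((words, 0))
    return examples

# Hypothetical toy data standing in for 5-grams taken from a large corpus.
vocab = ["cat", "sat", "on", "the", "mat", "song", "dog", "ran"]
five_grams = [
    ("cat", "sat", "on", "the", "mat"),
    ("dog", "sat", "on", "the", "mat"),
]
examples = make_examples(five_grams, vocab)
```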

The model we train runs each word in the 5-gram through $W$ to get a vector representing it, and feeds those vectors into another "module" called $R$, which tries to predict whether the 5-gram is "valid" or "broken." So, we would like:

R(W("cat"), W("sat"), W("
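The forward pass of such a model can be sketched as follows. This is a minimal, untrained stand-in: the article only calls $R$ a "module", so using a single logistic unit here is my assumption, as are the names `embed`, `table`, `weights`, and `bias` and the 3-dimensional vectors. With random parameters the output is meaningless; training would adjust both the embedding table and the parameters of `R`.

```python
import math
import random

def embed(words, table):
    """Run each word through the embedding table and concatenate the vectors."""
    vector = []
    for word in words:
        vector.extend(table[word])
    return vector

def R(vector, weights, bias):
    """Stand-in for the module R: one logistic unit giving a score in (0, 1)."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-score))

rng = random.Random(0)
n_dims = 3  # hypothetical; the article mentions 200 to 500 dimensions
vocab = ["cat", "sat", "on", "the", "mat", "song"]

# Random, untrained parameters for both the embedding and R.
table = {word: [rng.uniform(-1.0, 1.0) for _ in range(n_dims)] for word in vocab}
weights = [rng.uniform(-1.0, 1.0) for _ in range(5 * n_dims)]
bias = 0.0

features = embed(["cat", "sat", "on", "the", "mat"], table)
p_valid = R(features, weights, bias)
```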

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
