Deep learning as a disguised Markov chain


Andrej Karpathy's article "The Unreasonable Effectiveness of Recurrent Neural Networks" made the rounds last year. Its premise is that you can build a recurrent neural network that learns the features of a language one character at a time. But how does the resulting model differ from a Markov chain built for the same purpose? I implemented a character-by-character Markov chain in R to find out.

(Image source: @shakespeare)
First, let's play a variant of the imitation game, generating text from Karpathy's Tiny Shakespeare dataset. Which fragments come from the RNN, and which from the Markov chain? Note that Karpathy's examples come from the complete works, while my Markov chain is trained on the Tiny Shakespeare collection (about one-fourth the size of the former), because I'm lazy.

DUKE Vincentio:
Well, your wit are in the care of side and that.
Friar LAURENCE:
Or Walk liest;
and the ears.
and hell!
In self.
Petruchio:
Persuading to our the enemy, even woman, I'll afford show'd and speaking of
England all off what least. Be satisfied! Now, sir.
Second Lord:
They would is ruled after this Chamber, and
My fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.
If you can't tell, don't be too hard on yourself. The humble Markov chain proves just as effective as the state-of-the-art RNN at learning to spell (olde) English words. How is that possible? Let's consider how each system works. Both take a sequence of characters as input and try to "predict" the next character in the sequence. The RNN does this by adjusting weight vectors to produce an output vector that fits the specified response. A hidden layer maintains state across the training set. Finally, a confidence value is computed for each possible output character, which is used to predict the next character.


(Diagram of a character-level RNN; source: Andrej Karpathy)
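
To make that last step concrete, here is a minimal sketch in R with made-up numbers: a softmax turns the raw output scores into a confidence for each candidate character, and the highest-confidence character is the prediction.

softmax <- function(z) { e <- exp(z - max(z)); e / sum(e) }
scores <- c(a = 2.1, b = -0.5, e = 0.3)  # hypothetical raw output scores
conf <- softmax(scores)                  # confidence per candidate character
names(which.max(conf))                   # "a", the predicted next character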
On the other hand, training the Markov chain simply constructs a probability mass function incrementally across the possible future states. The resulting probability mass function is not so different from the RNN's output confidences. Here is an example of the distribution over the characters that follow the sequence "walk":

> table(chain[['walk']]) / length(chain[['walk']])

  a   b   i   l   m   o   u
0.4 0.1 0.1 0.1 0.1 0.1 0.1
This tells us that 40% of the time the character sequence "walk" is followed by the letter "a". When generating text, we could take this most likely character as the prediction, or we could sample from the probability mass function. I chose the latter because it is more interesting.
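
Here is a minimal sketch of that sampling step, assuming (as the table output above implies) that chain[['walk']] stores every observed follower of "walk" with repetition; a uniform draw from that vector is then exactly a draw from the probability mass function.

# Draw the next character in proportion to how often it followed the key.
next_char <- function(chain, key) sample(chain[[key]], size = 1)
next_char(chain, 'walk')  # returns "a" about 40% of the time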

But how is state captured in the Markov chain, when by definition its next state depends only on the current one? Very simple: we use a sequence of characters, rather than a single character, as the input. In this article I used sequences of length 5, so the Markov chain picks its next state based on the previous five characters. Is this cheating? Or is this what the hidden layers in the RNN are doing?
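
For illustration, here is a minimal sketch of how such a chain can be trained and used to generate text, assuming the same follower-vector structure as above: each run of five characters becomes a state, and every character observed to follow it is recorded.

# Record, for every length-n subsequence of the text, the character
# that follows it. Followers are kept with repetition, so frequencies
# encode the probability mass function directly.
build_chain <- function(text, n = 5) {
  chars <- strsplit(text, '')[[1]]
  chain <- list()
  for (i in seq_len(length(chars) - n)) {
    key <- paste(chars[i:(i + n - 1)], collapse = '')
    chain[[key]] <- c(chain[[key]], chars[i + n])
  }
  chain
}

# Slide the n-character window forward, sampling each next character
# from the observed followers of the current state. The seed must be
# an n-character sequence that appears in the training text.
generate <- function(chain, seed, len = 200) {
  out <- strsplit(seed, '')[[1]]
  n <- length(out)
  for (i in seq_len(len)) {
    key <- paste(tail(out, n), collapse = '')
    if (is.null(chain[[key]])) break  # unseen state: stop early
    out <- c(out, sample(chain[[key]], size = 1))
  }
  paste(out, collapse = '')
}

Note that the five-character key is the entire state: there is no hidden layer, only a lookup table.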

Although the mechanics of an RNN differ considerably from those of a Markov chain, the underlying concepts are remarkably similar. RNNs and deep learning may be the cool kids on the block right now, but don't overlook what is simple. You can learn a lot from simple models, which have generally stood the test of time, are well understood, and are easy to explain.

Note: I did not use a package to train and run the Markov chain, since the whole implementation is only a small number of lines of code. A version of this code will appear in a book I'm about to publish.
