All code: click here. An example TensorFlow implementation of a simple binary sequence can be viewed here, and the basics of RNNs and LSTMs can be viewed here. This blog mainly covers: training an RNN language model that generates text character by character (in the last part); using TensorFlow's scan function to achieve the effect of dynamic_rnn; using MultiRNNCell to build a multi-layer RNN; and implementing dropout and layer normalization.
I. Model description and data processing
1. Model Description
We want to use an RNN to learn a language model and use it to generate character sequences. There is a good Torch implementation on GitHub: https://github.com/karpathy/char-rnn, and a TensorFlow implementation: https://github.com/sherjilozair/char-rnn-tensorflow. Next, let's look at how to implement it.
2. Data Processing
The dataset is a collection of Shakespeare's works (click here to view); you could actually use other text as well. Note that uppercase and lowercase letters are treated as different characters. First, download and read the data:
```python
'''Download the data and read it'''
import os
import urllib.request

file_url = 'https://raw.githubusercontent.com/jcjohnson/torch-rnn/master/data/tiny-shakespeare.txt'
file_name = 'tinyshakespeare.txt'

if not os.path.exists(file_name):
    urllib.request.urlretrieve(file_url, filename=file_name)
with open(file_name, 'r') as f:
    raw_data = f.read()
print('data length:', len(raw_data))
```
Process the character data, converting it to numbers: use a set to deduplicate and get all the unique characters, then map each character to a number (using a dictionary), then traverse the original data, replacing every character with its corresponding number.
```python
'''Process the character data, convert to numbers'''
vocab = set(raw_data)  # use a set to deduplicate, i.e. remove repeated letters (case is significant)
vocab_size = len(vocab)
idx_to_vocab = dict(enumerate(vocab))  # turn the set into a dictionary: each number 0, 1, 2, ..., (vocab_size-1) maps to a character
vocab_to_idx = dict(zip(idx_to_vocab.values(), idx_to_vocab.keys()))  # invert the dictionary from (key, value) to (value, key)
data = [vocab_to_idx[c] for c in raw_data]  # encode raw_data: look up the number for each character
del raw_data
```
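As a quick sanity check, the character-to-index mapping built above can be round-tripped; this standalone sketch uses a short sample string in place of the full dataset:

```python
# Minimal round-trip check of the character <-> index mapping
# (standalone sketch using a short sample string instead of the Shakespeare data).
sample = "to be or not to be"
vocab = set(sample)                    # unique characters (case-sensitive)
idx_to_vocab = dict(enumerate(vocab))  # index -> character
vocab_to_idx = dict(zip(idx_to_vocab.values(), idx_to_vocab.keys()))  # character -> index

encoded = [vocab_to_idx[c] for c in sample]       # text as a list of integers
decoded = ''.join(idx_to_vocab[i] for i in encoded)
assert decoded == sample               # encoding then decoding recovers the text
print(len(vocab), 'unique characters')
```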
Generate batch data. TensorFlow's PTB model serves as a reference: https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb
```python
num_steps = 200          # number of unrolled steps to learn from
batch_size = 32
state_size = 100         # cell state size
num_classes = vocab_size
learning_rate = 1e-4

def gen_epochs(num_epochs, num_steps, batch_size):
    for i in range(num_epochs):
        yield reader.ptb_iterator_oldversion(data, batch_size, num_steps)
```
The ptb_iterator function returns data x, y, each of shape [batch_size, num_steps]:
```python
def ptb_iterator_oldversion(raw_data, batch_size, num_steps):
    """Iterate on the raw PTB data.

    This generates batch_size pointers into the raw PTB data, and allows
    minibatch iteration along these pointers.

    Args:
        raw_data: one of the raw data outputs from ptb_raw_data.
        batch_size: int, the batch size.
        num_steps: int, the number of unrolls.

    Yields:
        Pairs of the batched data, each a matrix of shape
        [batch_size, num_steps]. The second element of the tuple is the
        same data time-shifted to the right by one.

    Raises:
        ValueError: if batch_size or num_steps are too high.
    """
```