Series Preface

Reference documents:
- RNNLM - Recurrent Neural Network Language Modeling Toolkit
- Recurrent neural network based language model
- Extensions of recurrent neural network language model
- Strategies for Training Large Scale Neural Network Language Models
- Statistical Language Models Based on Neural Networks
- A guide to recurrent neural networks and backpropagation
- A Neural Probabilistic Language Model
- Learning Long-Term Dependencies with Gradient Descent is Difficult
- Can Artificial Neural Networks Learn Language Models?
The previous post ended with learnVocabFromTrainFile(), so we continue from there. The next group of functions either saves part of the network state or restores it from the backup copies. The network diagram from the earlier post is repeated here for comparison.
saveWeights() backs up the current weights and neuron values; the data structure of the network is shown in the figure above.

```cpp
// Back up the current weights and neuron values.
void CRnnLM::saveWeights()
{
    int a, b;

    // back up the input layer neuron values
    for (a=0; a<layer0_size; a++) {
        neu0b[a].ac=neu0[a].ac;
        neu0b[a].er=neu0[a].er;
    }

    // back up the hidden layer neuron values
    for (a=0; a<layer1_size; a++) {
        neu1b[a].ac=neu1[a].ac;
        neu1b[a].er=neu1[a].er;
    }

    // back up the compression layer neuron values
    for (a=0; a<layerc_size; a++) {
        neucb[a].ac=neuc[a].ac;
        neucb[a].er=neuc[a].er;
    }

    // back up the output layer neuron values
    for (a=0; a<layer2_size; a++) {
        neu2b[a].ac=neu2[a].ac;
        neu2b[a].er=neu2[a].er;
    }

    // back up the input-to-hidden weights.
    // All the weights entering the hidden layer can be viewed as a
    // layer1_size * layer0_size matrix stored in a one-dimensional array;
    // element [b][a] of that matrix maps to index a + b*layer0_size.
    // The other layer-to-layer weights are stored the same way.
    for (b=0; b<layer1_size; b++) for (a=0; a<layer0_size; a++) {
        syn0b[a+b*layer0_size].weight=syn0[a+b*layer0_size].weight;
    }

    if (layerc_size>0) {
        // there is a compression layer: back up the hidden-to-compression weights
        for (b=0; b<layerc_size; b++) for (a=0; a<layer1_size; a++) {
            syn1b[a+b*layer1_size].weight=syn1[a+b*layer1_size].weight;
        }
        // and the compression-to-output weights
        for (b=0; b<layer2_size; b++) for (a=0; a<layerc_size; a++) {
            syncb[a+b*layerc_size].weight=sync[a+b*layerc_size].weight;
        }
    } else {
        // no compression layer: back up the hidden-to-output weights directly
        for (b=0; b<layer2_size; b++) for (a=0; a<layer1_size; a++) {
            syn1b[a+b*layer1_size].weight=syn1[a+b*layer1_size].weight;
        }
    }

    // the direct input-to-output parameters are not backed up
    // (this line is commented out in the source)
    //for (a=0; a<direct_size; a++) syn_db[a].weight=syn_d[a].weight;
}
```

The function above backs up the current weights and neuron values; restoreWeights() below restores the network from that backup. The meaning mirrors saveWeights(), so no detailed comments are needed.

```cpp
void CRnnLM::restoreWeights()
{
    int a, b;

    for (a=0; a<layer0_size; a++) {
        neu0[a].ac=neu0b[a].ac;
        neu0[a].er=neu0b[a].er;
    }

    for (a=0; a<layer1_size; a++) {
        neu1[a].ac=neu1b[a].ac;
        neu1[a].er=neu1b[a].er;
    }

    for (a=0; a<layerc_size; a++) {
        neuc[a].ac=neucb[a].ac;
        neuc[a].er=neucb[a].er;
    }

    for (a=0; a<layer2_size; a++) {
        neu2[a].ac=neu2b[a].ac;
        neu2[a].er=neu2b[a].er;
    }

    for (b=0; b<layer1_size; b++) for (a=0; a<layer0_size; a++) {
        syn0[a+b*layer0_size].weight=syn0b[a+b*layer0_size].weight;
    }

    if (layerc_size>0) {
        for (b=0; b<layerc_size; b++) for (a=0; a<layer1_size; a++) {
            syn1[a+b*layer1_size].weight=syn1b[a+b*layer1_size].weight;
        }
        for (b=0; b<layer2_size; b++) for (a=0; a<layerc_size; a++) {
            sync[a+b*layerc_size].weight=syncb[a+b*layerc_size].weight;
        }
    } else {
        for (b=0; b<layer2_size; b++) for (a=0; a<layer1_size; a++) {
            syn1[a+b*layer1_size].weight=syn1b[a+b*layer1_size].weight;
        }
    }

    //for (a=0; a<direct_size; a++) syn_d[a].weight=syn_db[a].weight;
}
```

The remaining four functions only save and restore the ac values of the hidden layer neurons:

```cpp
// save the ac values of the hidden layer neurons
void CRnnLM::saveContext()      // useful for n-best list processing
{
    int a;
    for (a=0; a<layer1_size; a++) neu1b[a].ac=neu1[a].ac;
}

// restore the ac values of the hidden layer neurons
void CRnnLM::restoreContext()
{
    int a;
    for (a=0; a<layer1_size; a++) neu1[a].ac=neu1b[a].ac;
}

// same as above, but using the second backup buffer neu1b2
void CRnnLM::saveContext2()
{
    int a;
    for (a=0; a<layer1_size; a++) neu1b2[a].ac=neu1[a].ac;
}

void CRnnLM::restoreContext2()
{
    int a;
    for (a=0; a<layer1_size; a++) neu1[a].ac=neu1b2[a].ac;
}
```
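For reference, the two data types being copied above are very simple. Here is a minimal sketch of the struct definitions and of the flattened matrix indexing, assuming the usual rnnlmlib.h layout (there, `real` is a floating-point typedef); treat it as an illustration rather than an exact copy of the header:

```cpp
// Sketch of the types used above (assumed layout, based on rnnlmlib.h).
typedef double real;

struct neuron {
    real ac;   // activation value of the neuron
    real er;   // error value, used during learning
};

struct synapse {
    real weight;   // weight of a single connection
};

// The weight "matrix" between two layers is a flat array:
// the connection from source neuron a to destination neuron b lives at
// index a + b*src_size, i.e. row b (destination) times the row length
// src_size (size of the source layer), plus column a.
int flatIndex(int a, int b, int src_size) {
    return a + b * src_size;
}
```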
As for why the compression layer exists, see the paper Extensions of recurrent neural network language model: it is introduced to reduce the number of parameters between the hidden layer and the output layer and thereby the total computational cost. Why adding an extra layer reduces the amount of computation was not obvious to me; if you understand it well, please let me know.
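One way to see the saving, with purely illustrative sizes (not taken from the papers): suppose the hidden layer has 500 neurons and the output layer (vocabulary plus classes) has 10100 neurons. A direct hidden-to-output connection needs 500 × 10100 ≈ 5.05 million weights, and every forward and backward pass touches all of them. Inserting a compression layer of 100 neurons replaces that block with 500 × 100 + 100 × 10100 = 1.06 million weights, roughly a five-fold reduction, at the cost of one extra (small) matrix multiplication.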
The next function, initNet(), initializes the network. It is fairly long and parts of it are easier to follow with a diagram, so I split it into two parts. The first part, below, mainly allocates memory and initializes it; in effect it builds the network structure, and comparing it against the diagram makes it clearer.
```cpp
void CRnnLM::initNet()
{
    int a, b, cl;

    // layer1_size is set on the command line; class_size is 100 by default
    layer0_size=vocab_size+layer1_size;
    layer2_size=vocab_size+class_size;

    // calloc allocates zero-initialized memory.
    // Allocate the input, hidden, compression and output layers.
    neu0=(struct neuron *)calloc(layer0_size, sizeof(struct neuron));
    neu1=(struct neuron *)calloc(layer1_size, sizeof(struct neuron));
    neuc=(struct neuron *)calloc(layerc_size, sizeof(struct neuron));
    neu2=(struct neuron *)calloc(layer2_size, sizeof(struct neuron));

    // weights between the input layer and the hidden layer
    syn0=(struct synapse *)calloc(layer0_size*layer1_size, sizeof(struct synapse));

    if (layerc_size==0) {
        // no compression layer: weights between hidden and output layer
        syn1=(struct synapse *)calloc(layer1_size*layer2_size, sizeof(struct synapse));
    } else {
        // with a compression layer: weights between hidden and compression layer,
        // and between compression and output layer
        syn1=(struct synapse *)calloc(layer1_size*layerc_size, sizeof(struct synapse));
        sync=(struct synapse *)calloc(layerc_size*layer2_size, sizeof(struct synapse));
    }

    if (syn1==NULL) {
        printf("Memory allocation failed\n");
        exit(1);
    }

    if (layerc_size>0) if (sync==NULL) {
        printf("Memory allocation failed\n");
        exit(1);
    }

    // direct input-to-output connections; direct_size is a long long set by
    // the -direct option in units of millions: passing "-direct 2" means
    // direct_size = 2*10^6
    syn_d=(direct_t *)calloc((long long)direct_size, sizeof(direct_t));

    if (syn_d==NULL) {
        printf("Memory allocation for direct connections failed (requested %lld bytes)\n",
               (long long)direct_size * (long long)sizeof(direct_t));
        exit(1);
    }

    // backup space for the neurons
    neu0b=(struct neuron *)calloc(layer0_size, sizeof(struct neuron));
    neu1b=(struct neuron *)calloc(layer1_size, sizeof(struct neuron));
    neucb=(struct neuron *)calloc(layerc_size, sizeof(struct neuron));
    neu1b2=(struct neuron *)calloc(layer1_size, sizeof(struct neuron));
    neu2b=(struct neuron *)calloc(layer2_size, sizeof(struct neuron));

    // backup space for the synapses (i.e. the weight parameters)
    syn0b=(struct synapse *)calloc(layer0_size*layer1_size, sizeof(struct synapse));
    //syn1b=(struct synapse *)calloc(layer1_size*layer2_size, sizeof(struct synapse));
    if (layerc_size==0) {
        syn1b=(struct synapse *)calloc(layer1_size*layer2_size, sizeof(struct synapse));
    } else {
        syn1b=(struct synapse *)calloc(layer1_size*layerc_size, sizeof(struct synapse));
        syncb=(struct synapse *)calloc(layerc_size*layer2_size, sizeof(struct synapse));
    }

    if (syn1b==NULL) {
        printf("Memory allocation failed\n");
        exit(1);
    }

    // initialize all neurons to 0
    for (a=0; a<layer0_size; a++) {
        neu0[a].ac=0;
        neu0[a].er=0;
    }
    for (a=0; a<layer1_size; a++) {
        neu1[a].ac=0;
        neu1[a].er=0;
    }
    for (a=0; a<layerc_size; a++) {
        neuc[a].ac=0;
        neuc[a].er=0;
    }
    for (a=0; a<layer2_size; a++) {
        neu2[a].ac=0;
        neu2[a].er=0;
    }

    // initialize all weight parameters to random values in the range [-0.3, 0.3]
    // (each weight is the sum of three draws from [-0.1, 0.1])
    for (b=0; b<layer1_size; b++) for (a=0; a<layer0_size; a++) {
        syn0[a+b*layer0_size].weight=random(-0.1, 0.1)+random(-0.1, 0.1)+random(-0.1, 0.1);
    }

    if (layerc_size>0) {
        for (b=0; b<layerc_size; b++) for (a=0; a<layer1_size; a++) {
            syn1[a+b*layer1_size].weight=random(-0.1, 0.1)+random(-0.1, 0.1)+random(-0.1, 0.1);
        }
        for (b=0; b<layer2_size; b++) for (a=0; a<layerc_size; a++) {
            sync[a+b*layerc_size].weight=random(-0.1, 0.1)+random(-0.1, 0.1)+random(-0.1, 0.1);
        }
    } else {
        for (b=0; b<layer2_size; b++) for (a=0; a<layer1_size; a++) {
            syn1[a+b*layer1_size].weight=random(-0.1, 0.1)+random(-0.1, 0.1)+random(-0.1, 0.1);
        }
    }

    // the direct input-to-output parameters are initialized to 0
    long long aa;
    for (aa=0; aa<direct_size; aa++) syn_d[aa]=0;

    if (bptt>0) {
        bptt_history=(int *)calloc((bptt+bptt_block+10), sizeof(int));
        for (a=0; a<bptt+bptt_block; a++) bptt_history[a]=-1;

        bptt_hidden=(neuron *)calloc((bptt+bptt_block+1)*layer1_size, sizeof(neuron));
        for (a=0; a<(bptt+bptt_block)*layer1_size; a++) {
            bptt_hidden[a].ac=0;
            bptt_hidden[a].er=0;
        }

        bptt_syn0=(struct synapse *)calloc(layer0_size*layer1_size, sizeof(struct synapse));
        if (bptt_syn0==NULL) {
            printf("Memory allocation failed\n");
            exit(1);
        }
    }

    // note: saveWeights() does not back up the input-to-output
    // direct parameters, i.e. syn_d
    saveWeights();

    // ... initNet() continues in the second part below
```
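The random() helper used above is not shown in this post; in the toolkit it is essentially a thin wrapper around rand(), roughly as sketched below (an assumed implementation, not copied verbatim from the source). Summing three independent uniform draws from [-0.1, 0.1] gives values in [-0.3, 0.3] whose distribution is peaked around 0, a cheap approximation of a small Gaussian initialization.

```cpp
#include <cstdlib>

typedef double real;

// Sketch of the uniform random helper, assuming it simply rescales rand().
real randomUniform(real min, real max) {
    return rand() / (real)RAND_MAX * (max - min) + min;
}

// Example: one weight initialized the way initNet() does it.
// The sum of three independent uniform draws lies in [-0.3, 0.3]
// and is concentrated near 0 (bell-shaped, piecewise-quadratic density).
real initWeight() {
    return randomUniform(-0.1, 0.1) + randomUniform(-0.1, 0.1) + randomUniform(-0.1, 0.1);
}
```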
The second part of initNet() assigns each word to a class. Note that vocab is already sorted by count from largest to smallest, and the classes are assigned according to the words' unigram frequencies: the earlier classes contain only a few very frequent words, while the later classes contain more and more words that appear only sparsely in the corpus. The following code builds this structure:
```cpp
    // initNet(), continued: assign words to classes
    double df, dd;
    int i;

    df=0;
    dd=0;
    a=0;
    b=0;

    // vocab is sorted by count, from largest to smallest.
    // Words are assigned to classes by their unigram frequency:
    // the earlier classes hold few, very frequent words, while the
    // later classes hold many words that are sparse in the corpus.
    if (old_classes) {   // old classes: bin by cumulative relative frequency
        for (i=0; i<vocab_size; i++) b+=vocab[i].cn;
        for (i=0; i<vocab_size; i++) {
            df+=vocab[i].cn/(double)b;
            if (df>1) df=1;
            if (df>(a+1)/(double)class_size) {
                vocab[i].class_index=a;
                if (a<class_size-1) a++;
            } else {
                vocab[i].class_index=a;
            }
        }
    } else {             // new classes: bin by cumulative sqrt-frequency
        for (i=0; i<vocab_size; i++) b+=vocab[i].cn;
        for (i=0; i<vocab_size; i++) dd+=sqrt(vocab[i].cn/(double)b);
        for (i=0; i<vocab_size; i++) {
            df+=sqrt(vocab[i].cn/(double)b)/dd;
            if (df>1) df=1;
            if (df>(a+1)/(double)class_size) {
                vocab[i].class_index=a;
                if (a<class_size-1) a++;
            } else {
                vocab[i].class_index=a;
            }
        }
    }

    // allocate auxiliary class variables (for faster search when normalizing
    // probability at the output layer): the goal is that, given a class, we
    // can quickly iterate over all the words it contains. The resulting
    // structure is shown in the figure.
    class_words=(int **)calloc(class_size, sizeof(int *));
    class_cn=(int *)calloc(class_size, sizeof(int));
    class_max_cn=(int *)calloc(class_size, sizeof(int));

    for (i=0; i<class_size; i++) {
        class_cn[i]=0;
        class_max_cn[i]=10;
        class_words[i]=(int *)calloc(class_max_cn[i], sizeof(int));
    }

    for (i=0; i<vocab_size; i++) {
        cl=vocab[i].class_index;
        class_words[cl][class_cn[cl]]=i;
        class_cn[cl]++;
        if (class_cn[cl]+2>=class_max_cn[cl]) {
            class_max_cn[cl]+=10;
            class_words[cl]=(int *)realloc(class_words[cl], class_max_cn[cl]*sizeof(int));
        }
    }
}
```
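To make the binning concrete, here is a small standalone sketch (hypothetical toy counts, not part of the toolkit) that applies the same cumulative sqrt-frequency rule to a five-word vocabulary and prints the class of each word; the most frequent word ends up alone in class 0 while the rare words share the last class.

```cpp
#include <cmath>
#include <cstdio>

// Toy illustration of the "new classes" rule above: words sorted by count,
// assigned to class_size bins by cumulative sqrt-frequency.
int main() {
    const int vocab_size = 5, class_size = 3;
    int cn[vocab_size] = {50, 20, 15, 10, 5};   // hypothetical counts, descending
    int class_index[vocab_size];

    double b = 0, dd = 0, df = 0;
    int a = 0;
    for (int i = 0; i < vocab_size; i++) b += cn[i];
    for (int i = 0; i < vocab_size; i++) dd += sqrt(cn[i] / b);
    for (int i = 0; i < vocab_size; i++) {
        df += sqrt(cn[i] / b) / dd;
        if (df > 1) df = 1;
        class_index[i] = a;                      // assign the current class
        if (df > (a + 1) / (double)class_size && a < class_size - 1) a++;
    }

    for (int i = 0; i < vocab_size; i++)
        printf("word %d (count %d) -> class %d\n", i, cn[i], class_index[i]);
    return 0;
}
```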
The initialization above already involves the maximum-entropy part of the model, which can loosely be understood as direct connections from the input layer to the output layer. Although the author keeps stressing in the papers that this is all it is, I think it is not quite such a simple direct connection: a history array is also involved, which will be discussed later. The next several functions are easy to follow, so to save space the commented code is placed in the next article:
Recurrent Neural Network Language Modeling Toolkit Source Analysis (IV)