Recurrent Neural Network Language Modeling Toolkit Source Code (Part 8)


Series preface. Reference documents:
  1. RNNLM - Recurrent Neural Network Language Modeling Toolkit
  2. Recurrent Neural Network Based Language Model
  3. Extensions of Recurrent Neural Network Language Model
  4. Strategies for Training Large Scale Neural Network Language Models
  5. Statistical Language Models Based on Neural Networks
  6. A Guide to Recurrent Neural Networks and Backpropagation
  7. A Neural Probabilistic Language Model
  8. Learning Long-Term Dependencies with Gradient Descent Is Difficult
  9. Can Artificial Neural Networks Learn Language Models?

Since I have not looked at testNbest() and testGen(), two main functions remain: one is the training function and the other is the test function, and both of them call the functions described in the earlier parts. During training, every time a full pass over the training file is completed, the freshly trained model is immediately evaluated on the validation file to see how well it does. If the result is still improving clearly, training continues on the training file with the same learning rate; once the improvement becomes small, the learning rate is halved and training continues, and when even that no longer brings a real improvement, training stops. The "result" here is the perplexity of the model on the validation file. The test function simply runs the trained model over the test file, accumulates the logarithmic probability of every word and converts the total into PPL. There is also the concept of a dynamic model: the network parameters keep being updated while testing, so the test file itself can also update the model parameters.
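To make the learning-rate schedule concrete, here is a minimal, self-contained sketch. This is not toolkit code: the per-pass validation scores are invented just to exercise the logic, and only the names alpha, logp, llogp, min_improvement and alpha_divide are borrowed from trainNet() below.

// Sketch of the validation-driven schedule: keep alpha while the validation
// log-probability improves clearly, then halve alpha each pass, and stop when
// even halving does not help any more.
#include <cstdio>

int main() {
    // pretend these are the validation log10-probabilities after each pass (made up)
    double fake_valid_logp[] = {-120000, -110000, -108000, -107800, -107750, -107740, -107738};
    const int passes = sizeof(fake_valid_logp)/sizeof(fake_valid_logp[0]);

    double alpha = 0.1;                   // learning rate
    double llogp = -1e30;                 // "last logp": validation score of the previous pass
    const double min_improvement = 1.003;
    int alpha_divide = 0;

    for (int iter = 0; iter < passes; iter++) {
        double logp = fake_valid_logp[iter];

        // logp is negative, so logp*min_improvement < llogp means the relative
        // improvement over the previous pass was smaller than about 0.3%
        if (logp*min_improvement < llogp) {
            if (alpha_divide == 0) alpha_divide = 1;                    // first small improvement: start halving alpha
            else { printf("stop at iter %d\n", iter); break; }          // still no improvement: stop training
        }
        if (alpha_divide) alpha /= 2;                                   // once halving starts, halve every pass
        llogp = logp;
        printf("iter %d  alpha %.5f  valid logp %.1f\n", iter, alpha, logp);
    }
    return 0;
}

The same two-stage behaviour (keep alpha, then halve it until even halving stops helping) is what the validation phase at the end of trainNet() below implements.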


A very important part of the calculation is the perplexity (PPL). The formula below is the PPL formula, reproduced here so it can be compared against the code:

    PPL = C ^ ( -(1/K) * sum_{i=1..K} log_C P(w_i | w_1 ... w_{i-1}) )

This is the formula for the perplexity of a word sequence w1 w2 ... wK. The constant C is taken to be 10 in the program, which can be seen later in the code: log10() is used for the accumulation and exp10() for the final conversion.
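As a quick sanity check of the formula, the following standalone snippet (not part of the toolkit; the word probabilities are made up) accumulates base-10 logarithms exactly the way the code accumulates logp, and then converts the sum into PPL with C = 10, which is what exp10(-logp/wordcn) does in the code:

// Standalone check of the PPL formula with C = 10 (made-up word probabilities).
#include <cmath>
#include <cstdio>

int main() {
    // P(w_i | history) for a toy 4-word sequence
    double p[] = {0.1, 0.02, 0.25, 0.05};
    int k = sizeof(p)/sizeof(p[0]);

    double logp = 0;                      // accumulated log10 probability, like logp in the code
    for (int i = 0; i < k; i++) logp += log10(p[i]);

    double ppl = pow(10.0, -logp/k);      // same quantity as exp10(-logp/wordcn)
    printf("logp = %f   PPL = %f\n", logp, ppl);
    // PPL is the geometric mean of 1/P(w_i); here it comes out to about 14.1
    return 0;
}

With the formula in mind, here is the code itself, with my comments added: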
// Train the network
void CRnnLM::trainNet()
{
    int a, b, word, last_word, wordcn;
    char log_name[200];
    FILE *fi, *flog;
    clock_t start, now;                         // clock_t comes from <time.h> (typically a typedef for long)

    // the log_name string is "rnnlm_file.output.txt"
    sprintf(log_name, "%s.output.txt", rnnlm_file);

    printf("Starting training using file %s\n", train_file);
    starting_alpha=alpha;

    // try to open rnnlm_file
    fi=fopen(rnnlm_file, "rb");
    if (fi!=NULL) {                             // opened successfully: a previously trained model file exists
        fclose(fi);
        printf("Restoring network from file to continue training...\n");
        restoreNet();                           // restore the model information stored in rnnlm_file
    } else {                                    // rnnlm_file could not be opened:
        learnVocabFromTrainFile();              // read train_file; the vocabulary is loaded into vocab and vocab_hash
        initNet();                              // allocate memory, initialize the network
        iter=0;                                 // iter counts the number of passes over the whole training file
    }

    if (class_size>vocab_size) {
        printf("WARNING: number of classes exceeds vocabulary size!\n");
    }

    // counter: the word currently being trained is the counter-th word of train_file
    counter=train_cur_pos;
    saveNet();

    // outermost loop: one iteration of this loop is one full pass over the training file, indexed by iter
    while (iter < maxIter) {
        printf("Iter: %3d\tAlpha: %f\t   ", iter, alpha);
        fflush(stdout);                         // flush the standard output buffer so pending output appears immediately

        // initialize bptt_history and history
        if (bptt>0) for (a=0; a<bptt+bptt_block; a++) bptt_history[a]=0;
        for (a=0; a<MAX_NGRAM_ORDER; a++) history[a]=0;

        // TRAINING PHASE
        netFlush();                             // clear the ac and er values of the neurons

        fi=fopen(train_file, "rb");             // open the training file
        last_word=0;                            // index 0 in vocab is the end-of-sentence token </s>

        if (counter>0) for (a=0; a<counter; a++) word=readWordIndex(fi);   // this will skip words that were already learned if the training was interrupted

        start=clock();                          // record the time at which this pass over the corpus starts

        while (1) {
            counter++;

            // print progress information after every 10,000 trained words
            if ((counter%10000)==0) if ((debug_mode>1)) {
                now=clock();
                // train_words is the number of words in the training file;
                // the first %c is given 13, the carriage return (ASCII 13, different from the newline 10),
                //   so the progress line keeps overwriting itself;
                // TRAIN entropy is -logp converted to base 2 and divided by the word count, i.e. the average
                //   number of bits per word so far;
                // Progress is the position of the current word within the training file, i.e. the training progress;
                // Words/sec is the number of words trained per second
                if (train_words>0)
                    printf("%cIter: %3d\tAlpha: %f\t   TRAIN entropy: %.4f    Progress: %.2f%%   Words/sec: %.1f ", 13, iter, alpha, -logp/log10(2)/counter, counter/(real)train_words*100, counter/((double)(now-start)/1000000.0));
                else
                    printf("%cIter: %3d\tAlpha: %f\t   TRAIN entropy: %.4f    Progress: %dK", 13, iter, alpha, -logp/log10(2)/counter, counter/1000);
                fflush(stdout);
            }

            // every anti_k trained words, save the network information to rnnlm_file
            if ((anti_k>0) && ((counter%anti_k)==0)) {
                train_cur_pos=counter;
                saveNet();                      // save all information about the network to rnnlm_file
            }

            word=readWordIndex(fi);             // read the next word; the function returns its index in vocab
            // note: for the first word of the training file (counter==1), last_word is the end-of-sentence token
            computeNet(last_word, word);        // compute the probability distribution
            if (feof(fi)) break;                // end of file: test on validation data, iterate till convergence

            // logp is the accumulated logarithmic probability, i.e. logp = log10 P(w1) + log10 P(w2) + ...
            if (word!=-1) logp+=log10(neu2[vocab[word].class_index+vocab_size].ac * neu2[word].ac);

            // logp!=logp is true only when logp is NaN; isinf() (added in C99) returns nonzero when logp is
            // infinite; either way a numerical error has occurred
            if ((logp!=logp) || (isinf(logp))) {
                printf("\nNumerical error %d %f %f\n", word, neu2[word].ac, neu2[vocab[word].class_index+vocab_size].ac);
                exit(1);
            }

            if (bptt>0) {                       // shift memory needed for BPTT to next time step
                // after this shift, bptt_history stores w(t), w(t-1), w(t-2), ... starting from index 0
                for (a=bptt+bptt_block-1; a>0; a--) bptt_history[a]=bptt_history[a-1];
                bptt_history[0]=last_word;

                // after this shift, bptt_hidden stores s(t), s(t-1), s(t-2), ... starting from index 0
                for (a=bptt+bptt_block-1; a>0; a--) for (b=0; b<layer1_size; b++) {
                    bptt_hidden[a*layer1_size+b].ac=bptt_hidden[(a-1)*layer1_size+b].ac;
                    bptt_hidden[a*layer1_size+b].er=bptt_hidden[(a-1)*layer1_size+b].er;
                }
            }

            learnNet(last_word, word);          // backward pass, adjust the parameters

            // copy the ac values of the hidden neurons into the last layer1_size part of the input layer, i.e. s(t-1)
            copyHiddenLayerToInput();

            // prepare to encode the next word in the input layer
            if (last_word!=-1) neu0[last_word].ac=0;   // delete previous activation

            last_word=word;

            // after this shift, history stores w(t), w(t-1), w(t-2), ... starting from index 0
            for (a=MAX_NGRAM_ORDER-1; a>0; a--) history[a]=history[a-1];
            history[0]=last_word;

            // word==0 marks the end of the current sentence; if independent is nonzero, every sentence is
            // trained independently; if independent==0, the previous sentence is kept as history for the
            // next one, so this switch controls how much sentences depend on each other
            if (independent && (word==0)) netReset();
        }
        fclose(fi);                             // close train_file

        now=clock();
        // print the summary for the completed pass over the training file (same fields as explained above)
        printf("%cIter: %3d\tAlpha: %f\t   TRAIN entropy: %.4f    Words/sec: %.1f   ", 13, iter, alpha, -logp/log10(2)/counter, counter/((double)(now-start)/1000000.0));

        // the training file is processed only once, then the network is saved
        if (one_iter==1) {                      // no validation data is needed and the network is always saved with modified weights
            printf("\n");
            logp=0;
            saveNet();                          // save all information about the network to rnnlm_file
            break;
        }

        // VALIDATION PHASE
        // training is finished above; below the model is validated, using early stopping.
        // unlike the TRAIN PHASE, the code below only computes the probability distribution and accumulates
        // the probability of the whole validation file; there is no learnNet() call (that would be a dynamic model)
        netFlush();                             // clear the ac and er values of the neurons

        fi=fopen(valid_file, "rb");             // open the validation file
        if (fi==NULL) {
            printf("Valid file not found\n");
            exit(1);
        }

        // mode "ab": b means binary, a means append (the file is created if it does not exist, and written
        // data is added at the end); the log_name string is "rnnlm_file.output.txt"
        flog=fopen(log_name, "ab");
        if (flog==NULL) {
            printf("Cannot open log file\n");
            exit(1);
        }

        fprintf(flog, "Index   P(NET)          Word\n");
        fprintf(flog, "----------------------------------\n");

        last_word=0;
        logp=0;
        wordcn=0;                               // wordcn has the same meaning as counter, except that it does not count OOV words
        while (1) {
            word=readWordIndex(fi);             // read the next word; the function returns its index in vocab
            computeNet(last_word, word);        // compute the probability distribution of the next word
            if (feof(fi)) break;                // end of file: report logp, PPL

            if (word!=-1) {
                // logp is the accumulated logarithmic probability, i.e. logp = log10 P(w1) + log10 P(w2) + ...
                logp+=log10(neu2[vocab[word].class_index+vocab_size].ac * neu2[word].ac);
                wordcn++;
            }

            /*if (word!=-1)
                fprintf(flog, "%d\t%f\t%s\n", word, neu2[word].ac, vocab[word].word);
            else
                fprintf(flog, "-1\t0\t\tOOV\n");*/

            //learnNet(last_word, word);        // this would be implemented for dynamic models

            // copy the ac values of the hidden neurons into the last layer1_size part of the input layer, i.e. s(t-1)
            copyHiddenLayerToInput();

            // prepare to encode the next word in the input layer
            if (last_word!=-1) neu0[last_word].ac=0;   // delete previous activation

            last_word=word;

            // after this shift, history stores w(t), w(t-1), w(t-2), ... starting from index 0
            for (a=MAX_NGRAM_ORDER-1; a>0; a--) history[a]=history[a-1];
            history[0]=last_word;

            // same sentence-independence logic as in the training loop
            if (independent && (word==0)) netReset();
        }
        fclose(fi);

        fprintf(flog, "\niter: %d\n", iter);    // the iter-th pass over train_file
        fprintf(flog, "valid log probability: %f\n", logp);
        // exp10() is not standard C; it is a GNU extension, and exp10(x) simply means 10^x.
        // this matches the PPL formula above with the constant C taken as 10
        fprintf(flog, "PPL net: %f\n", exp10(-logp/(real)wordcn));

        fclose(flog);
        // VALID entropy: -logp converted to base 2 and divided by the number of validation words,
        // i.e. the average number of bits per word on the validation data
        printf("VALID entropy: %.4f\n", -logp/log10(2)/wordcn);

        counter=0;
        train_cur_pos=0;

        // llogp is the logp of the previous iteration ("last logp"): if this iteration is worse than the
        // previous one, revert to the previously saved weights, otherwise save the current network
        if (logp<llogp)
            restoreWeights();
        else
            saveWeights();

        // the larger logp is, the better the training result.
        // initially min_improvement=1.003 and alpha_divide=0.
        // the branch below is entered when this iteration improved logp by less than a factor of
        // min_improvement; while the improvement is still significant the branch is skipped and alpha
        // stays the same (page 30 of the original paper describes this in more detail)
        if (logp*min_improvement<llogp) {       // no significant improvement: turn on the alpha_divide switch
            if (alpha_divide==0) alpha_divide=1;
            else {                              // still no significant improvement with alpha_divide already on:
                saveNet();                      // the model is trained well enough, save it and stop
                break;
            }
        }

        // once alpha_divide is on, the learning rate is halved in every iteration
        if (alpha_divide) alpha/=2;

        llogp=logp;
        logp=0;
        iter++;
        saveNet();
    }
}

// Test the network
void CRnnLM::testNet()
{
    int a, b, word, last_word, wordcn;
    FILE *fi, *flog, *lmprob=NULL;
    real prob_other, log_other, log_combine;
    double d;

    restoreNet();                               // restore the model information stored in rnnlm_file

    // when the use_lmprob switch equals 1, another, already trained language model is used as well
    if (use_lmprob) {
        lmprob=fopen(lmprob_file, "rb");        // open the other language model's probability file
    }

    // TEST PHASE
    //netFlush();

    fi=fopen(test_file, "rb");                  // open the test file

    //sprintf(str, "%s.%s.output.txt", rnnlm_file, test_file);
    //flog=fopen(str, "wb");
    // stdout is itself a FILE pointer already defined in the C headers, so it can be assigned to another
    // file pointer; writing to flog then goes directly to standard output (printf is just fprintf with
    // stdout as its first argument)
    flog=stdout;

    if (debug_mode>1) {
        if (use_lmprob) {
            fprintf(flog, "Index   P(NET)          P(LM)           Word\n");
            fprintf(flog, "--------------------------------------------------\n");
        } else {
            fprintf(flog, "Index   P(NET)          Word\n");
            fprintf(flog, "----------------------------------\n");
        }
    }

    last_word=0;                                // index 0 in vocab is the end-of-sentence token </s>, so last_word starts as end of sentence
    logp=0;                                     // accumulated log probability of the test file under the RNN
    log_other=0;                                // accumulated log probability of the test file under the other language model
    log_combine=0;                              // accumulated log probability of the RNN interpolated with the other model
    prob_other=0;                               // probability of the current word under the other language model
    wordcn=0;                                   // same meaning as counter in trainNet(), except that OOV words are not counted

    // copy the ac values of the hidden neurons into the last layer1_size part of the input layer, i.e. s(t-1)
    copyHiddenLayerToInput();

    // clear the history information
    if (bptt>0) for (a=0; a<bptt+bptt_block; a++) bptt_history[a]=0;
    for (a=0; a<MAX_NGRAM_ORDER; a++) history[a]=0;
    if (independent) netReset();

    while (1) {
        word=readWordIndex(fi);                 // read the next word; the function returns its index in vocab
        computeNet(last_word, word);            // compute the probability distribution of the next word
        if (feof(fi)) break;                    // end of file: report logp, PPL

        if (use_lmprob) {
            fscanf(lmprob, "%lf", &d);
            prob_other=d;
            goToDelimiter('\n', lmprob);
        }

        // log_combine interpolates the two models with the factor lambda
        if ((word!=-1) || (prob_other>0)) {
            if (word==-1) {
                // an arbitrary penalty for OOV words
                logp+=-8;                       // some ad hoc penalty - when mixing different vocabularies, single model score is not real PPL
                log_combine+=log10(0 * lambda + prob_other*(1-lambda));   // interpolation
            } else {
                // accumulate the RNN log probability
                logp+=log10(neu2[vocab[word].class_index+vocab_size].ac * neu2[word].ac);
                // interpolation
                log_combine+=log10(neu2[vocab[word].class_index+vocab_size].ac * neu2[word].ac*lambda + prob_other*(1-lambda));
            }
            log_other+=log10(prob_other);
            wordcn++;
        }

        if (debug_mode>1) {
            if (use_lmprob) {
                if (word!=-1) fprintf(flog, "%d\t%.10f\t%.10f\t%s", word, neu2[vocab[word].class_index+vocab_size].ac *neu2[word].ac, prob_other, vocab[word].word);
                else fprintf(flog, "-1\t0\t\t0\t\tOOV");
            } else {
                if (word!=-1) fprintf(flog, "%d\t%.10f\t%s", word, neu2[vocab[word].class_index+vocab_size].ac *neu2[word].ac, vocab[word].word);
                else fprintf(flog, "-1\t0\t\tOOV");
            }
            fprintf(flog, "\n");
        }

        // this is the dynamic-model part, which lets the RNN keep learning and updating its parameters while testing
        if (dynamic>0) {
            if (bptt>0) {
                // shift bptt_history back by one position and put the most recent word at the front
                for (a=bptt+bptt_block-1; a>0; a--) bptt_history[a]=bptt_history[a-1];
                bptt_history[0]=last_word;

                // shift bptt_hidden back by one position; the first position is filled inside learnNet()
                for (a=bptt+bptt_block-1; a>0; a--) for (b=0; b<layer1_size; b++) {
                    bptt_hidden[a*layer1_size+b].ac=bptt_hidden[(a-1)*layer1_size+b].ac;
                    bptt_hidden[a*layer1_size+b].er=bptt_hidden[(a-1)*layer1_size+b].er;
                }
            }
            alpha=dynamic;                      // when the dynamic model is used, the learning rate is set to dynamic
            learnNet(last_word, word);          // dynamic update
        }

        // copy the ac values of the hidden neurons into the last layer1_size part of the input layer, i.e. s(t-1)
        copyHiddenLayerToInput();

        // prepare to encode the next word in the input layer
        if (last_word!=-1) neu0[last_word].ac=0;   // delete previous activation

        last_word=word;

        // shift history back by one position and put the most recent word at the front
        for (a=MAX_NGRAM_ORDER-1; a>0; a--) history[a]=history[a-1];
        history[0]=last_word;

        // same sentence-independence logic as before
        if (independent && (word==0)) netReset();
    }
    fclose(fi);
    if (use_lmprob) fclose(lmprob);

    // write the summary information about the test file to the log
    if (debug_mode>0) {
        fprintf(flog, "\ntest log probability: %f\n", logp);
        if (use_lmprob) {
            fprintf(flog, "test log probability given by other lm: %f\n", log_other);
            fprintf(flog, "test log probability %f*rnn + %f*other_lm: %f\n", lambda, 1-lambda, log_combine);
        }

        fprintf(flog, "\nPPL net: %f\n", exp10(-logp/(real)wordcn));
        if (use_lmprob) {
            fprintf(flog, "PPL other: %f\n", exp10(-log_other/(real)wordcn));
            fprintf(flog, "PPL combine: %f\n", exp10(-log_combine/(real)wordcn));
        }
    }

    fclose(flog);
}
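To summarize the interpolation that testNet() performs when use_lmprob is set, here is a simplified sketch. It is not toolkit code: p_rnn and p_other are made-up per-word probabilities, and only the roles of lambda, logp, log_other and log_combine mirror the code above.

// Simplified sketch of the lambda interpolation done in testNet() (made-up probabilities).
#include <cmath>
#include <cstdio>

int main() {
    // per-word probabilities from the RNN and from another language model
    double p_rnn[]   = {0.10, 0.05, 0.20};
    double p_other[] = {0.08, 0.10, 0.15};
    int k = 3;
    double lambda = 0.75;                 // weight given to the RNN probability, like lambda in the code

    double logp = 0, log_other = 0, log_combine = 0;
    for (int i = 0; i < k; i++) {
        logp        += log10(p_rnn[i]);
        log_other   += log10(p_other[i]);
        log_combine += log10(lambda*p_rnn[i] + (1-lambda)*p_other[i]);
    }

    printf("PPL net:     %f\n", pow(10.0, -logp/k));
    printf("PPL other:   %f\n", pow(10.0, -log_other/k));
    printf("PPL combine: %f\n", pow(10.0, -log_combine/k));
    return 0;
}

Setting lambda to 1 reduces log_combine to the pure RNN score, and setting it to 0 reduces it to the other model's score.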

Well, this brings the RNNLM Toolkit source code notes to a temporary end. The content certainly still contains places where my own understanding is wrong, and as always, readers who understand them better are welcome to point them out so we can discuss them. Since the figures are scattered across the individual articles, I will finally post the diagram of the RNNLM Toolkit's internal data structures as a separate article.
