Recurrent Neural Network Language Modeling Toolkit: usage notes
Read the code in the order the training procedure runs.
Structure of trainNet():
Step 1.
learnVocabFromTrainFile(): collects statistics on every word in the training file and organizes the collected information.
The data structures involved:
vocab_word
vocab_hash (int *)
The functions involved:
addWordToVocab()
For a word w, its information is stored in an array of vocab_word structures, at an index denoted wp. The hash code of w is then computed (getWordHash()), and the vocab_hash entry for that hash code is set to wp, so that the hash code maps to the word's index in the vocab_word array. [The vocab_word array is sorted afterwards, so the index of w may change; searchVocab() accounts for this.]
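The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual source: the `Vocab` wrapper, the `VocabWord` type, the field name `cn`, the table size, and the multiplier used in the hash function are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one vocab_word entry.
struct VocabWord {
    std::string word;  // the word itself
    int cn;            // occurrence count (field name is an assumption)
};

// Hypothetical wrapper holding the word array and the hash table together.
struct Vocab {
    std::vector<VocabWord> words;  // plays the role of the vocab_word array
    std::vector<int> hash;         // plays the role of vocab_hash; -1 = empty slot

    explicit Vocab(std::size_t hash_size) : hash(hash_size, -1) {
        assert(hash_size > 0);  // the hash is reduced modulo the table size
    }

    // Simple multiplicative string hash reduced to a table slot.
    // The multiplier 237 is an illustrative choice.
    std::size_t word_hash(const std::string& w) const {
        unsigned int h = 0;
        for (unsigned char c : w) h = h * 237u + c;
        return h % hash.size();
    }

    // Append w to the array and record its index wp under its hash code.
    // A word that collides with an earlier one simply overwrites the slot.
    int add_word(const std::string& w) {
        words.push_back({w, 0});
        const int wp = static_cast<int>(words.size()) - 1;
        hash[word_hash(w)] = wp;
        return wp;
    }
};
```

With this layout, `Vocab v(1000); v.add_word("hello");` stores "hello" at index 0 and writes 0 into the slot selected by its hash code. Because colliding words overwrite each other's slot, a lookup has to verify the word it finds there.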
searchVocab()
Finds and returns the index of the word w in the vocab_word array. It computes the hash code of w and looks it up in vocab_hash. If nothing is stored there, it returns -1. Otherwise it takes the stored index and compares the word at that index with w; if they match, it returns the index. If they differ, it scans the whole array for w: when w is found, its index is written back to vocab_hash and returned, and -1 is returned if it is still not found.
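The lookup steps above might look like this in code. It is only a sketch under stated assumptions: `VocabWord`, `word_hash`, and `search_vocab` are hypothetical names, the hash function is an illustrative one, and an empty slot is marked with -1.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified entry; only the word matters for lookup.
struct VocabWord {
    std::string word;
    int cn;  // occurrence count (field name is an assumption)
};

// Illustrative hash; any function mapping a word to a table slot would do.
std::size_t word_hash(const std::string& w, std::size_t table_size) {
    unsigned int h = 0;
    for (unsigned char c : w) h = h * 237u + c;
    return h % table_size;
}

// Returns the index of w in vocab, or -1 if it is not found.
int search_vocab(const std::vector<VocabWord>& vocab,
                 std::vector<int>& vocab_hash,
                 const std::string& w) {
    const std::size_t slot = word_hash(w, vocab_hash.size());
    const int cached = vocab_hash[slot];

    if (cached == -1) return -1;  // empty slot: reported as not found

    // Fast path: the cached index is in range and still points at w.
    if (cached < static_cast<int>(vocab.size()) && vocab[cached].word == w)
        return cached;

    // Slow path: the slot is stale (e.g. after sorting) or holds a colliding word.
    for (std::size_t i = 0; i < vocab.size(); ++i) {
        if (vocab[i].word == w) {
            vocab_hash[slot] = static_cast<int>(i);  // write the index back
            return static_cast<int>(i);
        }
    }
    return -1;  // still not found
}
```

The write-back in the slow path is what lets the hash table recover after the array has been reordered: the first lookup of a moved word pays for a linear scan, and later lookups hit the fast path again.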
sortVocab()
Reorders the vocab_word array according to the number of occurrences of each word in the training set.
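A frequency sort of this kind could be sketched as below. The names `VocabWord`, `cn`, and `sort_vocab` are hypothetical, the descending order (most frequent word first) is an assumption, and `std::stable_sort` is used here for brevity rather than because the toolkit is known to use it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical simplified entry.
struct VocabWord {
    std::string word;
    int cn;  // occurrence count in the training set (field name is an assumption)
};

// Reorder entries by occurrence count, most frequent first (assumed order).
// stable_sort keeps words with equal counts in their original relative order.
void sort_vocab(std::vector<VocabWord>& vocab) {
    std::stable_sort(vocab.begin(), vocab.end(),
                     [](const VocabWord& a, const VocabWord& b) {
                         return a.cn > b.cn;
                     });
}
```

After a call like this, indices previously recorded in vocab_hash may point at the wrong entries, which is why the lookup compares the stored word before trusting a cached index.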