Lately I have been interested in LDA and trying to use it at work. For quick verification of ideas I usually use GibbsLDA++-0.2, a C implementation of LDA. Over the past couple of days I wrote a standalone version of LDA in C++ with the STL. My motivations were as follows:
1. GibbsLDA++-0.2, although billed as the most popular LDA toolkit, still has noticeable bugs; see my earlier post "[LDA] Fixing the two memory problems in GibbsLDA++-0.2".
2. GibbsLDA++-0.2 is written in essentially pure C, and its variable names are terse mathematical symbols rather than intuitive, descriptive identifiers, which makes the code hard to understand. Even at work, after a training run finishes I always have to check the documentation to recall what each output file corresponds to.
3. GibbsLDA++-0.2 bundles vocabulary extraction together with model training. For small-scale training this is workable: scan the training set once to extract the word list, then continue training in memory. But at a slightly larger scale it is wasteful to rebuild the word list on every training run, and in many problems the training set may not even cover all the words in the glossary. So I split them apart: a preprocessing step (which I did not write) first extracts the word list from the training set, and the word list is then fed into the model together with the training samples to participate in training.
4. GibbsLDA++-0.2 contains too much unrelated code, such as command-line parsing. I actually prefer calling into the source code directly rather than going through the command line.
5. The most important reason: honestly, my hands were itching to write it myself.
I have put the code on GitHub: https://github.com/henryxiao1997/LDACplus/
That's all.
Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.