[LDA] A hands-on implementation of LDA

Source: Internet
Author: User

I have been interested in LDA lately and have been trying to use it at work. To verify an idea quickly I usually use "gibbslda++-0.2", a C implementation of LDA. Over the past two days I wrote a stand-alone version of LDA in C++ with the STL. My reasons were as follows:

1. "gibbslda++-0.2", although known as the most popular LDA toolkit, still has noticeable bugs; see the post "[LDA] Fixing two memory problems in gibbslda++-0.2".

2. "gibbslda++-0.2" is written in what is basically pure C, and its variable names are mathematical symbols rather than intuitive, descriptive names, which makes the code hard to follow. Even at work, after training a model, I always have to check its documentation to recall what each output file contains.

3. "gibbslda++-0.2" puts vocabulary extraction and model training together. That works for small-scale training: scan the training set once to extract the word list, then carry on training in memory. At a slightly larger scale, though, it is silly to extract the word list again every time you train, and in many problems the training set does not necessarily cover every word in the vocabulary. So I split the two apart. A preprocessing step (which I did not write) first extracts the word list from the training set, and the word list is then fed into the model together with the training samples.

4. "gibbslda++-0.2" contains too much code unrelated to the model, such as command-line parsing. I would rather call the source code directly than go through a command line.

5. The most important reason: honestly, I was itching to write it myself.
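
The split described in point 3 can be sketched roughly as below. This is only an illustration under my own assumptions, not the code in the repository and not the gibbslda++ API: the names Vocabulary, BuildVocabulary and EncodeDocument are hypothetical, documents are assumed to be whitespace-tokenized strings, and words missing from the word list are simply skipped.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type: maps each word to a dense integer id (0, 1, 2, ...).
using Vocabulary = std::unordered_map<std::string, int>;

// Preprocessing step: build the word list once, from any set of documents.
// It can be stored and reused across training runs.
Vocabulary BuildVocabulary(const std::vector<std::string>& documents) {
    Vocabulary vocabulary;
    for (const std::string& document : documents) {
        std::istringstream tokens(document);
        std::string word;
        while (tokens >> word) {
            // emplace inserts only when the word is new, so the next id
            // is always the current size of the map.
            vocabulary.emplace(word, static_cast<int>(vocabulary.size()));
        }
    }
    return vocabulary;
}

// Training-side step: turn one document into word ids using a fixed,
// previously built vocabulary. Unknown words are skipped (an assumption;
// a real system might map them to a special id instead).
std::vector<int> EncodeDocument(const std::string& document,
                                const Vocabulary& vocabulary) {
    std::vector<int> wordIds;
    std::istringstream tokens(document);
    std::string word;
    while (tokens >> word) {
        auto found = vocabulary.find(word);
        if (found != vocabulary.end()) {
            wordIds.push_back(found->second);
        }
    }
    return wordIds;
}
```

With this layout the trainer only ever sees integer ids and a vocabulary size, so the word list can come from a larger corpus than the training set and never needs to be rebuilt per run.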


The code is on GitHub: https://github.com/henryxiao1997/LDACplus/


That's all.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
