I spent two days over the weekend experimenting with LDA at home. Among the many LDA toolkits, GibbsLDA is the most widely used, and it comes in a C++ version, a Java version, and others. GibbsLDA++ is the C++ implementation, currently at version 0.2. In actual use, I found that this version has memory problems. I took some time to track them down and am posting them here for everyone's reference.
Issue 1: Out-of-bounds array access
In model.cpp, two matrices, nw and nd, store the word-topic counts and the document-topic counts respectively. Their sizes are V * K and M * K, where V is the vocabulary size, M is the number of documents, and K is the number of topics. During sampling, a random number generator is used to produce a topic index. The original code is:
int topic = (int)(((double)random() / RAND_MAX) * K);
In principle, the topic index should lie in [0, K-1]. However, random() can return RAND_MAX itself, in which case the ratio is exactly 1, so the statement above actually produces indices in [0, K]. When the resulting index is K, the next operation accesses the array out of bounds.
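To make the boundary case concrete, the sketch below lifts the original expression into a small helper that takes the random draw as a parameter. The name topic_from_draw_original is hypothetical and not part of GibbsLDA++; it exists only so that the edge value can be fed in directly.

```cpp
#include <cassert>
#include <cstdlib>   // RAND_MAX

// Hypothetical helper (not in GibbsLDA++): the original mapping from a
// random draw r in [0, RAND_MAX] to a topic index, with r passed in
// explicitly so that boundary values can be checked.
int topic_from_draw_original(int r, int K) {
    assert(K > 0 && r >= 0 && r <= RAND_MAX);
    // When r == RAND_MAX the ratio is exactly 1.0, so the result is K.
    return (int)(((double)r / RAND_MAX) * K);
}
```

With K = 10, passing RAND_MAX returns 10, which is one past the last valid index 9, while a draw of RAND_MAX - 1 still returns 9.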
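Therefore, the sampling statement should be changed as follows. Note that the addition is written as RAND_MAX + 1.0: on platforms where RAND_MAX equals INT_MAX (glibc, for example), the integer expression RAND_MAX + 1 would overflow.
int topic = (int)(((double)random() / (RAND_MAX + 1.0)) * K);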
I actually run this on Windows, which does not provide the random() function, so I use rand() instead:
int topic = (int)(((double)rand() / (RAND_MAX + 1.0)) * K);
Of course, srandom() must likewise be changed to srand().
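The corrected mapping can be sketched as a pair of small helpers. The names topic_from_draw and sample_topic are assumptions made for illustration and do not appear in GibbsLDA++; the point is only that dividing by RAND_MAX + 1.0 keeps the ratio strictly below 1.

```cpp
#include <cassert>
#include <cstdlib>   // rand, srand, RAND_MAX

// Hypothetical helper: maps a draw r in [0, RAND_MAX] to an index in [0, K-1].
// RAND_MAX + 1.0 is computed in double, so it cannot overflow int even
// when RAND_MAX == INT_MAX, and r / (RAND_MAX + 1.0) is always < 1.
int topic_from_draw(int r, int K) {
    assert(K > 0 && r >= 0 && r <= RAND_MAX);
    return (int)(((double)r / (RAND_MAX + 1.0)) * K);
}

// Hypothetical wrapper: draws with rand(), which is available on both
// Windows and POSIX systems. Seed once with srand() before sampling.
int sample_topic(int K) {
    return topic_from_draw(rand(), K);
}
```

Here even the largest possible draw, RAND_MAX, maps to K - 1, so the result is always a valid row or column index for a matrix with K topics.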
Issue 2: Memory leaks
The memory leaks occur mainly in the destructor of class model, that is, model::~model(). The reason is simple: the author releases the arrays with delete, where delete [] should be used. (Strictly speaking, releasing memory obtained from new[] with plain delete is undefined behavior in C++, not merely a leak.)
For example, the original code:
if (nw) {
    for (int w = 0; w < V; w++) {
        if (nw[w]) {
            delete nw[w];
        }
    }
}
As mentioned earlier, nw is a matrix. The correct code is:
if (nw) {
    for (int w = 0; w < V; w++) {
        if (nw[w]) {
            delete [] nw[w]; //!!!
        }
    }
}
delete [] nw; //!!!
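The pairing rule behind this fix is that every new[] needs a matching delete []. A minimal sketch follows, assuming the matrix is allocated row by row with new[]; alloc_counts and free_counts are hypothetical helper names, not functions from GibbsLDA++.

```cpp
#include <cassert>

// Hypothetical helper: allocates a rows x cols matrix of zeroed counts,
// one new[] for the array of row pointers and one new[] per row.
int** alloc_counts(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    int** m = new int*[rows];
    for (int i = 0; i < rows; i++) {
        m[i] = new int[cols]();   // () value-initializes every count to 0
    }
    return m;
}

// Hypothetical helper: releases the matrix with one delete [] per new[].
void free_counts(int** m, int rows) {
    if (!m) return;
    for (int i = 0; i < rows; i++) {
        delete [] m[i];           // delete [] on a null row is a no-op
    }
    delete [] m;                  // the array of row pointers itself
}
```

In new code, a container such as std::vector<std::vector<int>> would avoid the manual pairing altogether, though that would be a larger change than the fix described here.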
After fixing these two problems, GibbsLDA++-0.2 runs smoothly on my machine. In fact, the program produces results even without the fixes. The out-of-bounds access happens only rarely, so its impact is small. As for the memory leak, the OS reclaims all memory used by a process when it exits, so it does little harm either. This is probably why so many people, mostly researchers, have used this toolkit without anyone correcting these problems.
That's all.
When reposting, please cite the source: http://blog.csdn.net/xceman1997/article/details/46405597
[LDA] Fixing two memory issues in GibbsLDA++-0.2