Recently I have been using LDA to calculate the similarity between words. I had been running GibbsLDA++ (the C++ version) on a Linux server, then found out that there is a Java version and switched to it. Below I first describe how to use it, and then the problems I ran into along the way.
(1) Import Project
First, download the JGibbLDA project from http://sourceforge.net/projects/jgibblda/ and unzip it into your Eclipse workspace. In Eclipse, click "File" ---> "Import".
Select "Existing Projects into Workspace" and click Next.
Click Browse, select the folder where you just unzipped the LDA project in the dialog that pops up, click "OK", and then click Finish. The project is now imported.
(2) Running LDA
Before running LDA, prepare the data and save it in the LDA input format. The first line is the number of documents, and each following line is one document, with words separated by spaces. If you are working with Chinese documents, you have to do word segmentation beforehand.
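The input format described above can be produced with a few lines of Java. This is only a minimal sketch: CorpusWriter is a hypothetical helper, not part of JGibbLDA, and it assumes the documents are already tokenized (for Chinese, already segmented). The encoding is set to UTF-8 explicitly instead of relying on the platform default.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of JGibbLDA.
final class CorpusWriter {

    // Writes the document count on the first line, then one document per line,
    // with the tokens of each document separated by single spaces.
    static void write(Path target, List<List<String>> docs) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(Integer.toString(docs.size()));
            out.newLine();
            for (List<String> tokens : docs) {
                out.write(String.join(" ", tokens));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Two toy documents; real input would come from your own tokenizer.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("topic", "model", "word"),
                Arrays.asList("word", "similarity", "topic"));
        // Output file name is taken from the first argument, defaulting to new.txt.
        Path target = Paths.get(args.length > 0 ? args[0] : "new.txt");
        write(target, docs);
    }
}
```

Run with no arguments, this writes a three-line new.txt in the current directory: the count 2, followed by the two documents.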
Next, configure the program arguments: click "Run" ---> "Run Configurations".
In the Arguments tab, enter "-est -alpha 0.2 -beta 0.1 -ntopics 10 -niters 1000 -savestep 100 -twords 5 -dir models\casestudy -dfile new.txt". For the meaning of each argument, see the official description: http://jgibblda.sourceforge.net/. Then click "Apply" ---> "Run".
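Before launching, it can help to check which file the -dir and -dfile options will point at together. The sketch below assumes that the data file is looked up as the -dfile name inside the -dir folder; that is my reading of the behavior, not something checked against the JGibbLDA source. InputPathCheck is a hypothetical helper and not part of the library.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for illustration; not part of JGibbLDA.
final class InputPathCheck {

    // Assumption: the input file is resolved as <dir>/<dfile>.
    // If dfile is already an absolute path, Path.resolve returns it unchanged.
    static Path resolve(String dir, String dfile) {
        return Paths.get(dir).resolve(dfile);
    }

    public static void main(String[] args) {
        // Defaults mirror the folder and file name used in this post.
        String dir = args.length > 0 ? args[0] : "models" + File.separator + "casestudy";
        String dfile = args.length > 1 ? args[1] : "new.txt";
        Path input = resolve(dir, dfile);
        String status = Files.isRegularFile(input) ? "exists" : "is missing";
        System.out.println(input.toAbsolutePath() + " " + status);
    }
}
```

Relative paths are resolved against the working directory of the run configuration, which in Eclipse is normally the project folder.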
The following screenshot shows a successful run.
(3) Problems at run time
Finally, here are the problems I ran into along the way.
1. At first I did not put "-dir models\casestudy" in the arguments and only wrote "-dfile models\casestudy\new.txt", which failed with a file-not-found error. Giving -dfile the absolute path to new.txt ran without error, but I could not tell where the output was saved. So "-dir models\casestudy" is very useful: the output is written directly into that folder.
2. Garbled Chinese. I had generated new.txt with Java code without specifying an encoding, so the file was written in the default encoding (ANSI), and the LDA output came out garbled.
Solution: save the file as UTF-8.
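If the file has already been written in the wrong encoding, it can also be converted in code instead of using Save As. The sketch below is a hypothetical helper (ToUtf8 is not part of JGibbLDA). It assumes that "ANSI" here means GBK, the usual legacy code page on Simplified Chinese Windows; pass a different charset if your file uses another one.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of JGibbLDA.
final class ToUtf8 {

    // Decodes the source file with the given charset and rewrites it as UTF-8.
    static void convert(Path source, Charset sourceCharset, Path target) throws IOException {
        String text = new String(Files.readAllBytes(source), sourceCharset);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ToUtf8 <source file> <target file>");
            return;
        }
        // Assumption: the source file is GBK-encoded.
        convert(Paths.get(args[0]), Charset.forName("GBK"), Paths.get(args[1]));
    }
}
```

When generating the file from Java in the first place, passing StandardCharsets.UTF_8 to the writer avoids the problem altogether.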