LDA overall process

First, define the meaning of some symbols:
- Document set D and topic set T
- In D, each document d is regarded as a word sequence <w1, w2, ..., wn>, where wi denotes the i-th word and d contains n words. (LDA is a bag-of-words model: the position in which each word appears has no effect on the algorithm.)
- All the distinct words appearing in D form a large set, the vocabulary (VOC).
LDA takes the document set D as input (common preprocessing such as word segmentation, stop-word removal, and stemming is assumed to have been done and is skipped here) and trains two kinds of result vectors (assume k topics, and that VOC contains m words):
- For each document d in D, the probability of d corresponding to the different topics, θd = <p_t1, ..., p_tk>, where p_ti is the probability that d corresponds to the i-th topic in T. The calculation is intuitive: p_ti = n_ti / n, where n_ti is the number of words in d assigned to the i-th topic and n is the total number of words in d.
- For each topic t in T, the probability of t generating the different words, φt = <p_w1, ..., p_wm>, where p_wi is the probability that t generates the i-th word in VOC. The calculation is equally intuitive: p_wi = N_wi / N, where N_wi is the number of occurrences of the i-th word in VOC that are assigned to topic t, and N is the total number of words assigned to topic t. (A small code sketch of both estimates follows this list.)
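As a minimal sketch of how these two vectors can be estimated from a set of per-word topic assignments (the helper name `estimate_theta_phi` and the tiny smoothing constant `eps`, added only to avoid division by zero, are my own illustration and not part of the original description):

```python
import numpy as np

def estimate_theta_phi(docs, assignments, k, m):
    """Estimate theta (document-topic) and phi (topic-word) from current topic assignments.

    docs:        list of documents, each a list of word indices into VOC (0..m-1)
    assignments: same shape, the topic index (0..k-1) currently assigned to each word
    k, m:        number of topics and size of VOC
    """
    theta = np.zeros((len(docs), k))  # theta[d, t] counts n_ti: words of document d assigned to topic t
    phi = np.zeros((k, m))            # phi[t, w] counts N_wi: occurrences of word w assigned to topic t

    for d, (doc, topics) in enumerate(zip(docs, assignments)):
        for w, t in zip(doc, topics):
            theta[d, t] += 1
            phi[t, w] += 1

    eps = 1e-12  # tiny constant, only to avoid division by zero; not part of the formulas above
    theta = (theta + eps) / (theta + eps).sum(axis=1, keepdims=True)  # p_ti = n_ti / n
    phi = (phi + eps) / (phi + eps).sum(axis=1, keepdims=True)        # p_wi = N_wi / N
    return theta, phi
```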
The core formula of LDA is as follows:
p(w|d) = p(w|t) * p(t|d)
Intuitively, this formula uses the topic as an intermediate layer: the probability that word w appears in document d can be computed from the current θd and φt, where p(t|d) is read from θd and p(w|t) is read from φt. In fact, using the current θd and φt, we can compute p(w|d) for a word in a document under any topic, and then update the topic assigned to that word based on these results. If this update changes the word's topic, it will in turn affect θd and φt.
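A minimal sketch of this computation (the helper name `word_topic_scores` is my own; `theta_d` is one row of θ, i.e. θd, and `phi` is the k×m topic-word matrix from the earlier sketch):

```python
def word_topic_scores(theta_d, phi, w):
    """Return [p_1(w|d), ..., p_k(w|d)], where p_j(w|d) = p(w|t_j) * p(t_j|d)."""
    return [phi[j][w] * theta_d[j] for j in range(len(theta_d))]

# Example: the per-topic scores of word index 5 in document 0,
# given theta and phi from the earlier sketch:
# scores = word_topic_scores(theta[0], phi, 5)
```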
LDA learning process: when the LDA algorithm starts, θd and φt are assigned random values (for all d and t). The process above is then repeated over and over, and the converged result is the output of LDA. Let's take a closer look at this iterative learning process:

1) For the i-th word wi in a specific document ds, if the topic currently assigned to this word is tj, the formula above can be rewritten as: p_j(wi|ds) = p(wi|tj) * p(tj|ds). It does not matter here exactly how the values are computed (they can be understood as being read directly from θds and φtj; in reality it is not quite that simple, but this has no impact on understanding the overall LDA process).

2) Now we can enumerate the topics in T and obtain all p_j(wi|ds), with j ranging from 1 to k. Based on these probability values, a topic can then be chosen for the i-th word wi in ds. The simplest idea is to take the tj with the largest p_j(wi|ds) (note that j is the only variable in this formula), that is, argmax_j p_j(wi|ds). Of course, this is only one method (and apparently not a very common one); there are in fact many ways of choosing t in the literature, which I have not studied closely.

3) If the i-th word wi in ds then selects a topic different from its original one, this affects θd and φt (which is easy to see from the calculation formulas of the two vectors above), and their change in turn affects the computation of p(w|d) described above. Computing p(w|d) for all words w in D and reselecting their topics once counts as one iteration. After n iterations, the process converges to the results required by LDA.
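Putting the three steps together, a minimal sketch of the whole loop might look like the following. It uses the simple argmax reassignment described in step 2 and reuses the hypothetical `estimate_theta_phi` helper from the earlier sketch; the function name `lda_iterate` and all parameters are my own illustration.

```python
import numpy as np

def lda_iterate(docs, k, m, n_iters=50, seed=0):
    """Random topic initialization, then repeated re-estimation of theta/phi and
    argmax reassignment of each word's topic, as in steps 1)-3) above."""
    rng = np.random.default_rng(seed)
    # Random initial topic for every word, which fixes the initial theta and phi.
    assignments = [rng.integers(0, k, size=len(doc)) for doc in docs]

    for _ in range(n_iters):
        theta, phi = estimate_theta_phi(docs, assignments, k, m)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                scores = phi[:, w] * theta[d]               # p_j(w_i|d_s) for j = 1..k
                assignments[d][i] = int(np.argmax(scores))  # argmax_j p_j(w_i|d_s)

    return estimate_theta_phi(docs, assignments, k, m)
```

Note that real LDA implementations also place Dirichlet priors on θd and φt and usually resample topics (for example with Gibbs sampling) instead of taking an argmax; this sketch only follows the simplified description given above.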