LDA overall process

First, define the meaning of some symbols:
- Document set D and topic set T
- In D, each document d is regarded as a word sequence <w1, w2, ..., wn>, where wi denotes the i-th word and d contains n words. (LDA is a bag-of-words model: the position in which each word appears has no effect on the algorithm.)
- All the distinct words appearing in D form a large set, the vocabulary (VOC).
LDA takes the document set D as input (common preprocessing such as word segmentation, stop-word removal, and stemming is assumed to have been done and is skipped here) and trains two kinds of result vectors (assume k topics, and that VOC contains m words):
- For each document d in D, the probability of d corresponding to the different topics, θd = <p_t1, ..., p_tk>, where p_ti is the probability that d corresponds to the i-th topic in T. The calculation is intuitive: p_ti = n_ti / n, where n_ti is the number of words in d assigned to the i-th topic and n is the total number of words in d.
- For each topic t in T, the probability of t generating the different words, φt = <p_w1, ..., p_wm>, where p_wi is the probability that t generates the i-th word in VOC. The calculation is equally intuitive: p_wi = N_wi / N, where N_wi is the number of occurrences of the i-th word in VOC that are assigned to topic t, and N is the total number of words assigned to topic t. (A small code sketch of both estimates follows this list.)
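As a minimal sketch of how these two vectors can be estimated from a set of per-word topic assignments (the helper name `estimate_theta_phi` and the tiny smoothing constant `eps`, added only to avoid division by zero, are my own illustration and not part of the original description):

```python
import numpy as np

def estimate_theta_phi(docs, assignments, k, m):
    """Estimate theta (document-topic) and phi (topic-word) from current topic assignments.

    docs:        list of documents, each a list of word indices into VOC (0..m-1)
    assignments: same shape, the topic index (0..k-1) currently assigned to each word
    k, m:        number of topics and size of VOC
    """
    theta = np.zeros((len(docs), k))  # theta[d, t] counts n_ti: words of document d assigned to topic t
    phi = np.zeros((k, m))            # phi[t, w] counts N_wi: occurrences of word w assigned to topic t

    for d, (doc, topics) in enumerate(zip(docs, assignments)):
        for w, t in zip(doc, topics):
            theta[d, t] += 1
            phi[t, w] += 1

    eps = 1e-12  # tiny constant, only to avoid division by zero; not part of the formulas above
    theta = (theta + eps) / (theta + eps).sum(axis=1, keepdims=True)  # p_ti = n_ti / n
    phi = (phi + eps) / (phi + eps).sum(axis=1, keepdims=True)        # p_wi = N_wi / N
    return theta, phi
```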
The core formula of LDA is as follows:
p(w|d) = p(w|t) * p(t|d)
Intuitively, this formula uses the topic as an intermediate layer: the probability that word w appears in document d can be computed from the current θd and φt, where p(t|d) is read from θd and p(w|t) is read from φt. In fact, using the current θd and φt, we can compute p(w|d) for a word in a document under any topic, and then update the topic assigned to that word based on these results. If this update changes the word's topic, it will in turn affect θd and φt.
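A minimal sketch of this computation (the helper name `word_topic_scores` is my own; `theta_d` is one row of θ, i.e. θd, and `phi` is the k×m topic-word matrix from the earlier sketch):

```python
def word_topic_scores(theta_d, phi, w):
    """Return [p_1(w|d), ..., p_k(w|d)], where p_j(w|d) = p(w|t_j) * p(t_j|d)."""
    return [phi[j][w] * theta_d[j] for j in range(len(theta_d))]

# Example: the per-topic scores of word index 5 in document 0,
# given theta and phi from the earlier sketch:
# scores = word_topic_scores(theta[0], phi, 5)
```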
LDA learning process: when the LDA algorithm starts, θd and φt are assigned random values (for all d and t). The process above is then repeated over and over, and the converged result is the output of LDA. Let's take a closer look at this iterative learning process:

1) For the i-th word wi in a specific document ds, if the topic currently assigned to this word is tj, the formula above can be rewritten as: p_j(wi|ds) = p(wi|tj) * p(tj|ds). It does not matter here exactly how the values are computed (they can be understood as being read directly from θds and φtj; in reality it is not quite that simple, but this has no impact on understanding the overall LDA process).

2) Now we can enumerate the topics in T and obtain all p_j(wi|ds), with j ranging from 1 to k. Based on these probability values, a topic can then be chosen for the i-th word wi in ds. The simplest idea is to take the tj with the largest p_j(wi|ds) (note that j is the only variable in this formula), that is, argmax_j p_j(wi|ds). Of course, this is only one method (and apparently not a very common one); there are in fact many ways of choosing t in the literature, which I have not studied closely.

3) If the i-th word wi in ds then selects a topic different from its original one, this affects θd and φt (which is easy to see from the calculation formulas of the two vectors above), and their change in turn affects the computation of p(w|d) described above. Computing p(w|d) for all words w in D and reselecting their topics once counts as one iteration. After n iterations, the process converges to the results required by LDA.
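Putting the three steps together, a minimal sketch of the whole loop might look like the following. It uses the simple argmax reassignment described in step 2 and reuses the hypothetical `estimate_theta_phi` helper from the earlier sketch; the function name `lda_iterate` and all parameters are my own illustration.

```python
import numpy as np

def lda_iterate(docs, k, m, n_iters=50, seed=0):
    """Random topic initialization, then repeated re-estimation of theta/phi and
    argmax reassignment of each word's topic, as in steps 1)-3) above."""
    rng = np.random.default_rng(seed)
    # Random initial topic for every word, which fixes the initial theta and phi.
    assignments = [rng.integers(0, k, size=len(doc)) for doc in docs]

    for _ in range(n_iters):
        theta, phi = estimate_theta_phi(docs, assignments, k, m)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                scores = phi[:, w] * theta[d]               # p_j(w_i|d_s) for j = 1..k
                assignments[d][i] = int(np.argmax(scores))  # argmax_j p_j(w_i|d_s)

    return estimate_theta_phi(docs, assignments, k, m)
```

Note that real LDA implementations also place Dirichlet priors on θd and φt and usually resample topics (for example with Gibbs sampling) instead of taking an argmax; this sketch only follows the simplified description given above.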