Text Modeling Series, Part Three: Thoughts on LDA

LDA (Latent Dirichlet Allocation) is a well-known text model first proposed in 2003 by a group of leading researchers, including David M. Blei and Andrew Y. Ng. Compared with the earlier pLSA text model, LDA is pLSA seen from a Bayesian point of view. The Bayesian point of view treats unknown quantities as uncertain. In pLSA, the topic z is a hidden variable, but P(z|d) is still a fixed parameter with a single definite value. For the Bayesian school, that probability is itself uncertain and follows some distribution; in LDA, that distribution is the Dirichlet distribution. In [1], the authors state that pLSA is not a well-defined generative model (I don't fully follow this point; the paper's argument is that d in pLSA is only an index over the training documents, so there is no natural way to assign a probability to a previously unseen document).

As for LDA: I don't work on topic models myself. I want to use a topic model to compress the dimensionality of document feature vectors and produce document vectors for text classification. My math foundation is not very solid, and as a master's student of only average research ability, I find LDA genuinely hard to understand. Over the past few days I read LDA Math Gossip [2] and some material on Gibbs sampling [3], hoping to understand LDA thoroughly, but after several days I could only grasp an outline. This article therefore does not attempt to explain the mathematical principles of LDA; for those, refer to the original paper [1] and references [2] and [3] below.
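Since the goal here is to turn each document into a low-dimensional topic vector, here is a sketch of one common way to do that: collapsed Gibbs sampling for LDA, the kind of inference derived in LDA Math Gossip [2] (the original paper [1] uses variational inference instead). It is a minimal, unoptimized sketch in plain Python, not the author's code; the function name `lda_gibbs`, its parameters, and the default hyperparameters `alpha=0.1`, `beta=0.01` are illustrative assumptions, and documents are assumed to be already tokenized into integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors alpha and beta.

    Illustrative sketch (hypothetical helper, not from the original post).
    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns one length-num_topics topic vector (theta_d) per document.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]       # n_dk[d][k]: words in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # n_kw[k][w]: times word w is assigned to topic k
    n_k = [0] * K                        # n_k[k]: total words assigned to topic k
    z = []                               # z[d][i]: current topic of token i in doc d

    # Initialize every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove token i from the counts (the "excluding i" counts).
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | other assignments, words).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put token i back with its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Estimate theta_d = (n_dk + alpha) / (n_d + K * alpha) from the final sample.
    return [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]


if __name__ == "__main__":
    # Toy corpus: word ids 0-2 and 3-5 tend to co-occur.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1, 0], [5, 4, 3, 5, 3]]
    for vec in lda_gibbs(docs, num_topics=2, vocab_size=6, iters=100):
        print([round(p, 2) for p in vec])
```

Each returned vector has one entry per topic and sums to 1, so a bag-of-words representation with as many dimensions as the vocabulary is reduced to `num_topics` dimensions that a classifier can consume.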

Many experts have already covered LDA in great detail on their blogs; July, for one, wrote a post dedicated to LDA. I honestly did not enjoy the mathematical derivations in the middle of July's post, but one sentence at the end gave me great inspiration: "LDA is actually pLSA from a Bayesian point of view." As a distillation of LDA, that is exactly right. So this post mainly shares my understanding of some of LDA's ideas rather than the detailed derivation.

As is well known, there are two schools of thought here: frequentists and Bayesians. Frequentists hold that every probability is a fixed value, even if that value is unknown. Bayesians hold that unknown quantities are uncertain and are themselves described by distributions. Accordingly, LDA assumes that a document's topic distribution is not fixed but is drawn from a distribution, namely a Dirichlet distribution; it likewise assumes that each topic's word distribution is not fixed but is drawn from a Dirichlet distribution. Once you understand these two points, you basically understand the structure of the LDA model. See the figure below:



Is the classic model of LDA, in plain words, the process of generating an article of LDA is:



Is the LDA model introduced in LDA's mathematical gossip, as is the case with the LDA model's document generation process.
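The document-generation process described here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from [1] or [2]: the function names `sample_dirichlet` and `generate_corpus`, the symmetric hyperparameters `alpha` and `beta`, and the fixed document length are illustrative choices (the original paper draws each document's length from a Poisson distribution). It also makes the Bayesian point concrete: `sample_dirichlet` returns a whole probability vector, so θ_d and φ_k are random draws rather than fixed parameters as in pLSA.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha).

    Standard construction: draw g_k ~ Gamma(alpha_k, 1) independently and
    normalize; the result is non-negative and sums to 1.
    """
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size, alpha, beta, seed=0):
    """Generate a toy corpus with LDA's generative process (symmetric priors).

    Hypothetical helper for illustration. Simplification: every document has
    the fixed length doc_len. Returns (docs, thetas, phis); each doc is a list
    of integer word ids.
    """
    rng = random.Random(seed)
    # 1. For each topic k, draw its word distribution phi_k ~ Dir(beta).
    phis = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    docs, thetas = [], []
    for _ in range(num_docs):
        # 2. For each document d, draw its topic distribution theta_d ~ Dir(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # 3a. Pick a topic for this word position: z ~ Mult(theta_d).
            z = rng.choices(range(num_topics), weights=theta)[0]
            # 3b. Pick the word from that topic: w ~ Mult(phi_z).
            words.append(rng.choices(range(vocab_size), weights=phis[z])[0])
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phis


if __name__ == "__main__":
    docs, thetas, phis = generate_corpus(
        num_docs=2, doc_len=8, num_topics=3, vocab_size=10, alpha=0.5, beta=0.1
    )
    for doc, theta in zip(docs, thetas):
        print("theta =", [round(p, 2) for p in theta], "words =", doc)
```

Setting `alpha` below 1 tends to concentrate each document on a few dominant topics, while larger values spread the topic mass more evenly; `beta` plays the same role for the words within each topic.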

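Compared with pLSA, LDA merely places Dirichlet priors on P(z|d) and P(w|z), yet the resulting model is much more powerful than pLSA: for example, it can assign probabilities to unseen documents, and its number of parameters does not grow with the number of training documents as pLSA's does. Of course, its mathematical complexity also rises by more than one level.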
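In formulas, the difference is small. The notation below is the conventional one rather than taken from this post: θ_d stands for P(z|d), φ_k for P(w|z = k), K is the number of topics, and α, β are the Dirichlet hyperparameters. Note that the main model in [1] keeps the topic-word distributions as fixed parameters and adds a Dirichlet prior on them only in its smoothed variant; the form with both priors shown here is the one usually meant in Gibbs-sampling treatments.

```latex
\begin{aligned}
\text{pLSA:}\quad & P(w \mid d) = \sum_{k=1}^{K} P(w \mid z = k)\, P(z = k \mid d),
  \quad P(z \mid d),\ P(w \mid z) \text{ fixed parameters} \\
\text{LDA:}\quad & \phi_k \sim \mathrm{Dir}(\beta)\ (k = 1, \dots, K), \quad
  \theta_d \sim \mathrm{Dir}(\alpha), \\
& z_{d,n} \sim \mathrm{Mult}(\theta_d), \quad
  w_{d,n} \sim \mathrm{Mult}(\phi_{z_{d,n}})
\end{aligned}
```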

Although LDA's derivation is complex, its results are very elegant. That is the magic of LDA: it wins you over with the simplest of conclusions. This is also the beauty of mathematics (even though I am not very good at math).
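One often-cited example of this elegance is the collapsed Gibbs sampling update for LDA: once θ and φ are integrated out, the probability that word i of document d takes topic k depends only on simple counts. Assuming symmetric priors α and β, K topics, a vocabulary of V words, n_{d,k} the number of words in document d assigned to topic k, n_{k,w} the number of times word w is assigned to topic k, n_k = Σ_w n_{k,w}, n_d the length of document d, and ¬i meaning "excluding word i", the update and the resulting estimates are:

```latex
\begin{aligned}
p(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w}) &\propto
  \left(n_{d,k}^{\neg i} + \alpha\right)
  \frac{n_{k,w_i}^{\neg i} + \beta}{n_{k}^{\neg i} + V\beta}, \\
\hat{\theta}_{d,k} = \frac{n_{d,k} + \alpha}{n_d + K\alpha},
  &\qquad
\hat{\phi}_{k,w} = \frac{n_{k,w} + \beta}{n_k + V\beta}
\end{aligned}
```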

[1] David M. Blei, Andrew Y. Ng, et al. Latent Dirichlet Allocation.

[2] Zhihui Jin. LDA Math Gossip.

[3] Gibbs Sampling for the Uninitiated.
