Topic Model: LDA's flaws and improvements


http://blog.csdn.net/pipisorry/article/details/45307369

LDA limitations: what's next?

Although LDA is a great algorithm for topic modelling, it still has some limitations, mainly because it has only recently become popular and widely available.

One major limitation is perhaps its underlying unigram text model: LDA doesn't consider the relative position of the words in a document. Documents like 'Man, I love this can' and 'I can love this man' are probably modelled the same way. It is also true that for longer documents, mismatching topics is harder. To overcome this limitation, at the cost of almost squaring the complexity, you can use 2-grams (or n-grams) along with 1-grams.
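To make the unigram limitation concrete, here is a minimal sketch in plain Python (no LDA library involved). The helper name `ngrams` and the simple whitespace tokenisation are assumptions made for illustration. It shows that the two example sentences have identical 1-gram counts, while their 2-gram counts differ.

```python
from collections import Counter

def ngrams(text, n):
    """Return the n-grams (as tuples) of a lowercased text with basic punctuation stripped."""
    tokens = [t.strip(",.!?'\"").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

doc_a = "Man, I love this can"
doc_b = "I can love this man"

# Bag-of-words view: word order is discarded, only counts remain.
unigrams_a = Counter(ngrams(doc_a, 1))
unigrams_b = Counter(ngrams(doc_b, 1))

# Adding 2-grams keeps some local word order.
bigrams_a = Counter(ngrams(doc_a, 2))
bigrams_b = Counter(ngrams(doc_b, 2))

print(unigrams_a == unigrams_b)  # True: a unigram model sees the same document
print(bigrams_a == bigrams_b)    # False: the 2-grams tell them apart
```

The price is a much larger vocabulary: with V distinct words there can be up to V * V distinct 2-grams, which is where the "almost squaring" of the complexity comes from.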

Another weakness of LDA is in the composition of the topics: they overlap. In fact, you can find the same word in multiple topics (the example above, of the word "can", is obvious). The generated topics, therefore, are not independent and orthogonal as in a PCA-decomposed basis, for example. This implies that you must pay a lot of attention when dealing with them (e.g. don't use cosine similarity).
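The overlap can be illustrated with a small sketch. The two topic-word distributions below are made-up numbers, not output from a real model, and the names `topic_drinks` and `topic_ability` are hypothetical. The text above only says to avoid cosine similarity; the Hellinger distance used here is one common choice for comparing probability distributions and is my assumption, not something the original post prescribes.

```python
import math

# Hypothetical topic-word distributions (illustrative values only).
topic_drinks = {"can": 0.4, "soda": 0.4, "love": 0.2}
topic_ability = {"can": 0.5, "man": 0.3, "love": 0.2}

def dot(p, q):
    """Inner product of two sparse word distributions."""
    return sum(prob * q.get(word, 0.0) for word, prob in p.items())

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    vocab = set(p) | set(q)
    total = sum((math.sqrt(p.get(w, 0.0)) - math.sqrt(q.get(w, 0.0))) ** 2 for w in vocab)
    return math.sqrt(total / 2)

shared_words = set(topic_drinks) & set(topic_ability)

print(sorted(shared_words))                     # words that appear in both topics
print(dot(topic_drinks, topic_ability))         # nonzero, so the topics are not orthogonal
print(hellinger(topic_drinks, topic_ability))   # distance between the two distributions
```

Because every topic is a probability distribution over the same vocabulary, a shared word such as "can" makes the inner product nonzero, unlike the components of a PCA basis.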

For a more structured approach, especially if the topic composition is very misleading, you might consider the hierarchical variation of LDA, named H-LDA (or simply hierarchical LDA). In H-LDA, topics are joined together in a hierarchy by using a Nested Chinese Restaurant Process (NCRP). This model is more complex than LDA, and its description is beyond the goal of this blog entry; an example of the possible output is shown in the original post referenced below. Don't forget that we're still in the probabilistic world: each node of the H-LDA tree is a topic distribution.
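As a rough intuition for the nested Chinese Restaurant Process, the sketch below samples tree paths only. It is not an H-LDA implementation: it fits no topics and looks at no words. The function name `sample_ncrp_paths` and the parameters `depth`, `gamma` and `seed` are assumptions made for illustration. Each document walks down from the root and, at every level, picks an existing child with probability proportional to the number of earlier documents that went there, or opens a new child with probability proportional to `gamma`.

```python
import random

def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Sample one root-to-leaf path per document from a nested CRP prior.

    Returns the tree (nested dicts) and, for each document, the tuple of
    child indices it chose at each level below the root.
    """
    rng = random.Random(seed)
    root = {"count": 0, "children": []}
    paths = []
    for _ in range(num_docs):
        node = root
        node["count"] += 1
        path = []
        for _ in range(depth):
            children = node["children"]
            # Existing children are weighted by how many earlier documents
            # chose them; the last weight (gamma) stands for a new child.
            weights = [child["count"] for child in children] + [gamma]
            idx = rng.choices(range(len(weights)), weights=weights)[0]
            if idx == len(children):
                children.append({"count": 0, "children": []})
            node = children[idx]
            node["count"] += 1
            path.append(idx)
        paths.append(tuple(path))
    return root, paths

tree, doc_paths = sample_ncrp_paths(num_docs=20, depth=3, gamma=1.0)
print(doc_paths[:5])
print(len(tree["children"]), "branches below the root")
```

In H-LDA each node on such a path would carry its own topic distribution, which is the point made above. A larger `gamma` makes new branches more likely and so yields a wider tree.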

[http://engineering.intenthq.com/2015/02/automatic-topic-modelling-with-lda/]
