Topic Model Variants Based on LDA

Source: http://blog.csdn.net/hexinuaa/article/details/6021069
Over the past few years, with the emergence and development of LDA, a group of talented researchers has produced a whole family of topic models. I mainly follow the following expert and his students:
David M. Blei, the founder of LDA, completed his Ph.D. in 2004. His doctoral thesis on topic models fully displays his deep grounding in probability theory, and his own LDA implementation reflects his excellent programming ability. His reputation is no empty claim; the papers are the evidence:
  • J. Chang and D. Blei. Relational Topic Models for Document Networks. Artificial Intelligence and Statistics, 2009. [PDF]

The basic LDA model assumes that documents are exchangeable; in the original LDA, documents are treated as conditionally independent. In practice this is often not the case: documents may sit inside a social network. How to combine content with such network features is a very interesting topic, and this paper offers one solution. It adds a binary random variable between each pair of documents and models this implicit link relationship through their content (topic) features.
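Concretely, one of the link functions in the RTM paper scores a document pair by the element-wise product of their empirical topic proportions. A minimal sketch of the logistic variant (the function and variable names here are mine; eta and nu are learned parameters):

```python
# Sketch of RTM's logistic link likelihood: the probability of a link
# between documents d and d' depends on the element-wise product of
# their empirical topic proportions zbar_d and zbar_dprime.
import numpy as np

def link_probability(zbar_d, zbar_dprime, eta, nu):
    """P(y = 1 | z_d, z_d') = sigmoid(eta . (zbar_d * zbar_dprime) + nu)."""
    score = eta @ (zbar_d * zbar_dprime) + nu
    return 1.0 / (1.0 + np.exp(-score))
```

Training couples this link likelihood with the usual LDA word likelihood, so the learned topics are shaped by both the text and the link structure.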

Explicit link relationships have been a research object for years, and a number of excellent link-analysis algorithms, such as PageRank and HITS, came out of that work. But how can we use hidden links, and what is an implicit link in the first place? A simple kind of hidden link is a graph built from content similarity, which has been put to good use, for example by LexRank in summarization. O. Kurland published two articles along these lines in SIGIR; the essence of his idea seems to be inducing "hyperlinks" between documents from their content.
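As a sketch of this simplest kind of hidden link, one can connect two documents whenever their content similarity crosses a threshold; LexRank-style methods then run PageRank-like scoring over such a graph (the use of TF-IDF and the threshold value here are my assumptions):

```python
# Build a "hidden link" graph from content similarity: documents become
# nodes, and an edge appears wherever cosine similarity is high enough.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def similarity_graph(docs, threshold=0.2):
    """Return a boolean adjacency matrix over the documents."""
    tfidf = TfidfVectorizer().fit_transform(docs)
    sims = cosine_similarity(tfidf)
    np.fill_diagonal(sims, 0.0)   # no self-links
    return sims > threshold
```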
Another novel research direction is how to mine content features with the help of the social network. One of Qiaozhu Mei's papers uses the network structure of the social graph to regularize the original PLSA model; the idea is quite novel.

  • D. Blei and J. Lafferty. Topic Models. In A. Srivastava and M. Sahami, editors, Text Mining: Theory and Applications. Taylor and Francis, in press. [PDF]

This is a comprehensive survey paper in which Blei briefly introduces the topic model and its early variants. It is worth reading.

  • J. Boyd-Graber and D. Blei. Syntactic Topic Models. Neural Information Processing Systems, 2009. [PDF] [Supplement]

The original LDA examines words only through co-occurrence. In practice, co-occurrence alone often fails to capture sentence-structure or word-sense information, so how to introduce such information through a deeper generative model is currently a hot topic. This paper models the generative process behind each sentence's syntactic parse: every sentence is generated along its parse tree, and the entire probabilistic generative process hangs off that tree. In addition, different words within a sentence may each choose the topic that suits them best.

  • D. Blei and J. McAuliffe. Supervised Topic Models. In Advances in Neural Information Processing Systems 21, 2007. [PDF] [Digg Data]

Besides pure content, today's web data often carries auxiliary information, such as user comments on a blog post or reviews of a product. The most typical example: after buying a book on Dangdang, you can rate its quality, with 5 stars for best, 4 stars for good, and so on. How can this information be added to the original LDA? Blei introduces a response variable that depends on the document's topic distribution.

How to organically combine such ratable information with content is also a recent research hotspot. In most cases a response variable is introduced whose distribution is conditioned on the content or topic information.
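Concretely, for a real-valued rating the sLDA paper draws the response from a normal linear model of the document's empirical topic frequencies:

$$\bar z_d = \frac{1}{N}\sum_{n=1}^{N} z_{d,n}, \qquad y_d \mid z_{d,1:N}, \eta, \sigma^2 \sim \mathcal{N}\!\left(\eta^\top \bar z_d,\ \sigma^2\right)$$

Using the empirical frequencies $\bar z_d$ rather than the latent proportions ties the rating to the topics that actually generated the words of the document.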

  • J. Boyd-Graber, D. Blei, and X. Zhu. A Topic Model for Word Sense Disambiguation. In Empirical Methods in Natural Language Processing, 2007. [PDF]

The broader background of this paper is applying topic models to natural language processing. I am not very familiar with the specific content; it mainly integrates the structural features of WordNet and builds the generative graphical model on top of them.
In addition, topic models have been applied to summarization and part-of-speech tagging. Two main ideas recur in these applications: the first is to use the topic model to learn compact features and then apply a classifier or other machine-learning method (see the sketch below); the other is to exploit structural information from the original NLP problem, such as the network structure of WordNet, and build the probabilistic generative process of the whole graphical model on that structure.
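A minimal sketch of the first idea, using scikit-learn with made-up toy data (corpus, labels, and topic count are all hypothetical):

```python
# Idea 1: learn compact topic features with LDA, then feed the
# per-document topic proportions to a standard classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["stocks fell sharply on monday",
        "the team won the championship game",
        "markets rallied after the earnings report",
        "the striker scored twice in the final"]
labels = [0, 1, 0, 1]  # 0 = finance, 1 = sports (toy labels)

clf = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=2, random_state=0),  # topic features
    LogisticRegression(),
)
clf.fit(docs, labels)
print(clf.predict(["the goalkeeper saved a penalty"]))
```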

  • D. Blei and J. Lafferty. A Correlated Topic Model of Science. Annals of Applied Statistics, 1(1):17-35. [PDF] [Shorter version from NIPS 18] [Code] [Browser]

I have not studied this one carefully. It essentially breaks the original model's assumption that topics are uncorrelated with one another.
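For the record, the CTM does this by replacing LDA's Dirichlet prior over topic proportions with a logistic normal, whose covariance matrix captures correlations between topics:

$$\eta_d \sim \mathcal{N}(\mu, \Sigma), \qquad \theta_{d,k} = \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})}$$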

  • D. Blei and J. Lafferty. Dynamic Topic Models. In Proceedings of the 23rd International Conference on Machine Learning, 2006. [PDF]

This combines the topic model with the time dimension. Qiaozhu Mei also has a paper on topics that change over time, but it is based on PLSI and an HMM.
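In the DTM, the corpus is divided into time slices and each topic's natural parameters drift from slice to slice under a Gaussian random walk:

$$\beta_{t,k} \mid \beta_{t-1,k} \sim \mathcal{N}\!\left(\beta_{t-1,k},\ \sigma^2 I\right)$$

This chains the topics across time, so topic $k$ at slice $t$ is a smooth evolution of the same topic at slice $t-1$.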

  • T. Griffiths, M. Steyvers, D. Blei, and J. Tenenbaum. Integrating Topics and Syntax. In Advances in Neural Information Processing Systems 17, 2005. [PDF]

This is a very good paper. It opens with a detailed account of the different functional classes of words; the model is also known as HMM-LDA. Just as every person has a social role, words too play different roles in expressing the semantics of a text. The authors divide words into two functional classes: the first is semantic, i.e. the topic words we have dealt with so far; the other is syntactic, i.e. words that exist to make the whole sentence read as a complete, well-formed utterance. T. Griffiths and M. Steyvers are two excellent scholars who developed a topic-modeling toolbox and have a long list of papers.
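Roughly, in this composite model each word first draws a class from an HMM; one designated class emits topic words through the LDA component, while every other class emits from its own syntactic word distribution:

$$c_n \sim \pi_{c_{n-1}}, \qquad w_n \sim \begin{cases} \mathrm{Mult}(\phi_{z_n}) & \text{if } c_n \text{ is the semantic class} \\ \mathrm{Mult}(\phi^{(c_n)}) & \text{otherwise} \end{cases}$$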

  • D. Blei. Probabilistic Models of Text and Images. PhD thesis, U.C. Berkeley, Division of Computer Science, 2004. [PDF]

I have not finished reading Blei's doctoral thesis, because I keep getting tangled up in the derivation of variational inference. I can only blame myself.

  • D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, January 2003. [A shorter version appeared in NIPS 2002.] [PDF] [Code]

The paper that started LDA. It is not easy to understand on a first reading; you may stumble over concepts such as exchangeability, variational inference, and the simplex. A classic.
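The generative story itself is short, though. Here is a minimal forward sampler for it (a sketch of the model's assumptions, not the paper's inference algorithm; all sizes and hyperparameters are made up):

```python
# Forward-sample documents from LDA's generative process.
import numpy as np

rng = np.random.default_rng(0)
K, V, n_docs, doc_len = 3, 1000, 5, 50   # topics, vocab size, corpus shape
alpha, eta = 0.1, 0.01                   # symmetric Dirichlet hyperparameters

beta = rng.dirichlet(np.full(V, eta), size=K)      # K topic-word distributions
for d in range(n_docs):
    theta = rng.dirichlet(np.full(K, alpha))       # per-document topic proportions
    z = rng.choice(K, size=doc_len, p=theta)       # a topic for each word slot
    words = [rng.choice(V, p=beta[k]) for k in z]  # each word drawn from its topic
```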

  • D. Blei and P. Moreno. Topic Segmentation with an Aspect Hidden Markov Model. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343-348. ACM Press, 2001. [PDF]

A segmentation paper in SIGIR. The demand for segmentation is actually large in practice, but there are few mature toolkits, or at least I don't know of them. Mature algorithms generally work by detecting changes in the "semantic slope." Experts who know this area are welcome to recommend a few useful tools in the comments. A closely related problem is extracting the body text of a web page, which is also tricky: many papers are published, but few actually release code. A well-known method is VIPS, though I have never used it. Yesterday I found that the author of VIPS is also a Chinese expert, Deng Cai, formerly a Tsinghua student, who has published many articles with Jiawei Han at various top conferences and journals. Respect.
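As a sketch of the "semantic slope" idea mentioned above (in the spirit of TextTiling-style methods, not Blei and Moreno's aspect HMM), one can place a boundary wherever similarity between adjacent sentences dips below a threshold:

```python
# Similarity-valley segmentation: a topic boundary is guessed wherever
# lexical similarity between adjacent sentences drops sharply.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment(sentences, threshold=0.1):
    """Return indices where a topic boundary is likely."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sims = [cosine_similarity(tfidf[i], tfidf[i + 1])[0, 0]
            for i in range(len(sentences) - 1)]
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```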

To sum up, there are still only a few papers on topic models that I can fully understand. My probability and mathematics foundations are too weak, and I am helpless when it comes to posterior inference; that is my next goal. Also, I am not very innovative yet; the next step is to work harder on applying topic models to solve my own problems.
