Reprinted from Wentingtu
Topic model variants based on LDA. In recent years, with the emergence and development of LDA, a crowd of topic-model heavyweights has sprung up. My main focus is on the big name below and his students:
David M. Blei, the founder of LDA, received his PhD in 2004. His doctoral dissertation on topic models fully displays his deep skill in mathematical probability, and his own implementation of LDA shows his solid programming ability. No more idle words; his papers are the proof:
- J. Chang and D. Blei. Relational Topic Models for Document Networks. Artificial Intelligence and Statistics, 2009. [PDF]
The basic LDA model, of course, assumes that documents are exchangeable, so the original LDA actually treats documents as conditionally independent. In practice this is often not the case: documents may be connected by a "social network" of links. How to combine the two characteristics of content and "social network" may be a very interesting topic, and this paper gives one solution. Based on the content features of each pair of documents, it adds a binary random variable between them to characterize this implicit link relationship.
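As a rough sketch of the idea (the notation is my paraphrase, and the paper also discusses other link functions): the link indicator between two documents is modeled as a function of their mean topic-assignment vectors.

```latex
% RTM-style link likelihood (sketch): y_{d,d'} is a binary link
% indicator, \bar{z}_d the mean topic-assignment vector of document d,
% \sigma the logistic function, \circ the element-wise product.
p(y_{d,d'} = 1 \mid z_d, z_{d'})
  = \sigma\!\left( \eta^\top (\bar{z}_d \circ \bar{z}_{d'}) + \nu \right),
\qquad
\bar{z}_d = \frac{1}{N_d} \sum_{n=1}^{N_d} z_{d,n}
```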
Explicit link relations have been a favorite research object in past years, producing PageRank, HITS, and a large number of other excellent link-analysis algorithms. So how do you take advantage of implicit links? And what is an implicit link? One of the simplest is a graph based on content similarity. This has been used before, for example by LexRank in summarization. O. Kurland's two SIGIR papers are probably in the same vein; the essence of the idea seems to be to exploit content-based "hyperlinks" between documents.
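A minimal sketch of the simplest implicit link just mentioned, assuming a plain TF-IDF representation; the threshold value is an arbitrary illustrative choice, and LexRank-style methods would then run PageRank over such a graph:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def implicit_link_graph(docs, threshold=0.2):
    """Build an implicit 'hyperlink' graph: connect two documents
    when their TF-IDF cosine similarity exceeds a cutoff."""
    tfidf = TfidfVectorizer().fit_transform(docs)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)            # no self-links
    edges = np.argwhere(sim > threshold)  # pairs (i, j) with an implicit link
    return [(i, j, sim[i, j]) for i, j in edges if i < j]

docs = ["lda is a topic model",
        "topic models such as lda and plsa",
        "pagerank ranks pages by links"]
print(implicit_link_graph(docs))
```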
Another novel research direction is how to mine content features based on a "social network". Mei Qiaozhu's paper revises the original pLSA model with a regularization factor built from the network structure of the "social network". The idea is very novel.
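As I recall the construction (a sketch, not necessarily the paper's exact notation): the pLSA log-likelihood is penalized by a graph-smoothness term that pulls linked documents toward similar topic mixtures.

```latex
% Network-regularized pLSA objective (sketch): L is the pLSA
% log-likelihood of the collection C, E the edge set of the network,
% \theta_{d,k} the weight of topic k in document d, \lambda a tradeoff.
O(\Theta) = -(1-\lambda)\, L(C)
  + \frac{\lambda}{2} \sum_{(u,v)\in E} w_{u,v}
    \sum_{k=1}^{K} \left( \theta_{u,k} - \theta_{v,k} \right)^2
```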
- D. Blei and J. Lafferty. Topic Models. In A. Srivastava and M. Sahami, editors, Text Mining: Theory and Applications. Taylor and Francis, in press. [PDF]
This one is a survey by a prolific author: Blei gives an in-depth introduction to what a topic model is and to some of his early topic-model variants. It is worth reading for everyone.
- J. Boyd-Graber and D. Blei. Syntactic Topic Models. Neural Information Processing Systems, 2009. [PDF] [Supplement]
The original LDA studies relations between words from a co-occurrence angle. In fact, co-occurrence often cannot accurately capture sentence-structure information or word-sense information. How to bring that information in, by designing a deeper generative model, is currently a hot topic. This paper focuses on the generative process of the syntactic analysis of a sentence: it argues that each sentence is generated based on its parse tree, so that the whole probabilistic generative process is attached to the parse tree, and within a sentence different words can choose topics that suit them better.
- D. Blei and J. McAuliffe. Supervised Topic Models. In Advances in Neural Information Processing Systems 21, 2007. [PDF] [Digg Data]
Nowadays, besides pure content, network data often carries auxiliary information, such as users' ratings of a post or of a product. A typical example: when you buy a book you can rate its quality, with 5 stars for the best, 4 stars for good, and so on down. So how do you add this information to the original LDA? Blei introduces a response variable that depends on the topic distribution of the document, as sketched below.
How to organically combine rating information with content is also a recent research hotspot. Most methods likewise create a response variable for the rating and make it conditionally dependent on the content or topic information.
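In sketch form, with the Gaussian response used in the paper (other exponential-family responses are possible):

```latex
% sLDA response (sketch): \bar{z}_d is the empirical topic frequency
% of document d's word-level assignments; y_d is the observed rating.
\bar{z}_d = \frac{1}{N_d}\sum_{n=1}^{N_d} z_{d,n},
\qquad
y_d \mid z_{d,1:N_d} \sim \mathcal{N}\!\left(\eta^\top \bar{z}_d,\; \sigma^2\right)
```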
- J. Boyd-Graber, D. Blei, and X. Zhu. A Topic Model for Word Sense Disambiguation. In Empirical Methods in Natural Language Processing, 2007. [PDF]
The broad background of this paper is applying topic models to natural language processing. I did not look at the details closely; it mainly combines the structural features of WordNet and derives the graphical model on that basis.
In addition, some work applies topic models to summarization and part-of-speech tagging. Two main ideas are applied to these problems. The first is to use the topic model to learn some compact features and then run a classifier or other machine-learning method on top of them (a minimal sketch follows below). The other is to exploit structural information of the original NLP problem, for example the network structure of WordNet, and derive the probabilistic generative process of the whole graphical model from that structure.
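A minimal sketch of the first idea, using scikit-learn's LDA implementation to learn compact topic features and a logistic regression on top; the dataset, category choice, and parameter values are purely illustrative:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Small two-class subset; any labeled corpus works the same way.
train = fetch_20newsgroups(subset="train",
                           categories=["sci.space", "rec.autos"])

clf = make_pipeline(
    CountVectorizer(max_features=5000, stop_words="english"),
    LatentDirichletAllocation(n_components=20, random_state=0),  # compact topic features
    LogisticRegression(max_iter=1000),  # classifier on the topic proportions
)
clf.fit(train.data, train.target)
```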
- D. Blei and J. Lafferty. A Correlated Topic Model of Science. Annals of Applied Statistics, 1(1):17–35. [PDF] [Shorter version from NIPS 18] [Code] [Browser]
I have not read this carefully yet; it actually breaks the original independence among topics by allowing them to be correlated.
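For the record, the core change as I understand it from skimming: the Dirichlet over topic proportions is replaced by a logistic normal, whose covariance can encode correlations between topics.

```latex
% CTM topic proportions: a logistic-normal prior instead of a
% Dirichlet; the covariance \Sigma captures topic correlations.
\eta_d \sim \mathcal{N}(\mu, \Sigma),
\qquad
\theta_{d,k} = \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})}
```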
- D. Blei and J. Lafferty. Dynamic Topic Models. In Proceedings of the 23rd International Conference on Machine Learning, 2006. [PDF]
I did not look at this closely; it combines the topic model with the time dimension. Mei Qiaozhu also has a paper studying topics over time, but it is based on pLSI and HMM.
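A sketch of the mechanism, as I understand it: each topic's natural parameters drift over time slices with Gaussian noise, chaining the topics of adjacent epochs together.

```latex
% DTM topic evolution (sketch): \beta_{t,k} are the natural parameters
% of topic k at time slice t; words are drawn through a softmax of \beta.
\beta_{t,k} \mid \beta_{t-1,k} \sim \mathcal{N}\!\left(\beta_{t-1,k},\; \sigma^2 I\right)
```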
- T. Griffiths, M. Steyvers, D. Blei, and J. Tenenbaum. Integrating Topics and Syntax. In Advances in Neural Information Processing Systems 17, 2005. [PDF]
This is a very good paper, also known as the HMM-LDA model, which begins with a detailed account of the different functional classes of words. Just as every person has a social role, words play different roles in expressing textual semantics. The authors divide words into two main functions: the first is the semantic function, that is, all of our topic words; the other is the syntactic function, that is, words that exist to make the generated sentence look more complete or better conform to the norms of the language. T. Griffiths and M. Steyvers are two excellent scholars who developed the Topic Modeling Toolbox and also have a pile of strong papers.
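Sketching the division of labor (my paraphrase of the model): a hidden class chain decides, word by word, whether a position plays the semantic or the syntactic role, and only the semantic class emits from the LDA topics.

```latex
% HMM-LDA (sketch): c_n is the word's class, following an HMM; the
% semantic class emits from the document's topics, while every other
% class emits from its own syntactic word distribution.
c_n \mid c_{n-1} \sim \pi_{c_{n-1}},
\qquad
w_n \mid c_n, z_n \sim
\begin{cases}
\mathrm{Mult}(\phi_{z_n}) & \text{if } c_n = \text{sem} \\
\mathrm{Mult}(\gamma_{c_n}) & \text{otherwise}
\end{cases}
```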
- D. Blei. Probabilistic Models of Text and Images. PhD thesis, U.C. Berkeley, Division of Computer Science, 2004. [PDF]
Blei's doctoral dissertation. I have not read it yet, because I am still tangled up in the variational inference derivation. I have only myself to blame.
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, January 2003. [A shorter version appeared in NIPS 2002.] [PDF] [Code]
The first LDA paper is not easy to read. On a first pass you will typically run into details such as exchangeability, variational inference, and the simplex. A classic among classics.
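For reference, the generative process (in its smoothed variant), which is exactly where exchangeability and the topic simplex enter:

```latex
% Smoothed LDA generative process.
\beta_k \sim \mathrm{Dir}(\eta), \quad k = 1,\dots,K          % topics over the vocabulary
\theta_d \sim \mathrm{Dir}(\alpha)                            % per-document mixture: a point on the simplex
z_{d,n} \mid \theta_d \sim \mathrm{Mult}(\theta_d)            % topic assignment for each word
w_{d,n} \mid z_{d,n}, \beta \sim \mathrm{Mult}(\beta_{z_{d,n}}) % observed word
```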
- D. Blei and P. Moreno. Topic Segmentation with an Aspect Hidden Markov Model. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343–348. ACM Press, 2001. [PDF]
An early SIGIR paper. In practice the demand for text segmentation is fairly large, but mature toolkits are few, or at least I do not know of them. The better-established algorithms usually compute boundaries from the variation of a "semantic slope" curve. Next time I will ask the experts in this area to recommend a few useful tools. A problem closely related to segmentation is web-page body-text extraction: same situation, plenty of papers but very little released code. Among the more famous is VIPS, though I have not used it. Yesterday I found that VIPS's author is also a Chinese giant, Deng Cai, formerly a Tsinghua student and now working with Jiawei Han, with countless articles in top conferences and journals. A moment of worship here.
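The paper itself uses an aspect HMM; by way of contrast, here is a minimal from-scratch sketch of the "semantic slope" family mentioned above, in the spirit of TextTiling. All names and parameter values here are mine, chosen for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def cohesion_valleys(sentences, window=3, cutoff=0.1):
    """Score each gap between adjacent sentence windows by cosine
    similarity; deep valleys in this cohesion curve are candidate
    segment boundaries. Returns indices i meaning 'boundary after
    sentence i'. window and cutoff are illustrative values."""
    X = CountVectorizer().fit_transform(sentences).toarray().astype(float)
    sims = []
    for i in range(window, len(sentences) - window + 1):
        left, right = X[i - window:i].sum(0), X[i:i + window].sum(0)
        denom = np.linalg.norm(left) * np.linalg.norm(right)
        sims.append(left @ right / denom if denom else 0.0)
    sims = np.array(sims)
    bounds = []
    for j in range(len(sims)):
        # depth of the valley relative to the highest similarity seen
        # before and after it (a simplification of TextTiling's local peaks)
        depth = (sims[:j + 1].max() - sims[j]) + (sims[j:].max() - sims[j])
        if depth > cutoff:
            bounds.append(j + window)
    return bounds
```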
To sum up: I can currently read only a small part of the topic-model literature. My foundations in probability and mathematics are too weak, and I am often powerless in the face of posterior inference; fixing that is my next goal. Also, I have not produced much innovation myself; the next step is to do more in this area and strive to apply topic models to solve my own practical problems.
"Turn" based on LDA's topic model deformation