Topic modeling [classic model]

http://www.cs.princeton.edu/~blei/topicmodeling.html

Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help us develop new ways to search, browse and summarize large archives of texts.

Below, you will find links to introductory materials, corpus browsers based on topic models, and open-source software (from my research group) for topic modeling.

Introductory Materials
  • I wrote a general introduction to topic modeling.
  • John Lafferty and I wrote a more technical review paper about this field.
  • Here are slides from some recent tutorials about topic modeling:
    • KDD-2011
    • ICML-2012
    • Machine Learning Summer School (2012)
  • Here is a video from a talk on dynamic and correlated topic models applied to the journal Science. (Here are the slides.)
  • David Mimno maintains a bibliography of topic modeling papers and software.
  • The topic models mailing list is a good forum for discussing topic modeling.

Corpus browsers based on topic models

The structure uncovered by topic models can be used to explore an otherwise unorganized collection. The following are browsers of large collections of documents, built with topic models.

  • A 100-topic browser of the dynamic topic model fit to Science (1882-2001).
  • A 100-topic browser of the correlated topic model fit to Science (1980-2000).
  • A 50-topic browser of latent Dirichlet allocation fit to the 2006 arXiv.
  • A 20-topic browser of latent Dirichlet allocation fit to The American Political Science Review.

Also see Sean Gerrish's discipline browser for an interesting application of topic modeling at JSTOR.

To build your own browsers, see Allison Chaney's excellent topic model visualization engine (tmve). For example, here is a browser of 100,000 Wikipedia articles that uses tmve.
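As a rough illustration of the idea behind such browsers, the sketch below groups documents under their highest-weight topic. It is not how tmve works internally; the function name `build_topic_index`, the document titles, and the topic proportions are all hypothetical and stand in for the output of a fitted topic model.

```python
from collections import defaultdict

def build_topic_index(doc_topic_props, titles, top_n=3):
    """Group documents under their highest-weight topic.

    doc_topic_props[d][k] is the (hypothetical) proportion of topic k
    in document d, as a fitted topic model might report it.
    """
    index = defaultdict(list)
    for title, props in zip(titles, doc_topic_props):
        # Pick the topic with the largest proportion for this document.
        best = max(range(len(props)), key=props.__getitem__)
        index[best].append((props[best], title))
    # Within each topic, list the most strongly associated documents first.
    return {
        topic: [title for _, title in sorted(entries, reverse=True)[:top_n]]
        for topic, entries in sorted(index.items())
    }

# Made-up proportions for four documents and two topics.
titles = ["Doc A", "Doc B", "Doc C", "Doc D"]
props = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]

index = build_topic_index(props, titles)
for topic, members in index.items():
    print(f"Topic {topic}: {members}")
```

A real browser would also show the top words of each topic and link related documents, but the same per-document topic proportions drive those views.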

Topic modeling software

Our research group has released open-source software packages for topic modeling. Please post questions, comments, and suggestions about this code to the topic models mailing list.

Link | Model/Algorithm | Language | Author | Notes
lda-c | Latent Dirichlet allocation | C | D. Blei | This implements variational inference for LDA.
class-slda | Supervised topic models for classification | C++ | C. Wang | Implements supervised topic models with a categorical response.
lda | R package for Gibbs sampling in many models | R | J. Chang | Implements many models and is fast. Supports LDA, RTMs (for networked documents), MMSB (for network data), and sLDA (with a continuous response).
online lda | Online inference for LDA | Python | M. Hoffman | Fits topic models to massive data. The demo downloads random Wikipedia articles and fits a topic model to them.
online hdp | Online inference for the HDP | Python | C. Wang | Fits hierarchical Dirichlet process topic models to massive data. The algorithm determines the number of topics.
tmve (online) | Topic model visualization engine | Python | A. Chaney | A package for creating corpus browsers. See, for example, Wikipedia.
ctr | Collaborative modeling for recommendation | C++ | C. Wang | Implements variational inference for a collaborative topic model. These models recommend items to users based on item content and other users' ratings.
dtm | Dynamic topic models and the influence model | C++ | S. Gerrish | This implements topics that change over time and a model of how individual documents predict that change.
hdp | Hierarchical Dirichlet processes | C++ | C. Wang | Topic models where the data determine the number of topics. This implements Gibbs sampling.
ctm-c | Correlated topic models | C | D. Blei | This implements variational inference for the CTM.
diln | Discrete infinite logistic normal | C | J. Paisley | This implements the discrete infinite logistic normal, a Bayesian nonparametric topic model that finds correlated topics.
hlda | Hierarchical latent Dirichlet allocation | C | D. Blei | This implements a topic model that finds a hierarchy of topics. The structure of the hierarchy is determined by the data.
turbotopics | Turbo topics | Python | D. Blei | Turbo topics find significant multiword phrases in topics.
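
Some of the packages above fit topic models by sampling. As a minimal sketch of that idea, and not the code of any listed package, the following shows collapsed Gibbs sampling for latent Dirichlet allocation using only the Python standard library. The function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents.

    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] counts
    tokens of document d assigned to topic k, and topic_word[k][v] counts
    assignments of vocabulary word v to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []  # assign[d][i] is the topic of the i-th token of document d

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab

# Hypothetical toy corpus, already tokenized.
docs = [
    "gene dna cell gene protein".split(),
    "cell dna protein gene".split(),
    "vote election party vote".split(),
    "party election candidate vote".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print(f"topic {k}:", [w for _, w in top])
```

Because the number of topics is fixed in advance here, this corresponds to plain LDA; models such as the HDP instead let the data determine the number of topics.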
