Topic model reading list


Latent Dirichlet Allocation*
Proposed LDA as a hierarchical Bayesian model for text corpora and developed a variational EM algorithm for inference.
Finding scientific topics
Developed collapsed Gibbs sampling (CGS), the most widely used algorithm for LDA.
Parameter estimation for text analysis*
A friendly guide on the technical details for LDA, with very detailed derivation of CGS.
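As a companion to the CGS entries above, here is a minimal collapsed Gibbs sampler for LDA. This is an illustrative sketch, not code from any of the papers; the hyperparameter defaults are arbitrary placeholders.

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, n_iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in [0, V).
    Returns doc-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # doc-topic counts
    n_kw = np.zeros((K, V))          # topic-word counts
    n_k = np.zeros(K)                # per-topic totals
    z = []                           # topic assignment for every token

    # Random initialization of topic assignments and counts.
    for d, doc in enumerate(docs):
        zd = rng.integers(0, K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token's assignment from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Collapsed conditional: p(z = k | rest).
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # Put the token back with its new topic.
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```

The O(K) inner loop over topics is exactly what the fast-sampling papers in the later section reduce.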
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
The collapsed variational Bayes counterpart of CGS; converges quickly.
On smoothing and inference for topic models*
Compared ML, MAP, VB, and CGS estimation for LDA, and showed they give similar results with careful choice of hyperparameters. Proposed CVB0, a simplification of CVB.
Rethinking LDA: Why Priors Matter
Investigated the effect of priors; advocated an asymmetric prior over theta and a symmetric prior over phi.
Evaluation Methods for Topic Models
Summarized and proposed several methods for computing the marginal likelihood (evidence) of LDA, useful for model selection.

Other topic models

Correlated topic models
Models topic correlations, i.e., how likely two topics are to appear in the same document.
Dynamic topic models
Models topic dynamics, i.e., how topics evolve over time.
Supervised topic models
Jointly trains an LDA model and a logistic regression that uses the topic features.
MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification
Jointly trains an LDA model and an SVM that uses the topic features.
Replicated softmax: an undirected topic model
A neural topic model that models the word distribution with a multinomial RBM.

Stochastic/online methods

Online EM Algorithm for Latent Data Models (Cappe and Moulines)
Proposed Robbins-Monro stochastic approximation for maximum likelihood estimation in latent data models.
Stochastic Variational Inference
Inspired by Cappe and Moulines, developed a Robbins-Monro stochastic approximation for variational Bayesian estimation.
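The Robbins-Monro update shared by these two papers can be sketched in a few lines. This is an illustrative snippet (not from either paper): `lam` is the global (variational) parameter, `lam_hat` stands for the intermediate estimate computed from a single data point or minibatch, and `tau0`/`kappa` are the usual delay and forgetting-rate hyperparameters, with kappa in (0.5, 1] to satisfy the Robbins-Monro conditions.

```python
def svi_step(lam, lam_hat, t, tau0=1.0, kappa=0.7):
    """One Robbins-Monro step: move lam toward the noisy estimate lam_hat.

    Step size rho_t = (t + tau0)^(-kappa) decays so that
    sum(rho_t) = inf and sum(rho_t^2) < inf, as required for convergence.
    """
    rho = (t + tau0) ** (-kappa)
    return (1.0 - rho) * lam + rho * lam_hat
```

In SVI the same averaging is applied to the natural parameters of the global variational distribution, which makes each step a noisy natural-gradient step.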
Sparse stochastic inference for latent Dirichlet allocation
Scaled SVI up to large numbers of topics via local Gibbs sampling.
Stochastic gradient Langevin dynamics
The first paper on stochastic gradient MCMC. Generalizes Langevin dynamics to the stochastic-gradient setting by dropping the MH test; the chain converges to the stationary distribution as the step size goes to zero, giving a smooth transition between optimization and sampling.
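A single SGLD update, as described above, can be sketched as follows. This is illustrative only: the prior and minibatch likelihood gradients are assumed to be supplied by the model, and `N`/`n` are the full-data and minibatch sizes used to rescale the stochastic gradient.

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_minibatch, N, n, eps, rng):
    """One SGLD update with step size eps.

    grad_log_lik_minibatch: sum of grad log p(x_i | theta) over a
    minibatch of size n; the N/n factor makes it an unbiased estimate
    of the full-data gradient. The injected Gaussian noise has
    variance eps, matching the step size as in Welling & Teh.
    """
    noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)
    drift = 0.5 * eps * (grad_log_prior + (N / n) * grad_log_lik_minibatch)
    return theta + drift + noise
```

With large `eps` the noise is negligible relative to the drift and the update behaves like SGD; as `eps` shrinks, the noise dominates and the iterates sample from the posterior.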
Stochastic gradient Riemannian Langevin dynamics on the probability simplex
Used Riemannian Langevin dynamics (RLD) and proposed several reparametrizations for sampling on the probability simplex.
Streaming variational Bayes
A framework to do truly streaming inference.

Fast sampling algorithms

Efficient methods for topic model inference on streaming document collections
Proposed SparseLDA, O(K_w + K_d) per token.
Reducing the sampling complexity of topic models
Proposed AliasLDA, O(K_d) per token (with an MH test).
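AliasLDA's speedup rests on Walker's alias method, which turns sampling from a fixed K-outcome discrete distribution into an O(1) operation after O(K) table construction. A minimal sketch (not the paper's code):

```python
import numpy as np

def build_alias_table(p):
    """Vose's O(K) construction of an alias table for distribution p."""
    K = len(p)
    prob = np.asarray(p, dtype=float) * K / np.sum(p)  # scaled so mean is 1
    alias = np.zeros(K, dtype=int)
    small = [k for k in range(K) if prob[k] < 1.0]
    large = [k for k in range(K) if prob[k] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                   # s's leftover mass is donated by l
        prob[l] -= 1.0 - prob[s]       # l gives away the mass s was missing
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def alias_draw(prob, alias, rng):
    """O(1) draw: pick a bucket uniformly, then accept it or take its alias."""
    k = rng.integers(len(prob))
    return k if rng.random() < prob[k] else alias[k]
```

AliasLDA amortizes the O(K) construction over many draws and corrects for the table being slightly stale with the MH test mentioned above.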
LightLDA: Big topic models on modest computer clusters
Proposed LightLDA, O(number of MH steps) per token.
A scalable asynchronous distributed algorithm for topic modeling
Proposed F+LDA, O(K_d + log K).

Distributed systems

Scalable inference in latent variable models
Proposed Yahoo!LDA and developed a parameter server for LDA.
LightLDA: Big topic models on modest computer clusters
Proposed a model-parallel approach.
Scaling distributed machine learning with the parameter server
A modern, general-purpose parameter server design.
BIDMach: Large-scale learning with zero memory allocation
State-of-the-art GPU acceleration for LDA.
SAME but Different: Fast and High-Quality Gibbs Parameter Estimation
A Gibbs sampler optimized for GPU.

Model specific algorithms

Scalable Inference for Logistic-Normal Topic Models
Data augmentation for CTM.
Bayesian Logistic Supervised Topic Models with Data Augmentation
Data augmentation for SLDA.
Linear Time Samplers for Supervised Topic Models using Compositional Proposals
Combined LightLDA and data augmentation for MedLDA.
Scaling up Dynamic Topic Models
Stochastic gradient MCMC inference for dynamic topic models.