## Topic model reading list

### LDA

- **Latent Dirichlet Allocation**\*: Proposed LDA as a hierarchical Bayesian model for text corpora and developed a variational EM algorithm for it.
- **Finding scientific topics**: Developed collapsed Gibbs sampling (CGS), the most widely used algorithm for LDA.
- **Parameter estimation for text analysis**\*: A friendly guide to the technical details of LDA, with a very detailed derivation of CGS.
- **A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation**: The VB counterpart of CGS; converges quickly.
- **On smoothing and inference for topic models**\*: Compared ML, MAP, VB, and CGS estimation for LDA, and showed that they give similar results with a careful choice of hyperparameters. Proposed CVB0, a simplification of CVB.
- **Rethinking LDA: Why Priors Matter**: Investigated the effect of priors; advocated an asymmetric prior over theta and a symmetric prior over phi.
- **Evaluation Methods for Topic Models**: Summarized and proposed several methods for computing the marginal likelihood (evidence) of LDA, which is useful for model selection.
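
To make the CGS papers above concrete, here is a minimal sketch of one collapsed Gibbs sweep for LDA. The function name and count-matrix names (`n_dk`, `n_kw`, `n_k`) are illustrative, not from any particular paper; the full conditional is the standard CGS one.

```python
import numpy as np

def collapsed_gibbs_step(z, docs, n_dk, n_kw, n_k, alpha, beta, V, rng):
    """One sweep of collapsed Gibbs sampling for LDA (illustrative sketch).
    z[d][i] is the topic of word i in document d; the count matrices
    n_dk (doc-topic), n_kw (topic-word), and n_k (topic totals) must
    stay consistent with z."""
    K = n_kw.shape[0]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # remove the current assignment from the counts
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # full conditional p(z = k | rest), up to a constant
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())
            # add the new assignment back
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z
```

The derivation of this conditional (integrating out theta and phi) is worked out in detail in "Parameter estimation for text analysis".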

### Other topic models

- **Correlated topic models**: Models topic correlation, i.e., how likely two topics are to appear in the same document.
- **Dynamic topic models**: Models topic dynamics, i.e., how topics evolve over time.
- **Supervised topic models**: Jointly trains an LDA model and a logistic regression that uses the topic features.
- **MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification**: Jointly trains an LDA model and an SVM that uses the topic features.
- **Replicated softmax: an undirected topic model**: A neural topic model that represents the word distribution with a multinomial RBM.

### Stochastic/online methods

- **Online EM Algorithm for Latent Data Models** (Cappe and Moulines): Proposed Robbins-Monro stochastic approximation for maximum-likelihood estimation.
- **Stochastic Variational Inference**: Inspired by Cappe and Moulines; developed a Robbins-Monro stochastic approximation for variational Bayesian estimation.
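
The core of SVI is a Robbins-Monro blend of the current global variational parameter with an intermediate estimate computed from one minibatch. A minimal sketch (names and defaults are illustrative; `kappa` in (0.5, 1] and `tau >= 0` give a valid step-size schedule):

```python
import numpy as np

def svi_update(lam, lam_hat, t, tau=1.0, kappa=0.7):
    """One Robbins-Monro step of stochastic variational inference:
    blend the current global parameter `lam` with the intermediate
    estimate `lam_hat` computed from a minibatch as if that minibatch
    were the whole corpus. Illustrative sketch, not any paper's code."""
    rho = (t + tau) ** (-kappa)  # step size satisfying the Robbins-Monro conditions
    return (1.0 - rho) * lam + rho * lam_hat
```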

- **Sparse stochastic inference for latent Dirichlet allocation**: Scaled SVI to large numbers of topics using local Gibbs sampling.
- **Stochastic gradient Langevin dynamics**: The first paper on stochastic gradient methods for MCMC. Generalizes Langevin dynamics to the stochastic-gradient setting by removing the MH test; converges to the stationary distribution as the step size approaches zero, giving a smooth transition between optimization and sampling.
- **Stochastic gradient Riemannian Langevin dynamics on the probability simplex**: Applied Riemannian Langevin dynamics and proposed several reparametrizations for sampling on the probability simplex.
- **Streaming variational Bayes**: A framework for truly streaming inference.
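
The SGLD entry above can be sketched in a few lines: a stochastic-gradient ascent step on the log posterior plus Gaussian noise whose variance matches the step size. All names here are illustrative; `scale` is the corpus-to-minibatch ratio N/B that makes the minibatch gradient unbiased.

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_minibatch, scale, eps, rng):
    """One SGLD update (illustrative sketch). With injected noise of
    variance eps, the iterates transition from optimization to posterior
    sampling as eps -> 0, without any MH accept/reject test."""
    grad = grad_log_prior(theta) + scale * grad_log_lik_minibatch(theta)
    noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + noise
```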

### Fast sampling algorithms

- **Efficient methods for topic model inference on streaming document collections**: Proposed SparseLDA, O(K_w + K_d) per token.
- **Reducing the sampling complexity of topic models**: Proposed AliasLDA, O(K_d) per token (with an MH test).
- **LightLDA: Big topic models on modest computer clusters**: Proposed LightLDA, O(number of MH steps) per token.
- **A scalable asynchronous distributed algorithm for topic modeling**: Proposed F+LDA, O(K_d + log K) per token.
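
The per-token complexities above come from exploiting sparsity in the CGS full conditional. For SparseLDA, the conditional splits into three buckets (notation as in the CGS references: n_dk doc-topic counts, n_kw topic-word counts, n_k topic totals):

```latex
p(z=k \mid \text{rest}) \propto \frac{(n_{dk}+\alpha)(n_{kw}+\beta)}{n_k+V\beta}
= \underbrace{\frac{\alpha\beta}{n_k+V\beta}}_{\text{smoothing, cached}}
+ \underbrace{\frac{n_{dk}\,\beta}{n_k+V\beta}}_{\text{nonzero for } K_d \text{ topics}}
+ \underbrace{\frac{(n_{dk}+\alpha)\,n_{kw}}{n_k+V\beta}}_{\text{nonzero for } K_w \text{ topics}}
```

Only the last two buckets need to be enumerated per token, giving the O(K_w + K_d) bound; AliasLDA, LightLDA, and F+LDA push further with alias tables, Metropolis-Hastings proposals, and Fenwick trees respectively.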

### Distributed systems

- **Scalable inference in latent variable models**: Proposed Yahoo! LDA and developed a parameter server for LDA.
- **LightLDA: Big topic models on modest computer clusters**: Proposed a model-parallel approach.
- **Scaling distributed machine learning with the parameter server**: A modern parameter server.
- **BIDMach: Large-scale learning with zero memory allocation**: State-of-the-art GPU acceleration for LDA.
- **SAME but Different: Fast and High-Quality Gibbs Parameter Estimation**: A Gibbs sampler optimized for GPUs.

### Model-specific algorithms

- **Scalable Inference for Logistic-Normal Topic Models**: Data augmentation for CTM.
- **Bayesian Logistic Supervised Topic Models with Data Augmentation**: Data augmentation for SLDA.
- **Linear Time Samplers for Supervised Topic Models using Compositional Proposals**: Combined LightLDA and data augmentation for MedLDA.
- **Scaling up Dynamic Topic Models**: SGRLD for DTM.