Adji Bousso Dieng

CV  /  Google Scholar  /  LinkedIn  /  Github  /  Twitter  /  Email: abd2141 at columbia dot edu

I am a Ph.D. candidate in the Department of Statistics at Columbia University, where I am jointly advised by David Blei and John Paisley. In my research, I combine probabilistic graphical modeling and deep learning to design models for structured high-dimensional data such as text. I also work on variational methods as an inference framework for fitting these models. My work is funded by a Columbia Dean Fellowship and a Google PhD Fellowship in Machine Learning.


Prior to joining Columbia, I worked as a Junior Professional Associate at the World Bank. I did my undergraduate training in France, where I attended Lycee Henri IV and Telecom ParisTech, within France's Grandes Ecoles system. I hold a Diplome d'Ingenieur from Telecom ParisTech and spent the third year of Telecom ParisTech's curriculum at Cornell University, where I earned a Master in Statistics.

Selected Invited Talks

My goal as a Machine Learning researcher is twofold. My first goal is to combine deep learning and probabilistic graphical modeling to design models that are expressive and powerful enough to capture meaningful representations of high-dimensional structured data. My second goal is to develop efficient, scalable, and generic algorithms for learning with these models. Achieving these two goals will benefit many applications.

Prescribed Generative Adversarial Networks
Adji B. Dieng, Francisco R. J. Ruiz, David M. Blei, Michalis Titsias
Journal of Machine Learning Research (JMLR) (Submitted)
arxiv / Code

This paper describes a solution to two important problems in the GAN literature: (1) How can we maximize the entropy of the generator of a GAN to prevent mode collapse? (2) How can we evaluate predictive log-likelihood for GANs to assess how they generalize to new data? Key ingredients: noise, entropy regularization, and Hamiltonian Monte Carlo.

Reweighted Expectation Maximization
Adji B. Dieng, John Paisley
Under submission at Journal of Machine Learning Research (JMLR)
arxiv / Code

Maximum likelihood in deep generative models is hard. The typical workaround is variational inference (VI), which maximizes a lower bound on the log marginal likelihood of the data. VI introduces an undesirable amortization gap and often causes latent variable collapse. We propose to use expectation maximization (EM) instead. Importantly, we separate posterior inference and model fitting. To fit the model, we leverage moment matching to learn rich proposals for estimating the EM objective. Posterior inference is done after the model is fitted. This two-step procedure departs from the current VAE approach of bundling together model fitting and posterior inference. It turns out that EM learns better deep generative models than VI, as measured by predictive log-likelihood.

The Dynamic Embedded Topic Model
Adji B. Dieng*, Francisco R. J. Ruiz*, David M. Blei
International Conference on Machine Learning (ICML) (Submitted)
arxiv / Code

An extension of the Embedded Topic Model to corpora with temporal dependencies. The DETM models each word with a categorical distribution whose parameter is given by the inner product between the word embedding and an embedding representation of its assigned topic at a particular time step. The word embeddings allow the DETM to generalize to rare words. The DETM learns smooth topic trajectories by defining a random walk prior over the embeddings of the topics. The DETM is fit using structured amortized variational inference with LSTMs.

Topic Modeling in Embedding Spaces
Adji B. Dieng, Francisco R. J. Ruiz, David M. Blei
Under review at Transactions of the Association for Computational Linguistics (TACL), 2019
arxiv / Code

Define words and topics in the same embedding space. Form a generative model of documents that defines the likelihood of a word as a Categorical whose natural parameter is the dot product between the word embedding and its assigned topic's embedding. The resulting Embedded Topic Model (ETM) learns interpretable topics and word embeddings and is robust to large vocabularies that include rare words and stop words.
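The word likelihood described above can be illustrated with a small sketch. This is only a toy illustration, not the released ETM code: the function names `softmax` and `topic_word_distribution`, the two-dimensional embeddings, and the three-word vocabulary are all assumptions made up for the example.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to one."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distribution(word_embeddings, topic_embedding):
    """Categorical distribution over the vocabulary for one topic.

    The natural parameter of each word is the dot product between that
    word's embedding and the topic's embedding (hypothetical helper).
    """
    logits = [
        sum(r * a for r, a in zip(word_vec, topic_embedding))
        for word_vec in word_embeddings
    ]
    return softmax(logits)


# Toy vocabulary of three words embedded in two dimensions (made-up values).
word_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topic_embedding = [2.0, 0.0]

beta_k = topic_word_distribution(word_embeddings, topic_embedding)
print(beta_k)
```

In this toy setup, words whose embeddings align with the topic embedding receive more probability mass, which is the sense in which words and topics live in the same space.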

Avoiding Latent Variable Collapse with Generative Skip Models
Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
arxiv

Variational autoencoders (VAEs) are one of the current staples of unsupervised representation learning. However, they suffer from a problem known as "latent variable collapse". Our paper proposes a simple solution that relies on skip connections. This solution leads to the Skip-VAE, a deep generative model that avoids latent variable collapse. The decoder of a Skip-VAE is a neural network whose hidden states, at every layer, condition on the latent variables. This results in a stronger dependence between observations and their latents and therefore avoids latent variable collapse.
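A rough sketch of the skip idea follows. It assumes a plain feed-forward decoder with tanh activations and tiny hand-picked weight matrices; the names `matvec` and `skip_decoder_hidden` and all numeric values are hypothetical, and the architecture in the paper may differ in its details.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def skip_decoder_hidden(z, layers):
    """Return the top hidden state of a decoder with skip connections.

    `layers` is a list of (W_h, W_z) pairs. Every layer adds W_z @ z to its
    pre-activation, so each hidden state conditions directly on z. The first
    layer has no previous hidden state, so its W_h is None.
    """
    h = None
    for W_h, W_z in layers:
        pre_activation = matvec(W_z, z)  # direct path from the latent
        if h is not None:
            from_previous = matvec(W_h, h)  # usual layer-to-layer path
            pre_activation = [a + b for a, b in zip(pre_activation, from_previous)]
        h = [math.tanh(p) for p in pre_activation]
    return h


# The second layer's hidden-to-hidden weights are zero on purpose: the top
# hidden state can then only depend on z through the skip connection.
layers = [
    (None, [[1.0, 0.0], [0.0, 1.0]]),
    ([[0.0, 0.0], [0.0, 0.0]], [[0.5, 0.0], [0.0, 0.5]]),
]

print(skip_decoder_hidden([1.0, -1.0], layers))
print(skip_decoder_hidden([0.0, 0.0], layers))
```

Even with the layer-to-layer path switched off, the top hidden state still changes with the latent, which is the direct dependence the skip connections are meant to provide.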

Noisin: Unbiased Regularization for Recurrent Neural Networks
Adji B. Dieng, Rajesh Ranganath, Jaan Altosaar, David M. Blei
International Conference on Machine Learning (ICML), 2018
arxiv / Slides

Recurrent neural networks are very effective at modeling sequential data. However, they tend to have very high capacity and overfit easily. We propose a new regularization method called Noisin. Noisin relies on the notion of "unbiased" noise injection. Noisin is an explicit regularizer: its objective function can be decomposed into the original objective for the deterministic RNN and a non-negative data-dependent term. Noisin significantly outperforms Dropout on both the Penn TreeBank and the Wikitext-2 datasets on a language modeling task.
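To give a feel for what "unbiased" means here, the sketch below injects noise into a hidden state so that the noisy state equals the deterministic one in expectation. It assumes a scalar RNN cell and additive zero-mean Gaussian noise, which is just one illustrative choice; `rnn_step`, `noisy_rnn_step`, and the weights are hypothetical and not taken from the paper's code.

```python
import math
import random


def rnn_step(h_prev, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar RNN with a tanh nonlinearity."""
    return math.tanh(w_h * h_prev + w_x * x)


def noisy_rnn_step(h_prev, x, noise_std, rng):
    """Same step with additive zero-mean Gaussian noise on the hidden state.

    Because the noise has mean zero, the expected noisy state equals the
    deterministic state (an illustrative form of unbiased noise injection).
    """
    clean = rnn_step(h_prev, x)
    return clean + rng.gauss(0.0, noise_std)


rng = random.Random(0)  # fixed seed so the run is reproducible
clean_state = rnn_step(0.2, 0.3)
samples = [noisy_rnn_step(0.2, 0.3, noise_std=0.5, rng=rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)

print(clean_state, empirical_mean)
```

Averaged over many draws, the noisy hidden state should land close to the deterministic one, while any single draw is perturbed.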

Augment and Reduce: Stochastic Inference for Large Categorical Distributions
Francisco J. R. Ruiz, Michalis Titsias, Adji B. Dieng, David M. Blei
International Conference on Machine Learning (ICML), 2018
arxiv / Slides / Code

Categorical distributions are ubiquitous in Statistics and Machine Learning. One widely used parameterization of a categorical distribution is the softmax. However, the softmax does not scale well when there are many categories. We propose a method called A&R that scales learning with categorical distributions. A&R is built on two ideas: latent variable augmentation and stochastic variational expectation maximization.

TopicRNN: A Recurrent Neural Network With Long-Range Semantic Dependency
Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley
International Conference on Learning Representations (ICLR), 2017
arxiv / Poster / Slides

One challenge in modeling sequential data with RNNs is the inability to capture long-term dependencies. In natural language, these long-term dependencies come in the form of semantic dependencies. TopicRNN is a deep generative model of language that marries RNNs and topic models to capture long-term dependencies. The RNN component of the model captures syntax while the topic model component captures semantics. The topic model and the RNN parameters are learned jointly using amortized variational inference.

Variational Inference via Chi Upper Bound Minimization
Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei
Neural Information Processing Systems (NeurIPS), 2017
arxiv / Poster / Slides

Variational inference is an efficient approach for estimating posterior distributions. It consists of positing a family of distributions and finding the distribution in this family that best approximates the true posterior. The criterion for learning is a divergence measure. The most commonly used divergence is the Kullback-Leibler (KL) divergence. However, minimizing the KL leads to approximations that underestimate posterior uncertainty. Our paper proposes the Chi-divergence for variational inference. This divergence leads to an upper bound on the model evidence (called CUBO) and overdispersed posterior approximations. CUBO can be used alongside the usual ELBO to sandwich-estimate the model evidence.
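The sandwich idea can be sketched on a toy problem where the evidence is known in closed form. The model below (z ~ N(0, 1), x | z ~ N(z, 1), so that p(x) = N(x; 0, 2)) and the Gaussian approximation q are assumptions chosen for illustration, as are the names `log_normal_pdf` and `elbo_and_cubo`. The CUBO of order n is estimated as (1/n) log of the Monte Carlo average of (p(x, z) / q(z))^n; this finite-sample estimate is itself biased, so the sketch is only indicative.

```python
import math
import random


def log_normal_pdf(value, mean, variance):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (value - mean) ** 2 / (2.0 * variance)


def elbo_and_cubo(x, q_mean, q_variance, order=2, num_samples=20000, seed=0):
    """Monte Carlo estimates of the ELBO and the order-n CUBO for the toy model."""
    rng = random.Random(seed)
    log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, math.sqrt(q_variance))  # draw z from q
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_weights.append(log_joint - log_normal_pdf(z, q_mean, q_variance))
    # ELBO: average of the log importance weights.
    elbo = sum(log_weights) / num_samples
    # CUBO: (1/n) * log of the average of the weights raised to the power n,
    # computed with a log-mean-exp for numerical stability.
    scaled = [order * lw for lw in log_weights]
    largest = max(scaled)
    log_mean = largest + math.log(sum(math.exp(s - largest) for s in scaled) / num_samples)
    return elbo, log_mean / order


x = 1.0
log_evidence = log_normal_pdf(x, 0.0, 2.0)  # exact log p(x) for this toy model
elbo, cubo = elbo_and_cubo(x, q_mean=0.5, q_variance=1.0)

print(elbo, log_evidence, cubo)
```

Here q is deliberately wider than the exact posterior N(0.5, 0.5), which keeps the importance weights well behaved; the two estimates should then bracket the exact log evidence.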

Edward: A Library for Probabilistic Modeling, Inference, and Criticism
Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, David M. Blei

A TensorFlow-based library for probabilistic programming.

Teaching

I am fortunate to have been a teaching assistant for the following courses at Columbia University.


Statistical Machine Learning - Spring 2019

Advanced Data Analysis - Fall 2017

Statistical Methods for Finance - Spring 2016

Probability and Statistics for Data Science - Fall 2015

Linear Regression Models - Spring 2015

Probability - Fall 2014