Adji Bousso Dieng

CV  /  Google Scholar  /  LinkedIn  /  Github  /  Twitter  /  Email: abd2141 at columbia dot edu

I am a Ph.D. student in the Department of Statistics at Columbia University, where I am jointly advised by David Blei and John Paisley. In my research, I work on combining probabilistic graphical modeling and deep learning to design models for structured high-dimensional data such as text. I also work on variational methods as an inference framework for fitting these models.


Prior to joining Columbia, I worked as a Junior Professional Associate at the World Bank. I did my undergraduate training in France, where I attended Lycee Henri IV and Telecom ParisTech, part of France's Grandes Ecoles system. I hold a Diplome d'Ingenieur from Telecom ParisTech and spent the third year of its curriculum at Cornell University, where I earned a Master's in Statistics.

News
Selected Invited Talks
Research

My goal as a Machine Learning researcher is twofold. My first goal is to combine deep learning and probabilistic graphical modeling to design models that are expressive and powerful enough to capture meaningful representations of high-dimensional structured data. My second goal is to develop efficient, scalable, and generic algorithms for learning with these models. Achieving these two goals will benefit many applications.

Avoiding Latent Variable Collapse with Generative Skip Models
Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019 (Submitted)
arxiv

One of the current staples of unsupervised representation learning is the variational autoencoder (VAE). However, VAEs suffer from a problem known as "latent variable collapse," in which the decoder learns to ignore the latent variables. Our paper proposes a simple solution based on skip connections, which leads to the Skip-VAE, a deep generative model that avoids latent variable collapse. The decoder of a Skip-VAE is a neural network whose hidden states, at every layer, condition on the latent variables. This strengthens the dependence between observations and their latent variables and thereby avoids collapse.
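
A minimal sketch of the idea in PyTorch (the class name, layer sizes, and activation below are illustrative, not taken from the paper):

import torch
import torch.nn as nn

class SkipDecoder(nn.Module):
    # Every hidden layer receives the latent code z through a skip
    # connection, in addition to the previous layer's output.
    def __init__(self, z_dim, h_dim, x_dim, n_layers=3):
        super().__init__()
        self.inp = nn.Linear(z_dim, h_dim)
        self.hidden = nn.ModuleList([nn.Linear(h_dim, h_dim) for _ in range(n_layers)])
        self.skips = nn.ModuleList([nn.Linear(z_dim, h_dim) for _ in range(n_layers)])
        self.out = nn.Linear(h_dim, x_dim)

    def forward(self, z):
        h = torch.tanh(self.inp(z))
        for layer, skip in zip(self.hidden, self.skips):
            h = torch.tanh(layer(h) + skip(z))  # hidden state conditions on z directly
        return self.out(h)  # parameters (e.g. logits) of p(x | z)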

Noisin: Unbiased Regularization for Recurrent Neural Networks
Adji B. Dieng, Rajesh Ranganath, Jaan Altosaar, David M. Blei
International Conference on Machine Learning (ICML), 2018
arxiv / Slides

Recurrent neural networks are very effective at modeling sequential data, but they have high capacity and overfit easily. We propose a new regularization method called Noisin, which relies on the notion of "unbiased" noise injection: noise is injected into the hidden units in such a way that the noisy hidden states are unbiased estimates of the deterministic ones. Noisin is an explicit regularizer; its objective function decomposes into the original objective of the deterministic RNN plus a non-negative, data-dependent penalty. On language modeling, Noisin significantly outperforms dropout on both the Penn Treebank and WikiText-2 datasets.
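
A rough sketch of unbiased multiplicative noise injection in PyTorch (the cell type, noise distribution, and noise level are illustrative; the paper studies several noise distributions):

import torch
import torch.nn as nn

class NoisyRNNCell(nn.Module):
    # Wraps a vanilla RNN cell and multiplies its hidden state by noise
    # with mean one, so the noisy hidden state is an unbiased estimate
    # of the deterministic one.
    def __init__(self, input_dim, hidden_dim, noise_std=0.5):
        super().__init__()
        self.cell = nn.RNNCell(input_dim, hidden_dim)
        self.noise_std = noise_std

    def forward(self, x_t, h_prev):
        h_t = self.cell(x_t, h_prev)
        if self.training:
            eps = 1.0 + self.noise_std * torch.randn_like(h_t)  # E[eps] = 1
            h_t = h_t * eps  # the noisy state is fed back into the recurrence
        return h_t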

Augment and Reduce: Stochastic Inference for Large Categorical Distributions
Francisco J. R. Ruiz, Michalis Titsias, Adji B. Dieng, David M. Blei
International Conference on Machine Learning (ICML), 2018
arxiv / Slides

Categorical distributions are ubiquitous in Statistics and Machine Learning. A widely used parameterization of a categorical distribution is the softmax; however, the softmax does not scale well when there are many categories. We propose augment and reduce (A&R), a method that scales learning with categorical distributions. A&R is built on two ideas: latent variable augmentation and stochastic variational expectation-maximization.
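
The sketch below only illustrates the "reduce" step under the assumption that, after augmentation, the objective decomposes as a sum over classes; per_class_term is a hypothetical stand-in for the per-class contribution, not the actual bound from the paper.

import numpy as np

def subsampled_class_sum(per_class_term, num_classes, num_sampled, rng):
    # A uniform subsample of classes, rescaled by num_classes / num_sampled,
    # gives an unbiased estimate of the full sum over classes.
    classes = rng.choice(num_classes, size=num_sampled, replace=False)
    return (num_classes / num_sampled) * sum(per_class_term(k) for k in classes)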

TopicRNN: A Recurrent Neural Network With Long-Range Semantic Dependency
Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley
International Conference on Learning Representations (ICLR), 2017
arxiv / Poster / Slides

One challenge in modeling sequential data with RNNs is capturing long-term dependencies. In natural language, these long-range dependencies take the form of semantic dependencies. TopicRNN is a deep generative model of language that marries RNNs and topic models to capture them: the RNN component captures syntax while the topic model component captures semantics. The topic model and the RNN parameters are learned jointly using amortized variational inference.
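
A sketch of the output layer in PyTorch (the sizes and the choice of GRU are illustrative; in the paper the topic proportions are inferred with an inference network):

import torch
import torch.nn as nn

class TopicRNNSketch(nn.Module):
    # Word logits mix an RNN term (syntax) with a topic term (semantics);
    # the topic term is switched off for stop words.
    def __init__(self, vocab_size, hidden_dim, num_topics):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.rnn_out = nn.Linear(hidden_dim, vocab_size)
        self.topic_out = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, tokens, theta, stop_indicator):
        # tokens: (B, T) word ids; theta: (B, num_topics) topic proportions;
        # stop_indicator: (B, T) with 1 for stop words, 0 otherwise
        h, _ = self.rnn(self.embed(tokens))
        logits = self.rnn_out(h)                            # syntactic contribution
        topic_logits = self.topic_out(theta).unsqueeze(1)   # semantic contribution
        return logits + (1 - stop_indicator).unsqueeze(-1) * topic_logits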

Variational Inference via Chi Upper Bound Minimization
Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei
Neural Information Processing Systems (NIPS), 2017
arxiv / Poster / Slides

Variational inference is an efficient approach to approximating posterior distributions. It posits a family of distributions and finds the member of that family that best approximates the true posterior, where the criterion for learning is a divergence measure. The most commonly used divergence is the Kullback-Leibler (KL) divergence; however, minimizing the KL leads to approximations that underestimate posterior uncertainty. Our paper proposes the chi-divergence for variational inference. This divergence leads to an upper bound on the model evidence (called the CUBO) and to overdispersed posterior approximations. The CUBO can be used alongside the usual ELBO to sandwich-estimate the model evidence.
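
A small numpy sketch of the sandwich estimate from samples z ~ q, where log_joint(z) = log p(x, z) and log_q(z) = log q(z). This naive Monte Carlo estimate of the CUBO is only illustrative; the paper optimizes an exponentiated form of the bound to obtain unbiased gradients.

import numpy as np
from scipy.special import logsumexp

def elbo_and_cubo(log_joint, log_q, z_samples, n=2):
    # log importance weights: log w_s = log p(x, z_s) - log q(z_s)
    log_w = np.array([log_joint(z) - log_q(z) for z in z_samples])
    elbo = log_w.mean()                                      # lower bound on log p(x)
    cubo = (logsumexp(n * log_w) - np.log(len(log_w))) / n   # upper bound on log p(x)
    return elbo, cubo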

Edward: A Library for Probabilistic Modeling, Inference, and Criticism
Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, David M. Blei
arxiv

A TensorFlow-based library for probabilistic programming.
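
A sketch of Bayesian linear regression in Edward, roughly following the library's documented examples (toy data generated here; exact API details may differ between versions):

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

N, D = 100, 5
X_train = np.random.randn(N, D).astype(np.float32)
y_train = X_train.dot(np.ones(D)).astype(np.float32)

# model: y = X w + b + noise, with Gaussian priors on w and b
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

# mean-field variational family
qw = Normal(loc=tf.get_variable("qw/loc", [D]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

# variational inference by maximizing the ELBO
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_iter=1000)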

Teaching

I have been a teaching assistant for the following courses at Columbia University.


Advanced Data Analysis - Fall 2017

Statistical Methods for Finance - Spring 2016

Probability and Statistics for Data Science - Fall 2015

Linear Regression Models - Spring 2015

Probability - Fall 2014