I am a Ph.D. candidate in the Department of Statistics at Columbia University, where I am jointly advised by David Blei and John Paisley. In my research, I combine probabilistic graphical modeling and deep learning to design models for structured high-dimensional data such as text. I also work on variational methods as an inference framework for fitting these models. My work is funded by a Columbia Dean Fellowship and a Google PhD Fellowship in Machine Learning.
Prior to joining Columbia, I worked as a Junior Professional Associate at the World Bank. I did my undergraduate training in France, where I attended Lycée Henri IV and Télécom ParisTech, part of France's Grandes Écoles system. I hold a Diplôme d'Ingénieur from Télécom ParisTech and spent the third year of its curriculum at Cornell University, where I earned a Master's in Statistics.

Selected Invited Talks
Carnegie Mellon University Machine Learning Seminar, Pittsburgh, PA, November 2019
Berkeley EECS Seminar, Berkeley, CA, October 2019
IPAM Workshop on Interpretable Learning in Physical Systems, Los Angeles, CA, October 2019
University of Maryland Rising Stars in Machine Learning Seminar, College Park, MD, September 2019
Yahoo Research Seminar Series, New York, NY, July 2019
New York Machine Learning and Artificial Intelligence Meetup, New York, NY, June 2019
NYU Text as Data Seminar Series, New York, NY, March 2019
South England Natural Language Processing Meetup, London, UK, January 2019
Microsoft Research Cambridge, Cambridge, UK, January 2019
Tufts University CS Colloquium, Medford, MA, April 2018
Harvard University NLP Group Meeting, Cambridge, MA, April 2018
Stanford University NLP Seminar, Stanford, CA, April 2018
New York Academy of Sciences ML Symposium, New York, NY, March 2018
Machine Learning and Friends Seminar, UMass Amherst, Amherst, MA, February 2018
Black in AI Workshop, Long Beach, CA, December 2017
MSR AI, Microsoft Research, Redmond, WA, August 2017
SSLI Lab, University of Washington, Seattle, WA, August 2017
DeepLoria, Loria Laboratory, Nancy, France, April 2017
AI With The Best, Online, April 2017
OpenAI, San Francisco, CA, January 2017
IBM TJ Watson Research, Yorktown Heights, NY, December 2016
Microsoft Research, Redmond, WA, August 2016

Research
My goal as a machine learning researcher is twofold. My first goal is to combine deep learning and probabilistic graphical modeling to design models expressive and powerful enough to capture meaningful representations of high-dimensional structured data. My second goal is to develop efficient, scalable, and generic algorithms for learning with these models. Achieving these two goals will benefit many applications.


Prescribed Generative Adversarial Networks
Adji B. Dieng,
Francisco R. J. Ruiz,
David M. Blei,
Michalis Titsias
Journal of Machine Learning Research (JMLR) (Submitted)
arXiv / Code
This paper describes a solution to two important problems in the GAN literature: (1) how can we maximize the entropy of the generator of a GAN to prevent mode collapse? (2) how can we evaluate predictive log-likelihood for GANs to assess how they generalize to new data? Key ingredients: noise, entropy regularization, and Hamiltonian Monte Carlo.
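The first ingredient can be sketched in a few lines. The snippet below is a minimal illustration of the idea, not the paper's implementation; the linear stand-in generator and all names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W):
    # stand-in for any deterministic GAN generator
    return z @ W

def prescribed_sample(z, W, log_sigma):
    # PresGAN idea (sketch): add Gaussian noise with a learned scale to the
    # generator output, so samples have an explicit density whose entropy
    # can be regularized to discourage mode collapse
    mean = generator(z, W)
    sigma = np.exp(log_sigma)
    return mean + sigma * rng.standard_normal(mean.shape)

z = rng.standard_normal((4, 3))  # latent codes
W = rng.standard_normal((3, 2))  # toy generator parameters
x = prescribed_sample(z, W, log_sigma=np.zeros(2))
```

Because the noise scale is learned, it can be driven by an entropy-regularized objective rather than fixed by hand.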


Reweighted Expectation Maximization
Adji B. Dieng,
John Paisley
Under submission at Journal of Machine Learning Research (JMLR)
arXiv / Code
Maximum likelihood in deep generative models is hard. The typical workaround is variational inference (VI), which maximizes a lower bound on the log marginal likelihood of the data. VI introduces an undesirable amortization gap and often causes latent variable collapse. We propose to use expectation maximization (EM) instead. Importantly, we separate posterior inference from model fitting. To fit the model, we leverage moment matching to learn rich proposals for estimating the EM objective. Posterior inference is done after the model is fitted. This two-step procedure departs from the current VAE approach of bundling model fitting and posterior inference together. It turns out that EM learns better deep generative models than VI, as measured by predictive log-likelihood.
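As a rough sketch of the setup (the notation here is mine, not necessarily the paper's): at iteration $t$, EM maximizes the expected complete log-likelihood, which a learned proposal $r$ can estimate by self-normalized importance sampling:

```latex
\mathcal{Q}(\theta; \theta_t)
  = \mathbb{E}_{p(z \mid x; \theta_t)}\!\left[\log p(x, z; \theta)\right]
  \approx \sum_{s=1}^{S} \tilde{w}_s \log p(x, z_s; \theta),
\qquad z_s \sim r(z),
\qquad \tilde{w}_s \propto \frac{p(x, z_s; \theta_t)}{r(z_s)},
\qquad \sum_{s=1}^{S} \tilde{w}_s = 1.
```

The quality of the estimate depends on how well $r$ matches the posterior, which is where the moment-matched proposals come in.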


The Dynamic Embedded Topic Model
Adji B. Dieng*,
Francisco R. J. Ruiz*,
David M. Blei
International Conference on Machine Learning (ICML) (Submitted)
arXiv / Code
An extension of the Embedded Topic Model to corpora with temporal dependencies. The DETM models each word with a categorical distribution whose parameter is given by the inner product between the word embedding and an embedding representation of its assigned topic at a particular time step. The word embeddings allow the DETM to generalize to rare words. The DETM learns smooth topic trajectories by defining a random walk prior over the embeddings of the topics. The DETM is fit using structured amortized variational inference with LSTMs.
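The random walk prior is simple to sketch. This is an illustrative simulation with made-up dimensions and noise scale, not the fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

def topic_embedding_trajectory(num_steps, emb_dim, delta=0.05):
    # DETM prior (sketch): each topic's embedding follows a Gaussian random
    # walk over time steps, which encourages smooth topic trajectories
    alpha = [rng.standard_normal(emb_dim)]
    for _ in range(num_steps - 1):
        alpha.append(alpha[-1] + delta * rng.standard_normal(emb_dim))
    return np.stack(alpha)  # (num_steps, emb_dim)

traj = topic_embedding_trajectory(num_steps=10, emb_dim=16)
```

A small `delta` ties consecutive time steps tightly together; a large one lets topics drift quickly.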


Topic Modeling in Embedding Spaces
Adji B. Dieng,
Francisco R. J. Ruiz,
David M. Blei
Under review at Transactions of the Association for Computational Linguistics (TACL), 2019
arXiv / Code
We define words and topics in the same embedding space and form a generative model of documents in which the likelihood of a word is a categorical distribution whose natural parameter is the dot product between the word's embedding and its assigned topic's embedding. The resulting Embedded Topic Model (ETM) learns interpretable topics and word embeddings and is robust to large vocabularies that include rare words and stop words.
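The per-topic word distribution is easy to sketch with toy sizes (all dimensions and names below are placeholders, not the paper's settings):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# ETM likelihood (sketch): rho holds word embeddings and alpha holds topic
# embeddings; topic k's distribution over the vocabulary is a softmax over
# the inner products of every word embedding with alpha_k
rng = np.random.default_rng(0)
vocab_size, emb_dim, num_topics = 100, 16, 5
rho = rng.standard_normal((vocab_size, emb_dim))    # word embeddings
alpha = rng.standard_normal((num_topics, emb_dim))  # topic embeddings

beta = softmax(alpha @ rho.T)  # (num_topics, vocab_size); each row sums to 1
```

Because topics live in the word-embedding space, a topic can assign sensible probability to a rare word that sits near frequent words it has seen.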


Avoiding Latent Variable Collapse with Generative Skip Models
Adji B. Dieng,
Yoon Kim,
Alexander M. Rush,
David M. Blei
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
arXiv
One of the current staples of unsupervised representation learning is the variational autoencoder (VAE). However, VAEs suffer from a problem known as "latent variable collapse". Our paper proposes a simple solution that relies on skip connections. This solution leads to the Skip-VAE, a deep generative model that avoids latent variable collapse. The decoder of a Skip-VAE is a neural network whose hidden states, at every layer, condition on the latent variables. This results in a stronger dependence between observations and their latents and therefore avoids latent variable collapse.
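The decoder architecture can be sketched as follows; this is a toy MLP illustration of the skip-connection idea, with made-up dimensions, not the paper's networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def skip_decoder(z, layers, skips, out_w):
    # Skip-VAE decoder (sketch): the latent z is fed to *every* hidden
    # layer through a skip connection, not just the first, strengthening
    # the dependence between observations and latents
    h = z
    for W, U in zip(layers, skips):
        h = relu(h @ W + z @ U)  # skip connection injects z at each layer
    return h @ out_w

latent_dim, hidden_dim, out_dim = 8, 32, 20
layers = [0.1 * rng.standard_normal((latent_dim, hidden_dim)),
          0.1 * rng.standard_normal((hidden_dim, hidden_dim))]
skips = [0.1 * rng.standard_normal((latent_dim, hidden_dim)),
         0.1 * rng.standard_normal((latent_dim, hidden_dim))]
out_w = 0.1 * rng.standard_normal((hidden_dim, out_dim))

x = skip_decoder(rng.standard_normal((4, latent_dim)), layers, skips, out_w)
```

With a vanilla decoder, deep layers can learn to ignore z; the skip paths make that much harder.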


Noisin: Unbiased Regularization for Recurrent Neural Networks
Adji B. Dieng,
Rajesh Ranganath,
Jaan Altosaar,
David M. Blei
International Conference on Machine Learning (ICML), 2018
arXiv / Slides
Recurrent neural networks are very effective at modeling sequential data. However, they tend to have very high capacity and overfit easily. We propose a new regularization method called Noisin. Noisin relies on the notion of "unbiased" noise injection. Noisin is an explicit regularizer: its objective function can be decomposed into the original objective for the deterministic RNN and a nonnegative data-dependent term. Noisin significantly outperforms dropout on language modeling on both the Penn Treebank and Wikitext-2 datasets.
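A minimal sketch of unbiased noise injection, assuming a vanilla tanh RNN cell and multiplicative gamma noise as one possible choice (the particular cell, noise distribution, and names here are illustrative, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisin_step(h_prev, x_t, Wh, Wx, gamma=0.5):
    # Noisin (sketch): multiply the recurrent hidden state by noise with
    # mean one -- "unbiased" injection, so the noised state equals the
    # deterministic one in expectation
    h = np.tanh(h_prev @ Wh + x_t @ Wx)
    eps = rng.gamma(shape=1.0 / gamma, scale=gamma, size=h.shape)  # E[eps] = 1
    return h * eps

hidden_dim, input_dim = 16, 8
Wh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
Wx = 0.1 * rng.standard_normal((input_dim, hidden_dim))
h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):  # run a short sequence
    h = noisin_step(h, x_t, Wh, Wx)
```

The unbiasedness constraint is what makes the implicit regularization term explicit and nonnegative.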


Augment and Reduce: Stochastic Inference for Large Categorical Distributions
Francisco J. R. Ruiz,
Michalis Titsias,
Adji B. Dieng,
David M. Blei
International Conference on Machine Learning (ICML), 2018
arXiv / Slides / Code
Categorical distributions are ubiquitous in statistics and machine learning. One widely used parameterization of the categorical distribution is the softmax. However, the softmax does not scale well when there are many categories. We propose a method called A&R that scales learning with categorical distributions. A&R is built on two ideas: latent variable augmentation and stochastic variational expectation maximization.


TopicRNN: A Recurrent Neural Network With Long-Range Semantic Dependency
Adji B. Dieng,
Chong Wang,
Jianfeng Gao,
John Paisley
International Conference on Learning Representations (ICLR), 2017
arXiv / Poster / Slides
One challenge in modeling sequential data with RNNs is the inability to capture long-term dependencies. In natural language, these long-term dependencies come in the form of semantic dependencies. TopicRNN is a deep generative model of language that marries RNNs and topic models to capture long-term dependencies. The RNN component of the model captures syntax while the topic model component captures semantics. The topic model and RNN parameters are learned jointly using amortized variational inference.
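Roughly, the word distribution mixes the two components; the sketch below is my schematic reading of that combination, with toy sizes and illustrative names:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def topicrnn_word_probs(h_t, theta, W, B, is_stop_word):
    # TopicRNN (sketch): word logits combine the RNN's logits (syntax) with
    # a topic contribution (semantics); the topic term is switched off for
    # stop words, which carry little semantic content
    logits = W @ h_t
    if not is_stop_word:
        logits = logits + B @ theta
    return softmax(logits)

rng = np.random.default_rng(0)
vocab, hidden, topics = 50, 16, 4
W = rng.standard_normal((vocab, hidden))  # RNN output weights
B = rng.standard_normal((vocab, topics))  # topic-to-vocabulary weights
p = topicrnn_word_probs(rng.standard_normal(hidden),
                        rng.standard_normal(topics), W, B, is_stop_word=False)
```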


Variational Inference via Chi Upper Bound Minimization
Adji B. Dieng,
Dustin Tran,
Rajesh Ranganath,
John Paisley,
David M. Blei
Neural Information Processing Systems (NeurIPS), 2017
arXiv / Poster / Slides
Variational inference is an efficient approach to estimating posterior distributions. It consists of positing a family of distributions and finding the member of this family that best approximates the true posterior. The criterion for learning is a divergence measure, most commonly the Kullback-Leibler (KL) divergence. However, minimizing the KL leads to approximations that underestimate posterior uncertainty. Our paper proposes the chi-divergence for variational inference. This divergence leads to an upper bound on the model evidence (called the CUBO) and overdispersed posterior approximations. The CUBO can be used alongside the usual ELBO to sandwich-estimate the model evidence.
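Concretely, for a model $p(x, z)$ and variational distribution $q(z)$, the sandwich reads roughly as follows (a sketch; see the paper for the precise statement):

```latex
\text{CUBO}_n(q)
  = \frac{1}{n} \log \mathbb{E}_{q(z)}
    \left[ \left( \frac{p(x, z)}{q(z)} \right)^{n} \right]
  \;\geq\; \log p(x) \;\geq\; \text{ELBO}(q), \qquad n \geq 1.
```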


Edward: A Library for Probabilistic Modeling, Inference, and Criticism
Dustin Tran,
Alp Kucukelbir,
Adji B. Dieng,
Maja Rudolph,
Dawen Liang,
David M. Blei
arXiv
A TensorFlow-based library for probabilistic programming.

Teaching
I am fortunate to have been a teaching assistant for the following courses at Columbia University.


Statistical Machine Learning, Spring 2019
Advanced Data Analysis, Fall 2017
Statistical Methods for Finance, Spring 2016
Probability and Statistics for Data Science, Fall 2015
Linear Regression Models, Spring 2015
Probability, Fall 2014

