
Throughout the last decade, the practical advancements and the theoretical understanding of deep learning (DL) models and practices have arguably reached a level of maturity such that DL is the preferred choice for any practitioner seeking simple yet powerful solutions to machine learning problems. With this tutorial we aim to expose participants to novel trends in DL for scenarios where quantification of uncertainty matters, and we will discuss new and emerging trends in the Bayesian deep learning community.

Description of the tutorial

Decision-making processes are ubiquitous in social sciences and engineering, and sound modeling of uncertainty is key to building reliable and trustworthy systems. Throughout the last decade, the practical advancements and the theoretical understanding of deep learning models and practices have arguably reached a level of maturity such that deep learning is the preferred choice for any practitioner seeking simple yet powerful solutions to machine learning problems.

The dissemination of DL raises the question of how much we can blindly rely on these models' predictions, especially when accuracy is not the only important performance metric and when sensible uncertainty quantification is a strict system requirement. With this tutorial we aim to expose participants to novel trends in DL for scenarios where quantification of uncertainty matters. We will extensively discuss how a proper probabilistic treatment of such complex deep models is possible and feasible, highlight new and emerging trends in the Bayesian deep learning community, and discuss some important computational aspects.

Overview of the content

The tutorial will last about 3h30m and will be divided into three main parts.

Part 1. Motivation for Bayesian inference in modern AI systems

The first part will be dedicated to motivating a probabilistic treatment of systems powered by deep learning models. We will then present some fundamental results from Bayesian theory, upon which we will build the content of the next part.
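
Concretely (using our own notation here, as a preview of the slides), Bayesian inference places a prior p(w) on the weights w of a network, updates it into a posterior given data D, and averages predictions over that posterior:

p(\mathbf{w} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{\int p(\mathcal{D} \mid \mathbf{w}')\, p(\mathbf{w}')\, \mathrm{d}\mathbf{w}'},
\qquad
p(y^\star \mid \mathbf{x}^\star, \mathcal{D}) = \int p(y^\star \mid \mathbf{x}^\star, \mathbf{w})\, p(\mathbf{w} \mid \mathcal{D})\, \mathrm{d}\mathbf{w}.

Both integrals are intractable for deep networks, which is precisely what the approximate inference methods of the next part address.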

Part 2. Tractable Bayesian inference for deep neural networks

The second part will be entirely dedicated to the core of the tutorial: we will present methodological results that allow us to perform tractable Bayesian inference on deep neural networks, namely variational inference, Markov chain Monte Carlo methods, and other approximations.
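
As a taste of what these methods look like in practice, below is a minimal sketch of the first of them: mean-field Gaussian variational inference on a single Bayesian linear layer, trained by maximizing the ELBO with the reparameterization trick. This is our own illustration in PyTorch; the architecture, prior, and hyperparameters are arbitrary assumptions, not material from the tutorial.

# Minimal sketch of mean-field variational inference for a Bayesian
# linear layer with the reparameterization trick (illustrative only).
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        # Variational parameters: mean and log-std of a factorized Gaussian q(w).
        self.w_mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.w_logsigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        # Reparameterization: w = mu + sigma * eps, eps ~ N(0, I),
        # so gradients flow through mu and sigma.
        eps = torch.randn_like(self.w_mu)
        w = self.w_mu + self.w_logsigma.exp() * eps
        return x @ w.t()

    def kl(self):
        # Closed-form KL( q(w) || p(w) ) against a standard normal prior N(0, 1).
        sigma2 = (2 * self.w_logsigma).exp()
        return 0.5 * (sigma2 + self.w_mu**2 - 1.0 - 2 * self.w_logsigma).sum()

# Training maximizes the ELBO: E_q[log p(D | w)] - KL(q || p).
layer = BayesianLinear(2, 1)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
x, y = torch.randn(64, 2), torch.randn(64, 1)
for _ in range(100):
    opt.zero_grad()
    nll = 0.5 * ((layer(x) - y) ** 2).sum()   # Gaussian likelihood, unit noise
    loss = nll + layer.kl()                   # negative ELBO
    loss.backward()
    opt.step()

At test time, predictions are averaged over several samples of w from q, which is what produces the uncertainty estimates discussed throughout the tutorial.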

Part 3. Practical considerations and conclusions

Finally, the last part will be dedicated to some practical considerations (e.g. how to choose priors).
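
One simple diagnostic we will touch upon in this context is the prior predictive check: sampling functions from the network's prior to see whether the prior assumptions are sensible before any data is observed. A toy sketch follows (our own illustration; the architecture and prior scales are arbitrary assumptions):

# Toy prior-predictive check: draw functions from an MLP whose weights
# have prior N(0, alpha^2) and inspect how the prior scale shapes them.
import torch

def mlp_prior_sample(x, alpha, widths=(1, 50, 50, 1)):
    h = x
    layers = list(zip(widths[:-1], widths[1:]))
    for i, (d_in, d_out) in enumerate(layers):
        w = alpha * torch.randn(d_out, d_in) / d_in ** 0.5
        b = alpha * torch.randn(d_out)
        h = h @ w.t() + b
        if i < len(layers) - 1:   # no nonlinearity on the output layer
            h = torch.tanh(h)
    return h

x = torch.linspace(-3, 3, 100).unsqueeze(-1)
for alpha in (0.5, 1.0, 5.0):
    fs = torch.stack([mlp_prior_sample(x, alpha) for _ in range(20)])
    # Wider priors on the weights translate into wilder prior functions.
    print(f"alpha={alpha}: std of sampled functions ~ {fs.std():.2f}")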

Material

Introduction
Variational Inference
Sampling with MCMC methods
Laplace approximation and Ensembles
Priors and practical considerations
Conclusions

Recordings

Potential target audience

This tutorial targets practitioners and scientists interested in using deep learning for systems where sound uncertainty quantification is a requirement. We will assume that participants are comfortable with DL basics and with some concepts of optimization (such as mini-batch learning and back-propagation). Some experience with Bayesian inference is suggested but not required to successfully follow the tutorial, as we will dedicate a good part of the introduction to making sure everyone is familiar with the basic probability theory results needed before diving into the core content.

Motivation and objectives

Combined with the availability of open-source libraries like TensorFlow and PyTorch, deep learning has quickly gained traction in other communities, from cosmology and experimental physics to neuroscience, and it has cross-fertilized other computer science fields, such as digital hardware design, data management systems and materials science. Disconcertingly, naïve implementations of DL models have been found to be unreliable in some scenarios. A recent analysis of deep CNNs for classification, for example, showed that their predictions are systematically over-confident. In practice, this means that there is no clear way to check whether the model is “sure” or not about a certain prediction; as a consequence, decisions based on the output of such models should be carefully considered and properly assessed to avoid misinterpreting the model’s behavior. This is an interesting problem from a methodological research point of view, but it is also a concerning aspect for any deployment of DL-based systems, for which a model is usually trained just once and may then be interrogated with any kind of input data.
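
A standard way to quantify such over-confidence is a calibration metric like the expected calibration error (ECE), which measures the gap between a classifier's confidence and its accuracy. Below is a small illustrative sketch (the binning follows a common 15-bin recipe; the synthetic arrays stand in for a real model's outputs):

# Sketch of the expected calibration error (ECE): bin predictions by
# confidence and compare average confidence to accuracy in each bin.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # weight each bin by its share of samples
    return ece

# An over-confident model: ~97% reported confidence, ~80% actual accuracy.
conf = np.random.uniform(0.95, 1.0, size=10_000)
hit = (np.random.rand(10_000) < 0.8).astype(float)
print(f"ECE ~ {expected_calibration_error(conf, hit):.3f}")   # large gap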

A Bayesian approach to deep learning has shown promising results when it comes to accurate quantification of uncertainty, without compromising on performance. The objective of this tutorial is to present a selection of these methodological advancements for applying Bayesian inference techniques to deep learning models.

Presenters

Simone Rossi has been a PhD candidate under the supervision of Prof. Maurizio Filippone at EURECOM since 2018. He holds a MSc in Computer Engineering from ENST Telecom Paris (France) and a MSc in Electronic Engineering from Politecnico di Torino (Italy). His research focuses on novel methods for applying Bayesian inference to deep models (including Gaussian processes and deep Gaussian processes), with approximate variational inference techniques and Monte Carlo methods.

Maurizio Filippone has been an Associate Professor at EURECOM since 2015. Prior to that, he held postdoctoral positions in probabilistic machine learning in the UK (Sheffield, Glasgow and UCL) and became an Assistant Professor at the University of Glasgow, UK, in 2011. Since 2011, he has been teaching classes in probabilistic machine learning and artificial intelligence at the postgraduate level. His research interests are in the development of practical and scalable methods for Bayesian inference and for Gaussian processes and deep Gaussian processes. In the last few years, he has received a prestigious 7-year fellowship from the AXA Research Fund and a 3-year research grant from the Agence Nationale de la Recherche to develop novel probabilistic approaches to advance risk modeling in life and environmental sciences.

References

Introduction to Variational Inference methods

Monte-Carlo Dropout for Bayesian Neural Networks and follow-up

Variational Inference for Bayesian Neural Networks

Sampling of Bayesian neural network posterior

Laplace approximation

Ensemble methods

Infinite-limit Neural Networks