Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

Overview

PyVarInf

PyVarInf provides facilities to easily train your PyTorch neural network models using variational inference.

Bayesian Deep Learning with Variational Inference

Bayesian Deep Learning

Assume we have a dataset D = {(x1, y1), ..., (xn, yn)} where the x's are the inputs and the y's the outputs. The problem is to predict the y's from the x's. Further assume that p(D|θ) is the output of a neural network with weights θ. The network loss is defined as

Usually, when training a neural network, we try to find the parameter θ* which minimizes Ln(θ).

In Bayesian Inference, the problem is instead to study the posterior distribution of the weights given the data. Assume we have a prior α over ℝd. The posterior is

This can be used for model selection, or prediction with Bayesian Model Averaging.

Variational Inference

It is usually impossible to analytically compute the posterior distribution, especially with models as complex as neural networks. Variational Inference adress this problem by approximating the posterior p(θ|D) by a parametric distribution q(θ|φ) where φ is a parameter. The problem is then not to learn a parameter θ* but a probability distribution q(θ|φ) minimizing

F is called the variational free energy.

This idea was originally introduced for deep learning by Hinton and Van Camp [5] as a way to use neural networks for Minimum Description Length [3]. MDL aims at minimizing the number of bits used to encode the whole dataset. Variational inference introduces one of many data encoding schemes. Indeed, F can be interpreted as the total description length of the dataset D, when we first encode the model, then encode the part of the data not explained by the model:

  • LC(φ) = KL(q(.|φ)||α) is the complexity loss. It measures (in nats) the quantity of information contained in the model. It is indeed possible to encode the model in LC(φ) nats, with the bits-back code [4].
  • LE(φ) = Eθ ~ q(θ|φ)[Ln(θ)] is the error loss. It measures the necessary quantity of information for encoding the data D with the model. This code length can be achieved with a Shannon-Huffman code for instance.

Therefore F(φ) = LC(φ) + LE(φ) can be rephrased as an MDL loss function which measures the total encoding length of the data.

Practical Variational Optimisation

In practice, we define φ = (µ, σ) in ℝd x ℝd, and q(.|φ) = N(µ, Σ) the multivariate distribution where Σ = diag(σ12, ..., σd2), and we want to find the optimal µ* and σ*.

With this choice of a gaussian posterior, a Monte Carlo estimate of the gradient of F w.r.t. µ and σ can be obtained with backpropagation. This allows to use any gradient descent method used for non-variational optimisation [2]

Overview of PyVarInf

The core feature of PyVarInf is the Variationalize function. Variationalize takes a model as input and outputs a variationalized version of the model with gaussian posterior.

Definition of a variational model

To define a variational model, first define a traditional PyTorch model, then use the Variationalize function :

import pyvarinf
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.bn1 = nn.BatchNorm2d(10)
        self.bn2 = nn.BatchNorm2d(20)

    def forward(self, x):
        x = self.bn1(F.relu(F.max_pool2d(self.conv1(x), 2)))
        x = self.bn2(F.relu(F.max_pool2d(self.conv2(x), 2)))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x)

model = Net()
var_model = pyvarinf.Variationalize(model)
var_model.cuda()

Optimisation of a variational model

Then, the var_model can be trained that way :

optimizer = optim.Adam(var_model.parameters(), lr=0.01)

def train(epoch):
    var_model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = var_model(data)
        loss_error = F.nll_loss(output, target)
	# The model is only sent once, thus the division by
	# the number of datapoints used to train
        loss_prior = var_model.prior_loss() / 60000
        loss = loss_error + loss_prior
        loss.backward()
        optimizer.step()

for epoch in range(1, 500):
    train(epoch)

Available priors

In PyVarInf, we have implemented four families of priors :

Gaussian prior

The gaussian prior is N(0,Σ), with Σ the diagonal matrix diag(σ12, ..., σd2) defined such that 1/σi is the square root of the number of parameters in the layer, following the standard initialisation of neural network weights. It is the default prior, and do not have any parameter. It can be set with :

var_model.set_prior('gaussian')

Conjugate priors

The conjugate prior is used if we assume that all the weights in a given layer should be distributed as a gaussian, but with unknown mean and variance. See [6] for more details. This prior can be set with

var_model.set_prior('conjugate', n_mc_samples, alpha_0, beta_0, mu_0, kappa_0)

There are five parameters that have to bet set :

  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • mu_0, the prior sample mean
  • kappa_0, the number of samples used to estimate the prior sample mean
  • alpha_0 and beta_0, such that variance was estimated from 2 alpha_0 observations with sample mean mu_0 and sum of squared deviations 2 beta_0

Conjugate prior with known mean

The conjugate prior with known mean is similar to the conjugate prior. It is used if we assume that all the weights in a given layer should be distributed as a gaussian with a known mean but unknown variance. It is usefull in neural networks model when we assume that the weights in a layer should have mean 0. See [6] for more details. This prior can be set with :

var_model.set_prior('conjugate_known_mean', n_mc_samples, mean, alpha_0, beta_0)

Four parameters have to be set:

  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • mean, the known mean
  • alpha_0 and beta_0 defined as above

Mixture of two gaussian

The idea of using a mixture of two gaussians is defined in [1]. This prior can be set with:

var_model.set_prior('mixtgauss', n_mc_samples, sigma_1, sigma_2, pi)
  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • sigma_1 and sigma_2 the std of the two gaussians
  • pi the probability of the first gaussian

Requirements

This module requires Python 3. You need to have PyTorch installed for PyVarInf to work (as PyTorch is not readily available on PyPi). To install PyTorch, follow the instructions described here.

References

  • [1] Blundell, Charles, Cornebise, Julien, Kavukcuoglu, Koray, and Wierstra, Daan. Weight Uncertainty in Neural Networks. In International Conference on Machine Learning, pp. 1613–1622, 2015.
  • [2] Graves, Alex. Practical Variational Inference for Neural Networks. In Neural Information Processing Systems, 2011.
  • [3] Grünwald, Peter D. The Minimum Description Length principle. MIT press, 2007.
  • [4] Honkela, Antti and Valpola, Harri. Variational Learning and Bits-Back Coding: An Information-Theoretic View to Bayesian Learning. IEEE transactions on Neural Networks, 15(4), 2004.
  • [5] Hinton, Geoffrey E and Van Camp, Drew. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. In Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • [6] Murphy, Kevin P. Conjugate Bayesian analysis of the Gaussian distribution., 2007.
MakeItTalk: Speaker-Aware Talking-Head Animation

MakeItTalk: Speaker-Aware Talking-Head Animation This is the code repository implementing the paper: MakeItTalk: Speaker-Aware Talking-Head Animation

Adobe Research 285 Jan 08, 2023
An Straight Dilated Network with Wavelet for image Deblurring

SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring(offical) 1. Introduction This repo is not only used for our paper(

FlyEgle 41 Jan 04, 2023
This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" ([email protected])

GP-VAE This repository provides datasets and code for preprocessing, training and testing models for the paper: Diverse Text Generation via Variationa

Wanyu Du 18 Dec 29, 2022
Tensorflow implementation of our method: "Triangle Graph Interest Network for Click-through Rate Prediction".

TGIN Tensorflow implementation of our method: "Triangle Graph Interest Network for Click-through Rate Prediction". Files in the folder dataset/ electr

Alibaba 21 Dec 21, 2022
Implémentation en pyhton de l'article Depixelizing pixel art de Johannes Kopf et Dani Lischinski

Implémentation en pyhton de l'article Depixelizing pixel art de Johannes Kopf et Dani Lischinski

TableauBits 3 May 29, 2022
Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks

Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks This is a Pytorch-Lightning implementation of the paper "Self-s

Photogrammetry & Robotics Bonn 111 Dec 06, 2022
A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks)

A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks) This repository contains a PyTorch implementation for the paper: Deep Pyra

Greg Dongyoon Han 262 Jan 03, 2023
Repo for Photon-Starved Scene Inference using Single Photon Cameras, ICCV 2021

Photon-Starved Scene Inference using Single Photon Cameras ICCV 2021 Arxiv Project Video Bhavya Goyal, Mohit Gupta University of Wisconsin-Madison Abs

Bhavya Goyal 5 Nov 15, 2022
This is the face keypoint train code of project face-detection-project

face-key-point-pytorch 1. Data structure The structure of landmarks_jpg is like below: |--landmarks_jpg |----AFW |------AFW_134212_1_0.jpg |------AFW_

I‘m X 3 Nov 27, 2022
An image processing project uses Viola-jones technique to detect faces and then use SIFT algorithm for recognition.

Attendance_System An image processing project uses Viola-jones technique to detect faces and then use LPB algorithm for recognition. Face Detection Us

8 Jan 11, 2022
SatelliteNeRF - PyTorch-based Neural Radiance Fields adapted to satellite domain

SatelliteNeRF PyTorch-based Neural Radiance Fields adapted to satellite domain.

Kai Zhang 46 Nov 20, 2022
Scalable Multi-Agent Reinforcement Learning

Scalable Multi-Agent Reinforcement Learning 1. Featured algorithms: Value Function Factorization with Variable Agent Sub-Teams (VAST) [1] 2. Implement

3 Aug 02, 2022
Experiments for Fake News explainability project

fake-news-explainability Experiments for fake news explainability project This repository only contains the notebooks used to train the models and eva

Lorenzo Flores (Lj) 1 Dec 03, 2022
GDSC-ML Team Interview Task

GDSC-ML-Team---Interview-Task Task 1 : Clean or Messy room In this task we have to classify the given test images as clean or messy. - Link for datase

Aayush. 1 Jan 19, 2022
A set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI.

Overview This is a set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI. Make TFRecords To run t

8 Nov 01, 2022
Implementation of neural class expression synthesizers

NCES Implementation of neural class expression synthesizers (NCES) Installation Clone this repository: https://github.com/ConceptLengthLearner/NCES.gi

NeuralConceptSynthesis 0 Jan 06, 2022
a generic C++ library for image analysis

VIGRA Computer Vision Library Copyright 1998-2013 by Ullrich Koethe This file is part of the VIGRA computer vision library. You may use,

Ullrich Koethe 378 Dec 30, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
Content shared at DS-OX Meetup

Streamlit-Projects Streamlit projects available in this repo: An introduction to Streamlit presented at DS-OX (Feb 26, 2020) meetup Streamlit 101 - Ja

Arvindra 69 Dec 23, 2022
Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun

ARAE Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun https://arxiv.org/abs/1706.04223 Disc

Junbo (Jake) Zhao 399 Jan 02, 2023