PyTorch Implementation of Vector Quantized Variational AutoEncoders.

Last update: Oct 06, 2021

Related tags

Deep Learning VQVAE-PyTorch

Overview

PyTorch implementation of VQVAE.

This paper combines 2 tricks:

Vector Quantization (check out this amazing blog for better understanding.)
Straight-Through estimator (It works around the problem of back-propagating through discrete latent variables, whose selection step is not differentiable.)
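
As a minimal sketch of the straight-through trick (not necessarily how this repository implements it), the quantized vector can be used in the forward pass while the gradient is copied unchanged to the encoder output. The function name `straight_through` and the toy tensors are assumptions made for illustration.

```python
import torch


def straight_through(z_e: torch.Tensor, z_q: torch.Tensor) -> torch.Tensor:
    """Return a tensor whose value is z_q but whose gradient flows to z_e.

    The difference (z_q - z_e) is detached, so autograd treats the
    quantization step as if it were the identity function.
    """
    return z_e + (z_q - z_e).detach()


# Toy example: one encoder output and its (hypothetical) nearest embedding.
z_e = torch.tensor([[0.2, 0.9]], requires_grad=True)
z_q = torch.tensor([[0.0, 1.0]])

out = straight_through(z_e, z_q)
out.sum().backward()  # gradient of the sum w.r.t. z_e is all ones
```

After the backward pass, `z_e.grad` holds the gradient that a downstream decoder loss would otherwise have been unable to deliver past the discrete lookup.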

This model has a neural network encoder and decoder, and a prior, just like the vanilla Variational AutoEncoder (VAE). It also has a latent embedding space called the codebook (size: K x D). Here, K is the size of the discrete latent space (the number of embeddings in the codebook) and D is the dimension of each embedding e.

In vanilla variational autoencoders, the encoder output ze(x) is used to parameterize a Normal/Gaussian distribution, which is then sampled using the 'reparameterization trick' to get a latent representation z of the input x. This latent representation is then passed to the decoder. In VQVAEs, however, ze(x) is used as a "key" for a nearest-neighbour lookup into the embedding codebook, which returns zq(x), the closest embedding in the space. This is called the Vector Quantization (VQ) operation. Then, zq(x) is passed to the decoder, which reconstructs the input x. The decoder can either parameterize p(x|z) as the mean of a Normal distribution using a transposed convolution layer, as in the vanilla VAE, or it can autoregressively generate a categorical distribution over the [0, 255] pixel values, like PixelCNN. In this project, the first approach is used.
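
The nearest-neighbour lookup described above can be sketched as follows. This is an illustrative example under stated assumptions, not the repository's code: the function name `quantize`, the flat (N, D) input shape, and the toy codebook values are all hypothetical. In an image model the encoder output would typically be reshaped to (N, D) before the lookup.

```python
import torch


def quantize(z_e: torch.Tensor, codebook: torch.Tensor):
    """Map each encoder output to its closest codebook embedding.

    z_e:      (N, D) encoder outputs
    codebook: (K, D) embeddings
    Returns the quantized vectors (N, D) and the chosen indices (N,).
    """
    distances = torch.cdist(z_e, codebook)  # (N, K) Euclidean distances
    indices = distances.argmin(dim=1)       # index of the closest embedding
    z_q = codebook[indices]                 # look up the embeddings
    return z_q, indices


# Toy codebook with K = 3 embeddings of dimension D = 2.
codebook = torch.tensor([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z_e = torch.tensor([[0.9, 0.8], [-0.7, 1.6], [0.1, -0.2]])

z_q, indices = quantize(z_e, codebook)
print(indices.tolist())  # [1, 2, 0]
```

Each row of `z_q` is an exact copy of a codebook row, so the decoder only ever sees one of K possible vectors per latent position.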

The loss function is composed of 3 components:

Regular Reconstruction loss
Vector Quantization loss
Commitment loss

Vector Quantization loss, ||sg[ze(x)] - e||^2, encourages the items in the codebook to move closer to the encoder output. Commitment loss, ||ze(x) - sg[e]||^2, encourages the output of the encoder to stay close to the embedding it picked, that is, to commit to its codebook embedding. The commitment loss is multiplied by a constant beta, which is 1.0 for this project. Here, sg means "stop-gradient": its argument is treated as a constant, so no gradients are propagated through that term.
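
A hedged sketch of how the three terms might be combined in PyTorch is shown below. It assumes a mean-squared-error reconstruction loss and uses `F.mse_loss`, which averages over elements rather than summing the squared norm; the function name `vqvae_loss` and the toy tensors are hypothetical. Stop-gradient is expressed with `.detach()`.

```python
import torch
import torch.nn.functional as F


def vqvae_loss(x, x_recon, z_e, z_q, beta: float = 1.0):
    """Sum of reconstruction, vector quantization, and commitment losses."""
    recon_loss = F.mse_loss(x_recon, x)
    # ||sg[ze(x)] - e||^2: the encoder output is detached, so only the codebook moves.
    vq_loss = F.mse_loss(z_q, z_e.detach())
    # ||ze(x) - sg[e]||^2: the embedding is detached, so only the encoder moves.
    commitment_loss = F.mse_loss(z_e, z_q.detach())
    return recon_loss + vq_loss + beta * commitment_loss


# Toy tensors standing in for an input, its reconstruction, and the latents.
x = torch.zeros(1, 4)
x_recon = torch.ones(1, 4)
z_e = torch.tensor([[0.0, 0.0]], requires_grad=True)  # encoder output
z_q = torch.tensor([[1.0, 1.0]], requires_grad=True)  # chosen embedding

loss = vqvae_loss(x, x_recon, z_e, z_q, beta=1.0)
loss.backward()
```

In this toy case `z_q.grad` comes only from the vector quantization term and `z_e.grad` only from the commitment term, which is the separation the stop-gradient operator is meant to enforce.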

Results:

The model is trained on the MNIST and CIFAR10 datasets.

Target 👉 Reconstructed Image

Details:

Trained models for MNIST and CIFAR10 are in the Trained models directory.
Hidden size of the bottleneck (z) for MNIST and CIFAR10 is 128 and 256, respectively.

Owner

Vrushank Changawala
