DeepMind Perceiver (in PyTorch)

Disclaimer: This is not official and I'm not affiliated with DeepMind.

This is my implementation of the paper "Perceiver: General Perception with Iterative Attention". You can read more about the model on DeepMind's website.

I trained an MNIST model, which you can find in models/mnist.pkl or load with perceiver.load_mnist_model(). It reaches 96.02% accuracy on the MNIST test set.

Getting started

To run this you need PyTorch installed:

pip3 install torch

From perceiver you can import Perceiver or PerceiverLogits.

You can then use it as follows (or see examples.ipynb):

from perceiver import Perceiver

model = Perceiver(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input in the different dimensions? E.g. (28, 28) for MNIST
    fourier_bands=4, # <- How many bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks follow each cross-attention with the input?
    dropout=0.1, # <- Dropout probability
    layers=8, # <- Number of layers. This becomes two unique layer-blocks: layer 1, and one block shared by layers 2-8 (weight sharing).
)
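To make the layers, latent_blocks and weight-sharing comments above more concrete, here is a minimal sketch in plain Python. It is not the code in this repository: Block and build_layer_schedule are hypothetical names, and the blocks hold no weights. It only illustrates how 8 layers can map onto 2 unique layer-blocks, and how many attention operations that implies.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Stand-in for one layer-block (hypothetical, holds no real weights)."""
    name: str
    latent_blocks: int  # latent self-attention blocks after the cross-attention

    def ops(self):
        # One cross-attention with the input, then the latent self-attention stack.
        return ["cross_attention"] + ["self_attention"] * self.latent_blocks


def build_layer_schedule(layers=8, latent_blocks=6):
    """Return one entry per layer; layers 2..N all reference the same Block."""
    if layers < 1:
        raise ValueError("layers must be at least 1")
    first = Block("block_1", latent_blocks)
    if layers == 1:
        return [first]
    shared = Block("block_2_shared", latent_blocks)
    return [first] + [shared] * (layers - 1)


schedule = build_layer_schedule(layers=8, latent_blocks=6)
unique_blocks = {id(block) for block in schedule}
all_ops = [op for block in schedule for op in block.ops()]

print(len(schedule))       # 8 layers are applied in sequence
print(len(unique_blocks))  # 2 distinct layer-blocks (sets of weights)
print(all_ops.count("cross_attention"), all_ops.count("self_attention"))  # 8 48
```

With the default arguments, the input is cross-attended 8 times and the latents pass through 48 self-attention blocks, while only 2 layer-blocks would need their own parameters.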

The above model outputs the latents after the final layer. If you want logits instead, use the following model:

from perceiver import PerceiverLogits

model = PerceiverLogits(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input in the different dimensions? E.g. (28, 28) for MNIST
    output_features, # <- How many different classes? E.g. 10 for MNIST.
    fourier_bands=4, # <- How many bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks follow each cross-attention with the input?
    dropout=0.1, # <- Dropout probability
    layers=8, # <- Number of layers. This becomes two unique layer-blocks: layer 1, and one block shared by layers 2-8 (weight sharing).
)
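For intuition about how a set of latents can become class logits, here is a small sketch of the approach described in the Perceiver paper: average the final latents, then apply a linear layer. Whether PerceiverLogits does exactly this is an assumption on my part, and mean_pool, linear and latents_to_logits are hypothetical helpers written in plain Python, not functions from this package.

```python
import random


def mean_pool(latents):
    """Average the latent vectors element-wise: (latents, d_model) -> (d_model,)."""
    count = len(latents)
    return [sum(column) / count for column in zip(*latents)]


def linear(vector, weight, bias):
    """Plain linear layer; weight has shape (output_features, d_model)."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]


def latents_to_logits(latents, weight, bias):
    """Pool the latents into one vector, then project it to one score per class."""
    return linear(mean_pool(latents), weight, bias)


# Toy sizes matching the arguments above: 64 latents, d_model=32, 10 classes.
# The values are random; a trained model would supply real latents and weights.
rng = random.Random(0)
latents = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(64)]
weight = [[rng.gauss(0, 0.1) for _ in range(32)] for _ in range(10)]
bias = [0.0] * 10

logits = latents_to_logits(latents, weight, bias)
predicted_class = max(range(len(logits)), key=logits.__getitem__)
print(len(logits))  # 10, one logit per class
```

The predicted class is simply the index of the largest logit, so output_features should equal the number of classes.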

To use my pre-trained MNIST model (its accuracy is not very good):

from perceiver import load_mnist_model

model = load_mnist_model()

TODO:

Positional embedding generalized to n dimensions (with fourier features)
Train other models (like CIFAR-100 or something not in the image domain)
Type hints
Unit tests for components of model
Package
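As a starting point for the first TODO item, here is a rough sketch of Fourier-feature positional encoding for inputs with any number of dimensions. It loosely follows the description in the paper (coordinates scaled to [-1, 1], sin/cos at linearly spaced frequencies, raw coordinates appended) and is not the encoding currently used in this repository. The names fourier_features and grid_positions and the max_freq parameter are assumptions made for illustration.

```python
import itertools
import math


def fourier_features(position, bands=4, max_freq=10.0):
    """Encode one n-dimensional position whose coordinates lie in [-1, 1].

    Returns the raw coordinates followed by sin/cos features for every axis
    and band, so the output length is n * (2 * bands + 1).
    """
    # Frequencies spaced linearly from 1 to max_freq / 2 (assumed convention).
    if bands == 1:
        freqs = [1.0]
    else:
        step = (max_freq / 2 - 1.0) / (bands - 1)
        freqs = [1.0 + k * step for k in range(bands)]

    features = list(position)
    for coord in position:
        for freq in freqs:
            features.append(math.sin(math.pi * freq * coord))
            features.append(math.cos(math.pi * freq * coord))
    return features


def grid_positions(shape):
    """Yield every index of an n-dimensional grid, scaled to [-1, 1] per axis."""
    axes = [
        [0.0] if size == 1 else [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
        for size in shape
    ]
    return itertools.product(*axes)


# MNIST-sized example: 28 x 28 positions, 2 axes, 4 bands.
encodings = [fourier_features(pos, bands=4) for pos in grid_positions((28, 28))]
print(len(encodings), len(encodings[0]))  # 784 18
```

Because grid_positions accepts a shape of any length, the same code would cover 1D signals such as audio, or 3D inputs such as video, with the feature size growing linearly in the number of axes.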

Owner

Louis Arge
