This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Overview

Orientation independent Möbius CNNs





This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Background (tl;dr)

All derivations and a detailed description of the models are found in Section 5 of our paper. What follows is an informal tl;dr, summarizing the central aspects of Möbius CNNs.

Feature fields on the Möbius strip: A key characteristic of the Möbius strip is its topological twist, making it a non-orientable manifold. Convolutional weight sharing on the Möbius strip is therefore only well defined up to a reflection of kernels. To account for the ambiguity of kernel orientations, one needs to demand that the kernel responses (feature vectors) transform in a predictable way when different orientations are chosen. Mathematically, this transformation is specified by a group representation ρ of the reflection group. We implement three different feature field types, each characterized by a choice of group representation:

  • scalar fields are modeled by the trivial representation. Scalars stay invariant under reflective gauge transformations:

  • sign-flip fields transform according to the sign-flip representation of the reflection group. Reflective gauge transformations negate the single numerical coefficient of a sign-flip feature:

  • regular feature fields are associated to the regular representation. For the reflection group, this implies 2-dimensional features whose two values (channels) are swapped by gauge transformations:

Reflection steerable kernels (gauge equivariance):

Convolution kernels on the Möbius strip are parameterized maps

whose numbers of input and output channels depend on the types of feature fields between which they map. Since a reflection of a kernel should result in a corresponding transformation of its output feature field, the kernel has to obey certain symmetry constraints. Specifically, kernels have to be reflection steerable (or gauge equivariant), i.e. should satisfy:

The following table visualizes this symmetry constraint for any pair of input and output field types that we implement:

Similar equivariance constraints are imposed on biases and nonlinearities; see the paper for more details.

Isometry equivariance: Shifts of the Möbius strip along itself are isometries. After one revolution (a shift by 2π), points on the strip do not return to themselves, but end up reflected along the width of the strip:

Such reflections of patterns are explained away by the reflection equivariance of the convolution kernels. Orientation independent convolutions are therefore automatically equivariant w.r.t. the action of such isometries on feature fields. Our empirical results, shown in the table below, confirm that this theoretical guarantee holds in practice. Conventional CNNs, on the other hand, are explicitly coordinate dependent, and are therefore in particular not isometry equivariant.

Implementation

Neural network layers are implemented in nn_layers.py while the models are found in models.py. All individual layers and all models are unit tested in unit_tests.py.

Feature fields: We assume Möbius strips with a locally flat geometry, i.e. strips which can be thought of as being constructed by gluing two opposite ends of a rectangular flat stripe together in a twisted way. Feature fields are therefore discretized on a regular sampling grid on a rectangular domain of pixels. Note that this choice induces a global gauge (frame field), which is discontinuous at the cut.

In practice, a neural network operates on multiple feature fields which are stacked in the channel dimension (a direct sum). Feature spaces are therefore characterized by their feature field multiplicities. For instance, one could have 10 scalar fields, 4 sign-flip fields and 8 regular feature fields, which consume in total channels. Denoting the batch size by , a feature space is encoded by a tensor of shape .

The correct transformation law of the feature fields is guaranteed by the coordinate independence (steerability) of the network layers operating on it.

Orientation independent convolutions and bias summation: The class MobiusConv implements orientation independent convolutions and bias summations between input and output feature spaces as specified by the multiplicity constructor arguments in_fields and out_fields, respectively. Kernels are as usual discretized by a grid of size*size pixels. The steerability constraints on convolution kernels and biases are implemented by allocating a reduced number of parameters, from which the symmetric (steerable) kernels and biases are expanded during the forward pass.

Coordinate independent convolutions rely furthermore on parallel transporters of feature vectors, which are implemented as a transport padding operation. This operation pads both sides of the cut with size//2 columns of pixels which are 1) spatially reflected and 2) reflection-steered according to the field types. The stripes are furthermore zero-padded along their width.

The forward pass operates then by:

  • expanding steerable kernels and biases from their non-redundant parameter arrays
  • transport padding the input field array
  • running a conventional Euclidean convolution

As the padding added size//2 pixels around the strip, the spatial resolution of the output field agrees with that of the input field.

Orientation independent nonlinearities: Scalar fields and regular feature fields are acted on by conventional ELU nonlinearities, which are equivariant for these field types. Sign-flip fields are processed by applying ELU nonlinearities to their absolute value after summing a learnable bias parameter. To ensure that the resulting fields are again transforming according to the sign-flip representation, we multiply them subsequently with the signs of the input features. See the paper and the class EquivNonlin for more details.

Feature field pooling: The module MobiusPool implements an orientation independent pooling operation with a stride and kernel size of two pixels, thus halving the fields' spatial resolution. Scalar and regular feature fields are pooled with a conventional max pooling operation, which is for these field types coordinate independent. As the coefficients of sign-flip fields negate under gauge transformations, they are pooled based on their (gauge invariant) absolute value.

While the pooling operation is tested to be exactly gauge equivariant, its spatial subsampling interferes inevitably with its isometry equivariance. Specifically, the pooling operation is only isometry equivariant w.r.t. shifts by an even number of pixels. Note that the same issue applies to conventional Euclidean CNNs as well; see e.g. (Azulay and Weiss, 2019) or (Zhang, 2019).

Models: All models are implemented in models.py. The orientation independent models, which differ only in their field type multiplicities but agree in their total number of channels, are implemented as class MobiusGaugeCNN. We furthermore implement conventional CNN baselines, one with the same number of channels and thus more parameters (α=1) and one with the same number of parameters but less channels (α=2). Since conventional CNNs are explicitly coordinate dependent they utilize a naive padding operation (MobiusPadNaive), which performs a spatial reflection of feature maps but does not apply the unspecified gauge transformation. The following table gives an overview of the different models:

Data - Möbius MNIST

We benchmark our models on Möbius MNIST, a simple classification dataset which consists of MNIST digits that are projected on the Möbius strip. Since MNIST digits are gray-scale images, they are geometrically identified as scalar fields. The size of the training set is by default set to 12000 digits, which agrees with the rotated MNIST dataset.

There are two versions of the training and test sets which consist of centered and shifted digits. All digits in the centered datasets occur at the same location (and the same orientation) of the strip. The isometry shifted digits appear at uniformly sampled locations. Recall that shifts once around the strip lead to a reflection of the digits as visualized above. The following digits show isometry shifted digits (note the reflection at the cut):

To generate the datasets it is sufficient to call convert_mnist.py, which downloads the original MNIST dataset via torchvision and saves the Möbius MNIST datasets in data/mobius_MNIST.npz.

Results

The models can then be trained by calling, for instance,

python train.py --model mobius_regular

For more options and further model types, consult the help message: python train.py -h

The following table gives an overview of the performance of all models in two different settings, averaged over 32 runs:

The setting "shifted train digits" trains and evaluates on isometry shifted digits. To test the isometry equivariance of the models, we train them furthermore on "centered train digits", testing them then out-of-distribution on shifted digits. As one can see, the orientation independent models generalize well over these unseen variations while the conventional coordinate dependent CNNs' performance deteriorates.

Dependencies

This library is based on Python3.7. It requires the following packages:

numpy
torch>=1.1
torchvision>=0.3

Logging via tensorboard is optional.

Owner
Maurice Weiler
AI researcher with a focus on geometric and equivariant deep learning. PhD candidate under the supervision of Max Welling. Master's degree in Physics.
Maurice Weiler
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

OCTIS : Optimizing and Comparing Topic Models is Simple! OCTIS (Optimizing and Comparing Topic models Is Simple) aims at training, analyzing and compa

MIND 478 Jan 01, 2023
JupyterLite demo deployed to GitHub Pages 🚀

JupyterLite Demo JupyterLite deployed as a static site to GitHub Pages, for demo purposes. ✨ Try it in your browser ✨ ➡️ https://jupyterlite.github.io

JupyterLite 223 Jan 04, 2023
A pytorch &keras implementation and demo of Fastformer.

Fastformer Notes from the authors Pytorch/Keras implementation of Fastformer. The keras version only includes the core fastformer attention part. The

153 Dec 28, 2022
A very simple baseline to estimate 2D & 3D SMPL-compatible keypoints from a single color image.

Minimal Body A very simple baseline to estimate 2D & 3D SMPL-compatible keypoints from a single color image. The model file is only 51.2 MB and runs a

Yuxiao Zhou 49 Dec 05, 2022
TransferNet: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network

TransferNet: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network Created by Seunghoon Hong, Junhyuk Oh,

42 Jun 29, 2022
Collection of machine learning related notebooks to share.

ML_Notebooks Collection of machine learning related notebooks to share. Notebooks GAN_distributed_training.ipynb In this Notebook, TensorFlow's tutori

Sascha Kirch 14 Dec 22, 2022
Computer Vision Paper Reviews with Key Summary of paper, End to End Code Practice and Jupyter Notebook converted papers

Computer-Vision-Paper-Reviews Computer Vision Paper Reviews with Key Summary along Papers & Codes. Jonathan Choi 2021 The repository provides 100+ Pap

Jonathan Choi 2 Mar 17, 2022
Anchor-free Oriented Proposal Generator for Object Detection

Anchor-free Oriented Proposal Generator for Object Detection Gong Cheng, Jiabao Wang, Ke Li, Xingxing Xie, Chunbo Lang, Yanqing Yao, Junwei Han, Intro

jbwang1997 56 Nov 15, 2022
Notebooks, slides and dataset of the CorrelAid Machine Learning Winter School

CorrelAid Machine Learning Winter School Welcome to the CorrelAid ML Winter School! Task The problem we want to solve is to classify trees in Roosevel

CorrelAid 12 Nov 23, 2022
"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

FGVC8 Exploring Vision Transformers for Fine-grained Classification paper presented at the CVPR 2021, The Eight Workshop on Fine-Grained Visual Catego

Marcos V. Conde 19 Dec 06, 2022
🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval 🏆 The 1st Place Submission to AICity Challenge 2021 Natural

82 Dec 29, 2022
[CVPR2021] De-rendering the World's Revolutionary Artefacts

De-rendering the World's Revolutionary Artefacts Project Page | Video | Paper In CVPR 2021 Shangzhe Wu1,4, Ameesh Makadia4, Jiajun Wu2, Noah Snavely4,

49 Nov 06, 2022
Controlling the MicriSpotAI robot from scratch

Project-MicroSpot-AI Controlling the MicriSpotAI robot from scratch Colaborators Alexander Dennis Components from MicroSpot The MicriSpotAI has the fo

Dennis Núñez-Fernández 5 Oct 20, 2022
The reference baseline of final exam for XMU machine learning course

Mini-NICO Baseline The baseline is a reference method for the final exam of machine learning course. Requirements Installation we use /python3.7 /torc

JoaquinChou 3 Dec 29, 2021
Few-shot Learning of GPT-3

Few-shot Learning With Language Models This is a codebase to perform few-shot "in-context" learning using language models similar to the GPT-3 paper.

Tony Z. Zhao 224 Dec 28, 2022
Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation.

DuoRec Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation. Usage Download datasets fr

Qrh 46 Dec 19, 2022
My personal code and solution to the Synacor Challenge from 2012 OSCON.

Synacor OSCON Challenge Solution (2012) This repository contains my code and solution to solve the Synacor OSCON 2012 Challenge. If you are interested

2 Mar 20, 2022
A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥

Lightning-Hydra-Template A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥 Click on Use this template to initialize new re

Hyunsoo Cho 1 Dec 20, 2021
Code for Mesh Convolution Using a Learned Kernel Basis

Mesh Convolution This repository contains the implementation (in PyTorch) of the paper FULLY CONVOLUTIONAL MESH AUTOENCODER USING EFFICIENT SPATIALLY

Yi_Zhou 35 Jan 03, 2023
This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their coordinates and detected labels.

This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their

Liron Bdolah 8 May 22, 2022