GLOM

An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" in PyTorch, trained on the MNIST dataset. To understand this implementation, please watch Yannic Kilcher's GLOM video, then read this README.md, then read the code.

Running

Open the notebook in Jupyter to run it. The program expects an NVIDIA graphics card for GPU speedup. If you run out of GPU memory, decrease the batch_size variable. If the notebook fails to render on GitHub, try reloading or refreshing a few times.

Results

The best models, saved under the best_models folder, reached an accuracy of about 91%.

Implementation details

Three types of networks per layer of vectors

  1. Top-down network
  2. Bottom-up network
  3. Attention network on the same layer

Intro to State

There is an initial state to which the outputs of all three network types are added after every time step. The bottom layer of the state is the input vector holding the MNIST pixel data; nothing is ever added to it, so the raw pixel data is retained. The top layer of the state is the output layer, where the loss function is applied and the layer is trained toward the one-hot MNIST target vector.
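
As a rough sketch, the state can be pictured as a stack of vectors per pixel. The shapes and names below are my assumptions for illustration, not the repo's exact ones:

```python
import torch

# Minimal sketch of the state, assuming made-up shapes and names;
# the repo's actual dimensions may differ.
batch_size, n_layers, height, width, vec_dim = 8, 5, 28, 28, 64

# One stack of n_layers vectors per pixel.
state = torch.zeros(batch_size, n_layers, height, width, vec_dim)

# Bottom layer: MNIST pixels, broadcast into the vector slots. This layer
# never gets updates added to it, so the raw input is retained.
images = torch.rand(batch_size, height, width)  # stand-in for an MNIST batch
state[:, 0] = images.unsqueeze(-1).expand(-1, -1, -1, vec_dim)

# Top layer: state[:, -1] is what the loss function trains toward the
# one-hot target.
```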

Explanation of compute_all function

Each type of network sees the 3x3 grid of vectors surrounding the current network's input vector at the current layer. This lets information travel laterally across vectors faster, so more information can be sent across an image in fewer steps. The easy way to do this is to shift (or roll) every vector along the x and y axes and then concatenate the shifted copies on top of each other, so that every place a vector used to be in the state now contains that vector and all of its same-layer neighbors. This also connects the edges of the image, so data can pass from one edge to the other, reducing the maximum distance between any two pixels or vectors.
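
A hedged sketch of that roll-and-concatenate trick (the function name and shapes are my assumptions):

```python
import torch

def gather_neighbors(layer):
    """Stack each vector with its 8 same-layer neighbors.

    layer: (batch, height, width, vec_dim). torch.roll wraps around the
    edges, which is what connects opposite sides of the image.
    """
    shifted = [
        torch.roll(layer, shifts=(dy, dx), dims=(1, 2))
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ]
    # Result: (batch, height, width, 9 * vec_dim) -- every position now
    # holds the full 3x3 grid of vectors that surrounded it.
    return torch.cat(shifted, dim=-1)
```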

For a more complex dataset, it's possible this could pose some issues, since two opposite edges of an image aren't generally continuous, but for MNIST this problem doesn't arise. These vectors are then fed to each type of model. For each pixel, a model receives all neighboring state vectors at a given layer and outputs a single vector. There are 3 types of models per layer; in this example, every line drawn is a new model that is reused for every pixel this process is applied to. After each model type has produced an output, the three lists of vectors are added together.

This gives a single list of vectors that is added to the corresponding list of vectors at that specific x,y coordinate in the original state.

Repeating this step for every list of vectors at each x,y coordinate in the original state yields the full new state value.

Since each network only sees a 3x3 grid and not larger image patches, this technique works for images of any size and is easily parallelizable.
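
Putting the pieces together, one layer's update might look like the sketch below. It reuses the gather_neighbors helper from the earlier sketch, and the three nn.Linear modules are placeholders for the repo's actual networks, not its real architecture:

```python
import torch.nn as nn

vec_dim = 64
top_down  = nn.Linear(9 * vec_dim, vec_dim)   # reads the layer above
bottom_up = nn.Linear(9 * vec_dim, vec_dim)   # reads the layer below
attention = nn.Linear(9 * vec_dim, vec_dim)   # reads the same layer

def update_layer(state, l):
    """state: (batch, n_layers, height, width, vec_dim); 0 < l < n_layers - 1."""
    td = top_down(gather_neighbors(state[:, l + 1]))
    bu = bottom_up(gather_neighbors(state[:, l - 1]))
    at = attention(gather_neighbors(state[:, l]))
    # The three outputs are summed and added onto the existing state vectors.
    return state[:, l] + td + bu + at
```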

If I had more compute

My 2080 Ti runs into memory errors running this if the batch size is above about 30, so here are my implementation ideas if I had more compute.

  1. Increase batch_size. This probably won't affect training, but it would make testing the accuracy faster.
  2. Save more states throughout the steps taken and add them together. This would let gradients pass back to the original state, similar to how ResNet can train very deep models because gradients propagate backward more easily. This has already been implemented to a smaller degree and showed massive accuracy improvements; see the sketch after this list.
  3. Perform some kind of evolutionary parameter search by mutating the model parameters while also using backprop. This has been shown to improve the accuracy of image classifiers and other models, but it would take a ton of compute.
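
A hedged sketch of idea 2; here, step stands in for one full compute_all update and is an assumption, not the repo's API:

```python
def run_with_state_sums(state, step, n_steps):
    # Summing the intermediate states gives each step a short, direct
    # gradient path back to the initial state, much like a residual
    # connection in ResNet.
    accumulated = state
    for _ in range(n_steps):
        state = step(state)
        accumulated = accumulated + state
    return accumulated
```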

Yannic Kilcher's Attention

This has been pushed to GitHub because, while testing and tuning hyperparameters, a better model than the previous one was found. More testing needs to be done, and I'm working on the visual explanation for it now. Previous versions of this code don't have the attention seen in the current version and will have similar performance.

Other Ideas behind the paper implementation

This is basically a neural cellular automaton from the paper Growing Neural Cellular Automata, with some inspiration from the follow-up paper Self-classifying MNIST Digits, except that instead of a single list of numbers (one vector) per pixel, there are several vectors per pixel in each image. The Growing Neural Cellular Automata model was also very difficult to train because of the long gradient chains, and increasing the model's complexity in this GLOM implementation makes training even harder. The neural cellular automata papers are also the reason the MSE loss function is used and random noise is added to the state during training.
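
A hedged sketch of that NCA-style training step; run_steps, noise_std, and the top-layer readout are my assumptions about this repo, not its exact code:

```python
import torch
import torch.nn.functional as F

def training_step(state, one_hot_target, run_steps, noise_std=0.02):
    # Perturb the state with random noise, as in the NCA papers.
    state = state + noise_std * torch.randn_like(state)
    state = run_steps(state)  # several compute_all updates
    # Read class scores off the top layer and train with MSE, not cross-entropy.
    logits = state[:, -1, :, :, :10].mean(dim=(1, 2))  # (batch, 10)
    return F.mse_loss(logits, one_hot_target)
```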

To do

  1. Generate the explanation for Yannic Kilcher's version of attention that is implemented here.
  2. See if part-whole hierarchies are being found.
  3. Keep testing hyperparameters to push accuracy higher.
  4. Test different state initializations.
  5. Train on harder datasets.

If you find any issues, please feel free to contact me.
