This is an early in-development version of training CLIP models with hivemind.

Overview

A transformer that does not hog your GPU memory

This is an early in-development codebase: if you want a stable and documented hivemind codebase, look at CALM or dalle-hivemind.

Readme under construction

LeanTransformer implements a specific version of transformer with two goals in mind:

  • using as little GPU memory as possible
  • stable training for very large models

The core philosophy of LeanTransformer is to replace torch.autograd with grad students. Automatic differentiation is great if you want to test ideas quickly, less so if a single training run can cost over $4 million (or >1000 years in grad school).

Related work: GSO

Our implementation partially replaces automatic differentiation with Grad Student Optimization (GSO) - a biologically inspired black box optimization algorithm. In the past, GSO has seen widespread adoption thanks to its strong theoretical foundations and unparalleled cost efficiency (Chom et al). Previous successfully applied GSO for hyperparameter tuning and natural language generation. To the best of our knowledge we are the first work to successfully apply distributed fault-tolerant GSO for optimizing the memory footprint of transformers. We summarize our findings below:

Memory saving features:

Other features:

Not implemented:

  • In reversible mode, one can further save memory by computing backward in chunks:
    • a few tokens at a time for feedforward layers, since grad(concat(mlp(x1), mlp(x2))) = concat(grad(mlp(x1)), grad(mlp(x2)))
    • a few heads at a time for self-attention, since grad(head1 + head2) = grad(head1) + grad(head2), where head1 and head2 are attention outputs after linear projection
  • Attention could be computed in O(sqrt(n)) memory (Rabe et al, 2021)
  • No sparse or linear attention: they are great for very long sequences. However, for large models, attention is not a bottleneck in typical NLP and vision tasks (tested gpt-3 up to length 4096).
  • Per-block grad scaling as described in (Ramesh et al, 2021) - we rely on Sandwich Norm to maintain stability up to 96 layers (did not test more). However, it would be nice to have per-block scaling to avoid the need for an extra LayerNorm.
  • Something else that we missed - please find us on discord.

A day will come a day when we explain all these modifications and provide instructions on how to tune them. But it is not this day!. Until then, we'll happily answer any questions on our discord.

Running the code

[under constructuion] - use the instructions from CALM readme

Acknowledgements:

  • Most of the architecture and stability optimizations were learned through the BigScience research workshop
  • YSDA community helped us survive through the early messy versions of this code
  • NeuroPark trained the first practical model (SahajBERT-XL, SoTA in bengali, details here)
  • TODO DALLE community: at least mention the demo, maybe we end up training something even cooler
  • TODO NCAI community: ask them how best to acknowledge them
  • TODO Hugging Face: ask them how best to acknowledge them
  • TODO Personal: stas00, samyam, jared, more? (this does not include co-authors: Tim,Lucile,Quentin,Denis,Gennady,etc; also, this does not include hivemind contributors)
Owner
<a href=[email protected]">
scalingscattering

Scaling The Scattering Transform : Deep Hybrid Networks This repository contains the experiments found in the paper: https://arxiv.org/abs/1703.08961

Edouard Oyallon 78 Dec 21, 2022
Open source hardware and software platform to build a small scale self driving car.

Donkeycar is minimalist and modular self driving library for Python. It is developed for hobbyists and students with a focus on allowing fast experimentation and easy community contributions.

Autorope 2.4k Jan 04, 2023
This is just a funny project that we want to see AutoEncoder (AE) can actually work to enhance the features we want

Funny_muscle_enhancer :) 1.Discription: This is just a funny project that we want to see AutoEncoder (AE) can actually work on the some features. We w

Jing-Yao Chen (Jacob) 8 Oct 01, 2022
Keras-retinanet - Keras implementation of RetinaNet object detection.

Keras RetinaNet Keras implementation of RetinaNet object detection as described in Focal Loss for Dense Object Detection by Tsung-Yi Lin, Priya Goyal,

Fizyr 4.3k Jan 01, 2023
Predictive Modeling on Electronic Health Records(EHR) using Pytorch

Predictive Modeling on Electronic Health Records(EHR) using Pytorch Overview Although there are plenty of repos on vision and NLP models, there are ve

81 Jan 01, 2023
A library for low-memory inferencing in PyTorch.

Pylomin Pylomin (PYtorch LOw-Memory INference) is a library for low-memory inferencing in PyTorch. Installation ... Usage For example, the following c

3 Oct 26, 2022
The modify PyTorch version of Siam-trackers which are speed-up by TensorRT.

SiamTracker-with-TensorRT The modify PyTorch version of Siam-trackers which are speed-up by TensorRT or ONNX. [Updating...] Examples demonstrating how

9 Dec 13, 2022
Official code implementation for "Personalized Federated Learning using Hypernetworks"

Personalized Federated Learning using Hypernetworks This is an official implementation of Personalized Federated Learning using Hypernetworks paper. [

Aviv Shamsian 121 Dec 25, 2022
Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation

Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation This repository contains code and data f

Zoey Liu 0 Jan 07, 2022
[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods Large Scale Learning on Non-Homophilous Graphs: New Benchmark

60 Jan 03, 2023
Colab notebook for openai/glide-text2im.

GLIDE text2im on Colab This repository provides a Colab notebook to produce images conditioned on text prompts with GLIDE [1]. Usage Run text2im.ipynb

Wok 19 Oct 19, 2022
Deep Learning for Morphological Profiling

Deep Learning for Morphological Profiling An end-to-end implementation of a ML System for morphological profiling using self-supervised learning to di

Danielh Carranza 0 Jan 20, 2022
Reverse engineer your pytorch vision models, in style

🔍 Rover Reverse engineer your CNNs, in style Rover will help you break down your CNN and visualize the features from within the model. No need to wri

Mayukh Deb 32 Sep 24, 2022
Implementation of ECCV20 paper: the devil is in classification: a simple framework for long-tail object detection and instance segmentation

Implementation of our ECCV 2020 paper The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation This repo contains code o

twang 98 Sep 17, 2022
A Python package to create, run, and post-process MODFLOW-based models.

Version 3.3.5 — release candidate Introduction FloPy includes support for MODFLOW 6, MODFLOW-2005, MODFLOW-NWT, MODFLOW-USG, and MODFLOW-2000. Other s

388 Nov 29, 2022
A toy project using OpenCV and PyMunk

A toy project using OpenCV, PyMunk and Mediapipe the source code for my LindkedIn post It's just a toy project and I didn't write a documentation yet,

Amirabbas Asadi 82 Oct 28, 2022
Make your AirPlay devices as TTS speakers

Apple AirPlayer Home Assistant integration component, make your AirPlay devices as TTS speakers. Before Use 2021.6.X or earlier Apple Airplayer compon

George Zhao 117 Dec 15, 2022
Generating Digital Painting Lighting Effects via RGB-space Geometry (SIGGRAPH2020/TOG2020)

Project PaintingLight PaintingLight is a project conducted by the Style2Paints team, aimed at finding a method to manipulate the illumination in digit

651 Dec 29, 2022
A GridMixup augmentation, inspired by GridMask and CutMix

GridMixup A GridMixup augmentation, inspired by GridMask and CutMix Easy install pip install git+https://github.com/IlyaDobrynin/GridMixup.git Overvie

IlyaDo 42 Dec 28, 2022
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022