CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Related tags

Deep Learningcloob
Overview

CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Andreas Fürst* 1, Elisabeth Rumetshofer* 1, Viet Tran1, Hubert Ramsauer1, Fei Tang3, Johannes Lehner1, David Kreil2, Michael Kopp2, Günter Klambauer1, Angela Bitto-Nemling1, Sepp Hochreiter1 2

1 ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
2 Institute of Advanced Research in Artificial Intelligence (IARAI)
3 HERE Technologies
* Equal contribution


Detailed blog post on this paper at this link.

The full paper is available here.


Implementation of CLOOB

This repository contains the implemenation of CLOOB used to obtain the results reported in the paper. The implementation is based on OpenCLIP, an open source implementation of OpenAI's CLIP.

Setup

We provide an 'environment.yml' file to set up a conda environment with all required packages. Run the following command to clone the repository and create the environment.

# Clone repository and swtich into the directory
git clone https://github.com/ml-jku/cloob
cd cloob

# Create the environment and activate it
conda env create --file environment.yml
conda activate cloob

# Additionally, webdataset needs to be installed from git repo for pre-training on YFCC 
pip install git+https://github.com/tmbdev/webdataset.git

# Add the directory to the PYTHONPATH environment variable
export PYTHONPATH="$PYTHONPATH:$PWD/src"

Data

For pre-training we use the two datasets supported by OpenCLIP, namely Conceptual Captions and YFCC.

Conceptual Captions

OpenCLIP already provides a script to download and prepare the Conceptual Captions dataset, which contains 2.89M training images and 13k validation images. First, download the Conceptual Captions URLs and then run the script gather_cc.py.

python3 src/data/gather_cc.py path/to/Train_GCC-training.tsv path/to/Validation_GCC-1.1.0-Validation.tsv

YFCC

We use the same subset of ~15M images from the YFCC100M dataset as CLIP. They provide a list of (line number, photo identifier, photo hash) of each image contained in this subset here.

For more information see YFCC100m Subset on OpenAI's github.

Downstream Tasks

In the paper we report results on several downstream tasks. Except for ImageNet we provide links to already pre-processed versions (where necessary) of the respective test set.

Dataset Description Official Processed
Birdsnap This dataset contains images of North American bird species, however
our dataset is smaller than reported in CLIP as some samples are no longer available.
Link Link
Country211 This dataset was published in CLIP and is a small subset of the YFCC100m dataset.
It consists of photos that can be assigned to 211 countries via GPS coordinates.
For each country 200 photos are sampled for the training set and 100 for testing.
Link Link
Flowers102 Images of 102 flower categories commonly occuring in the United Kingdom were collected.
Several classes are very similar and there is a large variation in scale, pose and lighting.
Link Link
GTSRB This dataset was released for a challenge held at the IJCNN 2011.
The dataset contains images of german traffic signs from more than 40 classes.
Link Link
Stanford Cars This dataset contains images of 196 car models at the level of make,
model and year (e.g. Tesla Model S Sedan 2012).
Link Link
UCF101 The dataset has been created by extracting the middle frame from each video. Link Link
ImageNet This dataset spans 1000 object classes and contains 1,281,167 training images,
50,000 validation images and 100,000 test images.
Link -
ImageNet v2 The ImageNetV2 dataset contains new test data for the ImageNet benchmark. Link -

Usage

In the following there is an example command for pretraining on CC with an effective batch size of 512 when used on 4 GPUs.

/conceptual_captions/Train-GCC-training_output.csv" \ --val-data=" /conceptual_captions/Validation_GCC-1.1.0-Validation_output.csv" \ --path-data=" /conceptual_captions" \ --imagenet-val=" /imagenet/val" \ --warmup 20000 \ --batch-size=128 \ --lr=1e-3 \ --wd=0.1 \ --lr-scheduler="cosine-restarts" \ --restart-cycles=10 \ --epochs=70 \ --method="cloob" \ --init-inv-tau=30 \ --init-scale-hopfield=8 \ --workers=8 \ --model="RN50" \ --dist-url="tcp://127.0.0.1:6100" \ --batch-size-eval=512 ">
python -u src/training/main.py \
--train-data="
       
        /conceptual_captions/Train-GCC-training_output.csv
        "
        \
--val-data="
       
        /conceptual_captions/Validation_GCC-1.1.0-Validation_output.csv
        "
        \
--path-data="
       
        /conceptual_captions
        "
        \
--imagenet-val="
       
        /imagenet/val
        "
        \
--warmup 20000 \
--batch-size=128 \
--lr=1e-3 \
--wd=0.1 \
--lr-scheduler="cosine-restarts" \
--restart-cycles=10 \
--epochs=70 \
--method="cloob" \
--init-inv-tau=30 \
--init-scale-hopfield=8 \
--workers=8 \
--model="RN50" \
--dist-url="tcp://127.0.0.1:6100" \
--batch-size-eval=512

Zeroshot evaluation of downstream tasks

We provide a Jupyter notebook to perform zeroshot evaluation with a trained model.

LICENSE

MIT LICENSE

Owner
Institute for Machine Learning, Johannes Kepler University Linz
Software of the Institute for Machine Learning, JKU Linz
Institute for Machine Learning, Johannes Kepler University Linz
Code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Semi-supervised Deep Kernel Learning This is the code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data

58 Oct 26, 2022
Fully Convolutional Networks for Semantic Segmentation by Jonathan Long*, Evan Shelhamer*, and Trevor Darrell. CVPR 2015 and PAMI 2016.

Fully Convolutional Networks for Semantic Segmentation This is the reference implementation of the models and code for the fully convolutional network

Evan Shelhamer 3.2k Jan 08, 2023
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

16 Dec 14, 2022
auto-tuning momentum SGD optimizer

YellowFin YellowFin is an auto-tuning optimizer based on momentum SGD which requires no manual specification of learning rate and momentum. It measure

Jian Zhang 288 Nov 19, 2022
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Ka

Tomas Jakab 87 Nov 30, 2022
Code, final versions, and information on the Sparkfun Graphical Datasheets

Graphical Datasheets Code, final versions, and information on the SparkFun Graphical Datasheets. Generated Cells After Running Script Example Complete

SparkFun Electronics 102 Jan 05, 2023
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

    VarCLR: Variable Representation Pre-training via Contrastive Learning New: Paper accepted by ICSE 2022. Preprint at arXiv! This repository contain

squaresLab 32 Oct 24, 2022
Code for the AI lab course 2021/2022 of the University of Verona

AI-Lab Code for the AI lab course 2021/2022 of the University of Verona Set-Up the environment for the curse Download Anaconda for your System. Instal

Davide Corsi 5 Oct 19, 2022
Json2Xml tool will help you convert from json COCO format to VOC xml format in Object Detection Problem.

JSON 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Json2Xml t

Nguyễn Trường Lâu 6 Aug 22, 2022
The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

TimeSformer This is an official pytorch implementation of Is Space-Time Attention All You Need for Video Understanding?. In this repository, we provid

Facebook Research 1k Dec 31, 2022
Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.

Video Representation Learning by Recognizing Temporal Transformations [Project Page] Simon Jenni, Givi Meishvili, and Paolo Favaro. In ECCV, 2020. Thi

Simon Jenni 46 Nov 14, 2022
Details about the wide minima density hypothesis and metrics to compute width of a minima

wide-minima-density-hypothesis Details about the wide minima density hypothesis and metrics to compute width of a minima This repo presents the wide m

Nikhil Iyer 9 Dec 27, 2022
Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

Şebnem 6 Jan 18, 2022
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks

SSTNet Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks(ICCV2021) by Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui J

83 Nov 29, 2022
Hyperparameter tuning for humans

KerasTuner KerasTuner is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search. Easily c

Keras 2.6k Dec 27, 2022
Simple (but Strong) Baselines for POMDPs

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs Welcome to the POMDP world! This repo provides some simple baselines for POMDPs, specific

Tianwei V. Ni 172 Dec 29, 2022
[NeurIPS'21] Shape As Points: A Differentiable Poisson Solver

Shape As Points (SAP) Paper | Project Page | Short Video (6 min) | Long Video (12 min) This repository contains the implementation of the paper: Shape

394 Dec 30, 2022
MetaDrive: Composing Diverse Scenarios for Generalizable Reinforcement Learning

MetaDrive: Composing Diverse Driving Scenarios for Generalizable RL [ Documentation | Demo Video ] MetaDrive is a driving simulator with the following

DeciForce: Crossroads of Machine Perception and Autonomy 276 Jan 04, 2023
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".

Codebase for learning control flow in transformers The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformer

Csordás Róbert 24 Oct 15, 2022
AutoPentest-DRL: Automated Penetration Testing Using Deep Reinforcement Learning

AutoPentest-DRL: Automated Penetration Testing Using Deep Reinforcement Learning AutoPentest-DRL is an automated penetration testing framework based o

Cyber Range Organization and Design Chair 217 Jan 01, 2023