Code for Efficient Visual Pretraining with Contrastive Detection

Related tags

Deep Learningdetcon
Overview

Code for DetCon

This repository contains code for the ICCV 2021 paper "Efficient Visual Pretraining with Contrastive Detection" by Olivier J. Hénaff, Skanda Koppula, Jean-Baptiste Alayrac, Aaron van den Oord, Oriol Vinyals, João Carreira.

This repository includes sample code to run pretraining with DetCon. In particular, we're providing a sample script for generating the Felzenzwalb segmentations for ImageNet images (using skimage) and a pre-training experiment setup (dataloader, augmentation pipeline, optimization config, and loss definition) that describes the DetCon-B(YOL) model described in the paper. The original code uses a large grid of TPUs and internal infrastructure for training, but we've extracted the key DetCon loss+experiment in this folder so that external groups can have a reference should they want to explore a similar approaches.

This repository builds heavily from the BYOL open source release, so speed-up tricks and features in that setup may likely translate to the code here.

Running this code

Running ./setup.sh will create and activate a virtualenv and install all necessary dependencies. To enter the environment after running setup.sh, run source /tmp/detcon_venv/bin/activate.

Running bash test.sh will run a single training step on a mock image/Felzenszwalb mask as a simple validation that all dependencies are set up correctly and the DetCon pre-training can run smoothly. On our 16-core machine, running on CPU, we find this takes around 3-4 minutes.

A TFRecord dataset containing each ImageNet image, label, and its corresponding Felzenszwalb segmentation/mask can then be generated using the generate_fh_masks Python script. You will first have to download two pieces of ImageNet metadata into the same directory as the script:

wget https://raw.githubusercontent.com/tensorflow/models/master/research/slim/datasets/imagenet_metadata.txt wget https://raw.githubusercontent.com/tensorflow/models/master/research/slim/datasets/imagenet_lsvrc_2015_synsets.txt

And to run the multi-threaded mask generation script:

python generate_fh_masks_for_imagenet.py -- \
--train_directory=imagenet-train \
--output_directory=imagenet-train-fh

This single-machine, multi-threaded version of the mask generation script takes 2-3 days on a 16-core CPU machine to complete CPU-based processing of the ImageNet training and validation set. The script assumes the same ImageNet directory structure as github.com/tensorflow/models/blob/master/research/slim/datasets/build_imagenet_data.py (more details in the link).

You can then run the main training loop and execute multiple DetCon-B training steps by running from the parent directory the command:

python -m detcon.main_loop \
  --dataset_directory='/tmp/imagenet-fh-train' \
  --pretrain_epochs=100`

Note that you will need to update the dataset_directory flag, to point to the generated Felzenzwalb/image TFRecord dataset previously generated. Additionally, to use accelerators, users will need to install the correct version of jaxlib with CUDA support.

Citing this work

If you use this code in your work, please consider referencing our work:

@article{henaff2021efficient,
  title={{Efficient Visual Pretraining with Contrastive Detection}},
  author={H{\'e}naff, Olivier J and Koppula, Skanda and Alayrac, Jean-Baptiste and Oord, Aaron van den and Vinyals, Oriol and Carreira, Jo{\~a}o},
  journal={International Conference on Computer Vision},
  year={2021}
}

Disclaimer

This is not an officially supported Google product.

Owner
DeepMind
DeepMind
Rendering color and depth images for ShapeNet models.

Color & Depth Renderer for ShapeNet This library includes the tools for rendering multi-view color and depth images of ShapeNet models. Physically bas

Yinyu Nie 41 Dec 19, 2022
Nested Graph Neural Network (NGNN) is a general framework to improve a base GNN's expressive power and performance

Nested Graph Neural Networks About Nested Graph Neural Network (NGNN) is a general framework to improve a base GNN's expressive power and performance.

Muhan Zhang 38 Jan 05, 2023
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

Poisson Image Editing - A Parallel Implementation Jiayi Weng (jiayiwen), Zixu Chen (zixuc) Poisson Image Editing is a technique that can fuse two imag

Jiayi Weng 110 Dec 27, 2022
Learning to Simulate Dynamic Environments with GameGAN (CVPR 2020)

Learning to Simulate Dynamic Environments with GameGAN PyTorch code for GameGAN Learning to Simulate Dynamic Environments with GameGAN Seung Wook Kim,

199 Dec 26, 2022
Visualizing Yolov5's layers using GradCam

YOLO-V5 GRADCAM I constantly desired to know to which part of an object the object-detection models pay more attention. So I searched for it, but I di

Pooya Mohammadi Kazaj 200 Jan 01, 2023
Dieser Scanner findet Websites, die nicht direkt in Suchmaschinen auftauchen, aber trotzdem erreichbar sind.

Deep Web Scanner Dieses Script findet Websites, die per IPv4-Adresse erreichbar sind und speichert deren Metadaten. Die Ausgabe im Terminal wird nach

Alex K. 30 Nov 18, 2022
kullanışlı ve işinizi kolaylaştıracak bir araç

Hey merhaba! işte çok sorulan sorularının cevabı ve sorunlarının çözümü; Soru= İçinde var denilen birçok şeyi göremiyorum bunun sebebi nedir? Cevap= B

Sexettin 16 Dec 17, 2022
This repository contains several image-to-image translation models, whcih were tested for RGB to NIR image generation. The models are Pix2Pix, Pix2PixHD, CycleGAN and PointWise.

RGB2NIR_Experimental This repository contains several image-to-image translation models, whcih were tested for RGB to NIR image generation. The models

5 Jan 04, 2023
Language model Prompt And Query Archive

LPAQA: Language model Prompt And Query Archive This repository contains data and code for the paper How Can We Know What Language Models Know? Install

127 Dec 20, 2022
Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL)

LUPerson-NL Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL) The repository is for our CVPR2022 paper Large-Scale

43 Dec 26, 2022
"Inductive Entity Representations from Text via Link Prediction" @ The Web Conference 2021

Inductive entity representations from text via link prediction This repository contains the code used for the experiments in the paper "Inductive enti

Daniel Daza 45 Jan 09, 2023
PyTorch Live is an easy to use library of tools for creating on-device ML demos on Android and iOS.

PyTorch Live is an easy to use library of tools for creating on-device ML demos on Android and iOS. With Live, you can build a working mobile app ML demo in minutes.

559 Jan 01, 2023
Official TensorFlow code for the forthcoming paper

~ Efficient-CapsNet ~ Are you tired of over inflated and overused convolutional neural networks? You're right! It's time for CAPSULES :)

Vittorio Mazzia 203 Jan 08, 2023
Data and analysis code for an MS on SK VOC genomes phenotyping/neutralisation assays

Description Summary of phylogenomic methods and analyses used in "Immunogenicity of convalescent and vaccinated sera against clinical isolates of ance

Finlay Maguire 1 Jan 06, 2022
Predicting 10 different clothing types using Xception pre-trained model.

Predicting-Clothing-Types Predicting 10 different clothing types using Xception pre-trained model from Keras library. It is reimplemented version from

AbdAssalam Ahmad 3 Dec 29, 2021
Towards Part-Based Understanding of RGB-D Scans

Towards Part-Based Understanding of RGB-D Scans (CVPR 2021) We propose the task of part-based scene understanding of real-world 3D environments: from

26 Nov 23, 2022
🔊 Audio and fastai v2

Fastaudio An audio module for fastai v2. We want to help you build audio machine learning applications while minimizing the need for audio domain expe

152 Dec 28, 2022
This repository is dedicated to developing and maintaining code for experiments with wide neural networks.

Wide-Networks This repository contains the code of various experiments on wide neural networks. In particular, we implement classes for abc-parameteri

Karl Hajjar 0 Nov 02, 2021
LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation

LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation Table of Contents: Introduction Project Structure Installation Datas

Yu Wang 492 Dec 02, 2022
Facial expression detector

A tensorflow convolutional neural network model to detect facial expressions.

Carlos Tardón Rubio 5 Apr 20, 2022