Deep ViT Features as Dense Visual Descriptors

Overview

dino-vit-features

[paper] [project page]

Official implementation of the paper "Deep ViT Features as Dense Visual Descriptors".

teaser

We demonstrate the effectiveness of deep features extracted from a self-supervised, pre-trained ViT model (DINO-ViT) as dense patch descriptors via real-world vision tasks: (a-b) co-segmentation & part co-segmentation: given a set of input images (e.g., 4 input images), we automatically co-segment semantically common foreground objects (e.g., animals), and then further partition them into common parts; (c-d) point correspondence: given a pair of input images, we automatically extract a sparse set of corresponding points. We tackle these tasks by applying only lightweight, simple methodologies such as clustering or binning, to deep ViT features.

Setup

Our code is developed in pytorch on and requires the following modules: tqdm, faiss, timm, matplotlib, pydensecrf, opencv, scikit-learn. We use python=3.9 but our code should be runnable on any version above 3.6. We recomment running our code with any CUDA supported GPU for faster performance. We recommend setting the running environment via Anaconda by running the following commands:

$ conda env create -f env/dino-vit-feats-env.yml
$ conda activate dino-vit-feats-env

Otherwise, run the following commands in your conda environment:

$ conda install pytorch torchvision torchaudio cudatoolkit=11 -c pytorch
$ conda install tqdm
$ conda install -c conda-forge faiss
$ conda install -c conda-forge timm 
$ conda install matplotlib
$ pip install opencv-python
$ pip install git+https://github.com/lucasb-eyer/pydensecrf.git
$ conda install -c anaconda scikit-learn

ViT Extractor

We provide a wrapper class for a ViT model to extract dense visual descriptors in extractor.py. You can extract descriptors to .pt files using the following command:

python extractor.py --image_path 
   
     --output_path 
    

    
   

You can specify the pretrained model using the --model flag with the following options:

  • dino_vits8, dino_vits16, dino_vitb8, dino_vitb16 from the DINO repo.
  • vit_small_patch8_224, vit_small_patch16_224, vit_base_patch8_224, vit_base_patch16_224 from the timm repo.

You can specify the stride of patch extracting layer to increase resolution using the --stride flag.

Part Co-segmentation Open In Colab

We provide a notebook for running on a single example in part_cosegmentation.ipynb.

To run on several image sets, arrange each set in a directory, inside a data root directory:


   
    
|
|_ 
    
     
|  |
|  |_ img1.png
|  |_ img2.png
|   
|_ 
     
      
   |
   |_ img1.png
   |_ img2.png
   |_ img3.png
...

     
    
   

The following command will produce results in the specified :

python part_cosegmentation.py --root_dir 
   
     --save_dir 
    

    
   

Note: The default configuration in part_cosegmentation.ipynb is suited for running on small sets (e.g. < 10). Increase amount of num_crop_augmentations for more stable results (and increased runtime). The default configuration in part_cosegmentation.py is suited for larger sets (e.g. >> 10).

Co-segmentation Open In Colab

We provide a notebook for running on a single example in cosegmentation.ipynb.

To run on several image sets, arrange each set in a directory, inside a data root directory:


   
    
|
|_ 
    
     
|  |
|  |_ img1.png
|  |_ img2.png
|   
|_ 
     
      
   |
   |_ img1.png
   |_ img2.png
   |_ img3.png
...

     
    
   

The following command will produce results in the specified :

python cosegmentation.py --root_dir 
   
     --save_dir 
    

    
   

Point Correspondences Open In Colab

We provide a notebook for running on a single example in correpondences.ipynb.

To run on several image pairs, arrange each image pair in a directory, inside a data root directory:


   
    
|
|_ 
    
     
|  |
|  |_ img1.png
|  |_ img2.png
|   
|_ 
     
      
   |
   |_ img1.png
   |_ img2.png
...

     
    
   

The following command will produce results in the specified :

python correspondences.py --root_dir 
   
     --save_dir 
    

    
   

Citation

If you found this repository useful please consider starring and citing :

@article{amir2021deep,
    author    = {Shir Amir and Yossi Gandelsman and Shai Bagon and Tali Dekel},
    title     = {Deep ViT Features as Dense Visual Descriptors},
    journal   = {arXiv preprint arXiv:2112.05814},
    year      = {2021}
}
Owner
Shir Amir
Graduate Student @ Weizmann Institute of Science
Shir Amir
A Joint Video and Image Encoder for End-to-End Retrieval

Frozen️ in Time ❄️ ️️️️ ⏳ A Joint Video and Image Encoder for End-to-End Retrieval project page | arXiv | webvid-data Repository containing the code,

225 Dec 25, 2022
This repository is all about spending some time the with the original problem posed by Minsky and Papert

This repository is all about spending some time the with the original problem posed by Minsky and Papert. Working through this problem is a great way to begin learning computer vision.

Jaissruti Nanthakumar 1 Jan 23, 2022
Franka Emika Panda manipulator kinematics&dynamics simulation

pybullet_sim_panda Pybullet simulation environment for Franka Emika Panda Dependency pybullet, numpy, spatial_math_mini Simple example (please check s

0 Jan 20, 2022
Explaining Hyperparameter Optimization via PDPs

Explaining Hyperparameter Optimization via PDPs This repository gives access to an implementation of the methods presented in the paper submission “Ex

2 Nov 16, 2022
Voice of Pajlada with model and weights.

Pajlada TTS Stripped down version of ForwardTacotron (https://github.com/as-ideas/ForwardTacotron) with pretrained weights for Pajlada's (https://gith

6 Sep 03, 2021
Breast cancer is been classified into benign tumour and malignant tumour.

Breast cancer is been classified into benign tumour and malignant tumour. Logistic regression is applied in this model.

1 Feb 04, 2022
SlotRefine: A Fast Non-Autoregressive Model forJoint Intent Detection and Slot Filling

SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling Reference Main paper to be cited (Di Wu et al., 2020) @article

Moore 34 Nov 03, 2022
Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

Seulki Park 70 Jan 03, 2023
TeachMyAgent is a testbed platform for Automatic Curriculum Learning methods in Deep RL.

TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL Paper Website Documentation TeachMyAgent is a testbed platform for Automatic Cu

Flowers Team 51 Dec 25, 2022
[ICCV2021] Official Pytorch implementation for SDGZSL (Semantics Disentangling for Generalized Zero-Shot Learning)

Semantics Disentangling for Generalized Zero-shot Learning This is the official implementation for paper Zhi Chen, Yadan Luo, Ruihong Qiu, Zi Huang, J

25 Dec 06, 2022
An abstraction layer for mathematical optimization solvers.

MathOptInterface Documentation Build Status Social An abstraction layer for mathematical optimization solvers. Replaces MathProgBase. Citing MathOptIn

JuMP-dev 284 Jan 04, 2023
NeuPy is a Tensorflow based python library for prototyping and building neural networks

NeuPy v0.8.2 NeuPy is a python library for prototyping and building neural networks. NeuPy uses Tensorflow as a computational backend for deep learnin

Yurii Shevchuk 729 Jan 03, 2023
An AI made using artificial intelligence (AI) and machine learning algorithms (ML) .

DTech.AIML An AI made using artificial intelligence (AI) and machine learning algorithms (ML) . This is created by help of some members in my team and

1 Jan 06, 2022
Spectralformer: Rethinking hyperspectral image classification with transformers

Spectralformer: Rethinking hyperspectral image classification with transformers Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang, Antonio Plaza

Danfeng Hong 102 Dec 29, 2022
Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GanFormer and TransGan paper

TransGanFormer (wip) Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GansFormer and TransGan paper. I

Phil Wang 146 Dec 06, 2022
CoaT: Co-Scale Conv-Attentional Image Transformers

CoaT: Co-Scale Conv-Attentional Image Transformers Introduction This repository contains the official code and pretrained models for CoaT: Co-Scale Co

mlpc-ucsd 191 Dec 03, 2022
Vision-Language Pre-training for Image Captioning and Question Answering

VLP This repo hosts the source code for our AAAI2020 work Vision-Language Pre-training (VLP). We have released the pre-trained model on Conceptual Cap

Luowei Zhou 373 Jan 03, 2023
3D position tracking for soccer players with multi-camera videos

This repo contains a full pipeline to support 3D position tracking of soccer players, with multi-view calibrated moving/fixed video sequences as inputs.

Yuchang Jiang 72 Dec 27, 2022
[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation Getting Started Our codes are implemented and tested with pyth

ZiNiU WaN 176 Dec 15, 2022
Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

High-Performance Brain-to-Text Communication via Handwriting Overview This repo is associated with this manuscript, preprint and dataset. The code can

Francis R. Willett 306 Jan 03, 2023