SiT: Self-supervised vIsion Transformer

Last update: Dec 28, 2022

Overview

This repository contains the official PyTorch code for self-supervised pretraining, finetuning, and evaluation of SiT (Self-supervised vIsion Transformer).

The training strategy is adopted from DeiT.

Usage

Create an environment

conda create -n SiT python=3.8

Activate the environment and install the necessary packages

conda activate SiT

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

pip install -r requirements.txt
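After installation, a quick sanity check can confirm that the core packages are importable inside the activated environment. The script below is a minimal sketch and not part of the repository; the helper name `missing_packages` and the choice of packages to check are assumptions made for illustration.

```python
import importlib.util
import sys

# Packages installed by the conda command above (illustrative selection).
REQUIRED = ("torch", "torchvision")


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        # Import only after confirming torch is present.
        import torch
        print("torch", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False`, the distributed commands in the following sections will likely not find the GPUs they expect.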

Self-supervised pre-training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 72 --epochs 501 --min-lr 5e-6 --lr 1e-3 --training-mode 'SSL' --data-set 'STL10' --output 'checkpoints/SSL/STL10' --validate-every 10
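When changing the number of GPUs, it helps to keep track of the total batch size. The sketch below assumes that `--batch-size` is a per-process value, as it is in DeiT, so the command above would train with 4 x 72 images per step. The second helper shows the linear learning-rate scaling rule used by DeiT's reference script (relative to a batch of 512); whether SiT's `main.py` applies the same rule is an assumption that should be checked against the code. Both function names are hypothetical.

```python
def effective_batch_size(per_gpu_batch, num_gpus):
    """Total images per optimizer step across all processes."""
    return per_gpu_batch * num_gpus


def linearly_scaled_lr(base_lr, total_batch, reference_batch=512):
    """Linear scaling rule: the learning rate grows in proportion to batch size."""
    return base_lr * total_batch / reference_batch


if __name__ == "__main__":
    total = effective_batch_size(per_gpu_batch=72, num_gpus=4)
    print("effective batch size:", total)
    print("scaled lr (if the DeiT rule applies):", linearly_scaled_lr(1e-3, total))
```

With fewer GPUs, raising `--batch-size` so that the product stays the same is one way to stay close to the configuration above, memory permitting.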

Finetuning

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10' --validate-every 10

Linear Evaluation

Linear projection Head

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --lr 1e-3 --weight-decay 5e-4 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10_LE' --validate-every 10 --SiT_LinearEvaluation 1

2-layer MLP projection Head

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --lr 1e-3 --weight-decay 5e-4 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10_LE_hidden' --validate-every 10 --SiT_LinearEvaluation 1 --representation-size 1024
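The commands above differ in only a few flags. The sketch below assembles them from a shared base so the differences are easier to see; it is an illustration only, `build_command` is a hypothetical helper rather than part of the repository, and every flag and value is copied from the commands in this README.

```python
import shlex

# Flags shared by every launch command in this README.
BASE = [
    "python", "-m", "torch.distributed.launch",
    "--nproc_per_node=4", "--use_env", "main.py",
    "--epochs", "501", "--min-lr", "5e-6",
    "--data-set", "STL10", "--validate-every", "10",
]
SSL_CKPT = "checkpoints/SSL/STL10/checkpoint.pth"


def build_command(mode, representation_size=None):
    """Return the argument list for 'ssl', 'finetune', or 'linear' mode."""
    cmd = list(BASE)
    if mode == "ssl":
        cmd += ["--batch-size", "72", "--lr", "1e-3",
                "--training-mode", "SSL",
                "--output", "checkpoints/SSL/STL10"]
    elif mode == "finetune":
        cmd += ["--batch-size", "120",
                "--training-mode", "finetune", "--finetune", SSL_CKPT,
                "--output", "checkpoints/finetune/STL10"]
    elif mode == "linear":
        output = "checkpoints/finetune/STL10_LE"
        cmd += ["--batch-size", "120", "--lr", "1e-3", "--weight-decay", "5e-4",
                "--training-mode", "finetune", "--finetune", SSL_CKPT,
                "--SiT_LinearEvaluation", "1"]
        if representation_size is not None:
            # 2-layer MLP head: adds a hidden layer of the given width.
            cmd += ["--representation-size", str(representation_size)]
            output += "_hidden"
        cmd += ["--output", output]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd


if __name__ == "__main__":
    for args in (build_command("ssl"), build_command("finetune"),
                 build_command("linear"), build_command("linear", 1024)):
        print(shlex.join(args))
```

The printed lines can be pasted into a shell, or each list can be passed to `subprocess.run` directly.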

Note: set the --dataset_location parameter to the path of the downloaded dataset.
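If STL-10 has not been downloaded yet, torchvision can fetch it. The sketch below assumes that the repository reads STL-10 in the directory layout torchvision produces, which should be verified against the dataset code before relying on it; `fetch_stl10` and the example path are hypothetical.

```python
def fetch_stl10(root):
    """Download and extract STL-10 under `root` using torchvision.

    Assumption: the training script accepts the layout torchvision creates.
    """
    # Imported lazily so this module can be loaded without torchvision installed.
    from torchvision.datasets import STL10

    # A single download fetches the full archive, which includes the
    # train, test, and unlabeled splits.
    return STL10(root=root, split="unlabeled", download=True)


if __name__ == "__main__":
    dataset = fetch_stl10("data/STL10")  # hypothetical location
    print("unlabeled images:", len(dataset))
```

The directory passed as `root` is then the value to try for `--dataset_location`.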

If you use this code for a paper, please cite:

@article{atito2021sit,

  title={SiT: Self-supervised vIsion Transformer},

  author={Atito, Sara and Awais, Muhammad and Kittler, Josef},

  journal={arXiv preprint arXiv:2104.03602},

  year={2021}

}

License

This repository is released under the GNU General Public License.

Owner

Sara Ahmed
