PyTorch Implementation of Residual Vision Transformers (ResViT)

Last update: Dec 08, 2022

Overview

ResViT

Official PyTorch implementation of Residual Vision Transformers (ResViT), described in the following paper:

Onat Dalmaz, Mahmut Yurt, and Tolga Çukur. ResViT: Residual vision transformers for multi-modal medical image synthesis. arXiv, 2021.

Dependencies

python>=3.6.9
torch>=1.7.1
torchvision>=0.8.2
visdom
dominate
cuda>=11.2

Installation

Clone this repo:

git clone https://github.com/icon-lab/ResViT
cd ResViT

Download pre-trained ViT models from Google

Pre-trained ViT models:

wget https://storage.googleapis.com/vit_models/imagenet21k/R50-ViT-B_16.npz &&
mkdir -p ../model/vit_checkpoint/imagenet21k &&
mv R50-ViT-B_16.npz ../model/vit_checkpoint/imagenet21k/R50-ViT-B_16.npz
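
An interrupted download can leave a truncated checkpoint behind. The following is a minimal sketch of a sanity check, relying only on the fact that .npz files are ZIP archives. The helper name looks_like_npz is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def looks_like_npz(path):
    """Return True if `path` is an existing file with a valid ZIP structure.

    An .npz checkpoint is a ZIP archive of .npy arrays, so a file that fails
    this check is almost certainly a broken or incomplete download.
    """
    p = Path(path)
    return p.is_file() and zipfile.is_zipfile(p)
```

For example, looks_like_npz("../model/vit_checkpoint/imagenet21k/R50-ViT-B_16.npz") should return True once the commands above have completed. The check does not validate the arrays stored inside the archive.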

Dataset

You should structure your aligned dataset in the following way:

/Datasets/BRATS/
  ├── T1_T2
  ├── T2_FLAIR
  .
  .
  ├── T1_FLAIR_T2

/Datasets/BRATS/T2_FLAIR/
  ├── train
  ├── val
  └── test

Note that for many-to-one tasks with two input modalities, the source modalities should be stored in the Red and Green channels.
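
The layout above can be checked before training starts. The following is a minimal sketch, assuming every task folder must contain the train, val, and test subfolders shown; the function name missing_splits is hypothetical and not part of the repository.

```python
from pathlib import Path

# The three split folders named in the dataset layout.
EXPECTED_SPLITS = ("train", "val", "test")


def missing_splits(task_dir):
    """Return the expected split folders that are absent from `task_dir`."""
    root = Path(task_dir)
    return [split for split in EXPECTED_SPLITS if not (root / split).is_dir()]
```

An empty list means the task folder has all three splits. The check looks only at folder names, not at the images inside them.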

Pre-training of ART blocks without the presence of transformers

For many-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_pre_trained --gpu_ids 0 --model resvit_many --which_model_netG res_cnn --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 3 --loadSize 256 --fineSize 256 --niter 50 --niter_decay 50 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0

For one-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_pre_trained --gpu_ids 0 --model resvit_one --which_model_netG res_cnn --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 1 --loadSize 256 --fineSize 256 --niter 50 --niter_decay 50 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0
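
In the commands shown in this README, the many-to-one and one-to-one variants differ only in --dataroot, --name, --model, and --input_nc. The following sketch captures the two task-specific flags; the names TASK_FLAGS and task_args are hypothetical helpers, not part of the repository.

```python
# Task-specific flags, copied from the commands in this README.
TASK_FLAGS = {
    "many": {"--model": "resvit_many", "--input_nc": "3"},
    "one": {"--model": "resvit_one", "--input_nc": "1"},
}


def task_args(task):
    """Return the task-specific flags as a flat argument list."""
    args = []
    for flag, value in TASK_FLAGS[task].items():
        args += [flag, value]
    return args
```

The returned list can be appended to the shared arguments when launching train.py or test.py from a script, for instance through subprocess.run.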

Fine-tune ResViT

For many-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_resvit --gpu_ids 0 --model resvit_many --which_model_netG resvit --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 3 --loadSize 256 --fineSize 256 --niter 25 --niter_decay 25 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0 --pre_trained_transformer 1 --pre_trained_resnet 1 --pre_trained_path checkpoints/T1_T2_PD_IXI_pre_trained/latest_net_G.pth --lr 0.001

For one-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_resvit --gpu_ids 0 --model resvit_one --which_model_netG resvit --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 1 --loadSize 256 --fineSize 256 --niter 25 --niter_decay 25 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0 --pre_trained_transformer 1 --pre_trained_resnet 1 --pre_trained_path checkpoints/T1_T2_IXI_pre_trained/latest_net_G.pth --lr 0.001
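
In both commands above, --pre_trained_path points to the latest_net_G.pth file inside the folder that the pre-training run created under --checkpoints_dir, named after its --name value. A minimal sketch of that convention follows; the function name pretrained_generator_path is hypothetical.

```python
from pathlib import Path


def pretrained_generator_path(checkpoints_dir, run_name):
    """Build the generator checkpoint path used by the fine-tuning commands.

    Mirrors the pattern <checkpoints_dir>/<name>/latest_net_G.pth seen in the
    --pre_trained_path values above.
    """
    return Path(checkpoints_dir) / run_name / "latest_net_G.pth"
```

For the one-to-one example, pretrained_generator_path("checkpoints/", "T1_T2_IXI_pre_trained") matches the path passed on the command line.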

Testing

For many-to-one tasks:
python3 test.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_resvit --gpu_ids 0 --model resvit_many --which_model_netG resvit --dataset_mode aligned --norm batch --phase test --output_nc 1 --input_nc 3 --how_many 10000 --serial_batches --fineSize 256 --loadSize 256 --results_dir results/ --checkpoints_dir checkpoints/ --which_epoch latest

For one-to-one tasks:
python3 test.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_resvit --gpu_ids 0 --model resvit_one --which_model_netG resvit --dataset_mode aligned --norm batch --phase test --output_nc 1 --input_nc 1 --how_many 10000 --serial_batches --fineSize 256 --loadSize 256 --results_dir results/ --checkpoints_dir checkpoints/ --which_epoch latest

Citation

You are encouraged to modify/distribute this code. However, please acknowledge this code and cite the paper appropriately.

@misc{dalmaz2021resvit,
      title={ResViT: Residual vision transformers for multi-modal medical image synthesis}, 
      author={Onat Dalmaz and Mahmut Yurt and Tolga Çukur},
      year={2021},
      eprint={2106.16031},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}

For any questions, comments, and contributions, please contact Onat Dalmaz (onat[at]ee.bilkent.edu.tr).

Acknowledgments

This code uses libraries from the pGAN and pix2pix repositories.

Owner

ICON Lab
