ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

Last update: Dec 19, 2022

Overview

This is the project page for the paper:

ISTR: End-to-End Instance Segmentation via Transformers,
Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wang, Ke Li, Feiyue Huang, Ling Shao, Rongrong Ji,
arXiv 2105.00637

⭐ Highlights:

GPU Friendly: Four 1080Ti or 2080Ti GPUs are enough to train ISTR with R50 and R101 backbones.
High Performance: On COCO test-dev, ISTR-R50-3x gets 46.8/38.6 box/mask AP, and ISTR-R101-3x gets 48.1/39.9 box/mask AP.

Updates

(2021.05.03) The project page for ISTR is available.

Models

Method	inference speed	box AP	mask AP	download
ISTR-R50-3x	17.8 FPS	46.8	38.6	model | log
ISTR-R101-3x	13.9 FPS	48.1	39.9	model | log

The inference time is evaluated with a single 2080Ti GPU.
We use the models pre-trained on ImageNet using torchvision. The ImageNet pre-trained ResNet-101 backbone is obtained from SparseR-CNN.
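
The FPS figures in the table can be read as an approximate per-image latency. The sketch below only takes the reciprocal of the reported throughput; it assumes images are processed one at a time, which this page does not state explicitly.

```python
# Throughput figures copied from the table above (single 2080Ti GPU).
FPS = {"ISTR-R50-3x": 17.8, "ISTR-R101-3x": 13.9}


def latency_ms(fps):
    """Average per-image latency in milliseconds for a given throughput."""
    return 1000.0 / fps


for name, fps in FPS.items():
    print(f"{name}: {latency_ms(fps):.1f} ms per image")
# ISTR-R50-3x: 56.2 ms per image
# ISTR-R101-3x: 71.9 ms per image
```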

Installation

The code is built on top of Detectron2, SparseR-CNN, and AdelaiDet.

Requirements

Python=3.8
PyTorch=1.6.0, torchvision=0.7.0, cudatoolkit=10.1
OpenCV for visualization
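
A quick way to confirm that an environment matches these pins is to compare the installed package versions against them. This is a minimal sketch, not part of the ISTR repository: the helper names `strip_local` and `check_versions` are hypothetical, and the distribution names `torch` and `torchvision` are assumed to be the ones the conda packages register.

```python
from importlib import metadata

# Pins taken from the requirements list above; the distribution names
# ("torch", "torchvision") are assumed, not stated on this page.
REQUIRED = {"torch": "1.6.0", "torchvision": "0.7.0"}


def strip_local(version):
    """Drop a local build tag such as '+cu101' from a version string."""
    return version.split("+", 1)[0]


def check_versions(required, lookup=metadata.version):
    """Return {package: problem} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = strip_local(lookup(name))
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_versions(REQUIRED).items():
        print(f"{name}: {problem}")
```

An empty result means both pins are satisfied. The `lookup` parameter exists only so the check can be exercised without the packages installed.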

Steps

Install the repository (we recommend using Anaconda for installation):

conda create -n ISTR python=3.8 -y
conda activate ISTR
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install opencv-python
pip install scipy
pip install shapely
git clone https://github.com/hujiecpp/ISTR.git
cd ISTR
python setup.py build develop

Link the COCO dataset path

ln -s /coco_dataset_path/coco ./datasets
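
After linking, it can help to verify that the dataset is where the code will look for it. The sketch below checks for the layout that Detectron2's built-in COCO 2017 instance datasets use. That ISTR relies on exactly these built-in dataset names is an assumption, and `missing_coco_entries` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Layout used by Detectron2's built-in COCO 2017 instance datasets,
# relative to the dataset root (./datasets after the symlink above).
# Assumption: ISTR uses these built-in datasets unchanged.
EXPECTED = [
    "coco/train2017",
    "coco/val2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
]


def missing_coco_entries(datasets_root="datasets"):
    """Return the expected paths that do not exist under datasets_root."""
    root = Path(datasets_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries()
    print("COCO layout looks complete" if not missing else f"Missing: {missing}")
```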

Train ISTR (e.g., with ResNet50 backbone)

python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml

Evaluate ISTR (e.g., with ResNet50 backbone)

python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --eval-only MODEL.WEIGHTS ./output/model_final.pth
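
The training and evaluation commands above differ only in the trailing `--eval-only MODEL.WEIGHTS ...` arguments, so both can be generated from one helper when scripting runs. This is a sketch under assumptions: `istr_command` is a hypothetical name, and the config path for other backbones (for example `ISTR-R101-3x.yaml`) is inferred from the R50 example and the model names in the table, not confirmed by this page.

```python
import shlex


def istr_command(backbone="R50", num_gpus=4, eval_weights=None):
    """Build the train_net.py argument list for training or evaluation.

    The config path pattern is inferred from the R50 example; a matching
    file for other backbones is an assumption.
    """
    cmd = [
        "python", "projects/ISTR/train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", f"projects/ISTR/configs/ISTR-{backbone}-3x.yaml",
    ]
    if eval_weights is not None:
        # Evaluation reuses the training entry point with --eval-only.
        cmd += ["--eval-only", "MODEL.WEIGHTS", eval_weights]
    return cmd


print(shlex.join(istr_command()))
print(shlex.join(istr_command(eval_weights="./output/model_final.pth")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. With the default arguments, the two printed lines reproduce the training and evaluation commands shown above.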

Visualize the detection and segmentation results (e.g., with ResNet50 backbone)

python demo/demo.py --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --input input1.jpg --output ./output --confidence-threshold 0.4 --opts MODEL.WEIGHTS ./output/model_final.pth

Citation

If our paper helps your research, please cite it in your publications:

@article{hu2021ISTR,
  title={ISTR: End-to-End Instance Segmentation via Transformers},
  author={Hu, Jie and Cao, Liujuan and Lu, Yao and Zhang, ShengChuan and Wang, Yan and Li, Ke and Huang, Feiyue and Shao, Ling and Ji, Rongrong},
  journal={arXiv preprint arXiv:2105.00637},
  year={2021}
}

Owner

Jie Hu
