Official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers

Last update: Dec 11, 2022

Visual Parser (ViP)

This is the official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers.

Key Features & TLDR

PyTorch implementation of the ViP network; see models/vip.py.
A fast and neat implementation of the relative positional encoding proposed in HaloNet, BoTNet and AANet.
A transformer-friendly FLOPS and parameter counter that supports FLOPS calculation for einsum and matmul operations.
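
As a rough illustration of what counting FLOPS for matmul and einsum involves, the sketch below derives multiply-accumulate (MAC) counts from tensor shapes alone. It is not the counter shipped in this repository: the names matmul_macs and einsum_macs are hypothetical, and the sketch assumes a naive contraction (one MAC per point of the full index space), no broadcasting between batch dimensions, and no ellipsis in the einsum equation. Whether the repository reports MACs or twice that number is not stated here.

```python
from math import prod


def matmul_macs(a_shape, b_shape):
    """MACs for a (possibly batched) matmul: (..., m, k) @ (k, n) or (..., k, n)."""
    *batch, m, k = a_shape
    k_b, n = b_shape[-2:]
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")
    # Each of the batch * m * n outputs needs k multiply-accumulates.
    return prod(batch) * m * k * n


def einsum_macs(equation, *shapes):
    """MACs for a naive einsum: product of the sizes of all distinct indices."""
    inputs = equation.replace(" ", "").partition("->")[0]
    terms = inputs.split(",")
    if len(terms) != len(shapes):
        raise ValueError("one shape is required per einsum operand")
    sizes = {}
    for term, shape in zip(terms, shapes):
        if len(term) != len(shape):
            raise ValueError(f"term '{term}' does not match shape {shape}")
        for index, dim in zip(term, shape):
            # The same index letter must have the same size everywhere.
            if sizes.setdefault(index, dim) != dim:
                raise ValueError(f"index '{index}' has inconsistent sizes")
    return prod(sizes.values())
```

For instance, matmul_macs((2, 3, 4), (4, 5)) gives 120, and an attention-style einsum_macs("bhid,bhjd->bhij", (1, 2, 3, 4), (1, 2, 5, 4)) also gives 120, because both cover an index space of the same total size.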

Prerequisite

Please refer to get_started.md.

Results and Models

All models listed below are evaluated with an input size of 224x224.

Model	Top-1 Acc (%)	#Params	FLOPS	Download
ViP-Tiny	79.0	12.8M	1.7G	Google Drive
ViP-Small	82.1	32.1M	4.5G	Google Drive
ViP-Medium	83.3	49.6M	8.0G	Coming Soon
ViP-Base	83.6	87.8M	15.0G	Coming Soon

To load a pretrained checkpoint (e.g. ViP-Tiny), run:

# First download the checkpoint and save it as vip_t_dict.pth
from models.vip import vip_tiny
model = vip_tiny(pretrained="vip_t_dict.pth")

Evaluation

To evaluate a pre-trained ViP on the ImageNet validation set, run:

python3 main.py <data-root> --model <model-name> -b <batch-size> --eval_checkpoint <path-to-checkpoint>
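
For readers unfamiliar with the metric, the Top-1 accuracy reported above is the percentage of validation images whose highest-scoring class matches the ground-truth label. The following is a minimal sketch on plain Python lists; top1_accuracy is a hypothetical helper and is not taken from main.py, whose internals are not described here.

```python
def top1_accuracy(logits, labels):
    """Percentage of samples whose arg-max class equals the label.

    logits: list of per-sample score lists, one score per class.
    labels: list of integer class indices, same length as logits.
    """
    if len(logits) != len(labels) or not labels:
        raise ValueError("logits and labels must be non-empty and equal in length")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)
```

With four samples of which three are classified correctly, the function returns 75.0.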

Training from scratch

To train a ViP on ImageNet from scratch, run:

bash ./distributed_train.sh <job-name> <config-path> <num-gpus>

For example, to train ViP with 8 GPUs on a single node, run:

ViP-Tiny:

bash ./distributed_train.sh vip-t-001 configs/vip_t_bs1024.yaml 8

ViP-Small:

bash ./distributed_train.sh vip-s-001 configs/vip_s_bs1024.yaml 8

ViP-Medium:

bash ./distributed_train.sh vip-m-001 configs/vip_m_bs1024.yaml 8

ViP-Base:

bash ./distributed_train.sh vip-b-001 configs/vip_b_bs1024.yaml 8

Profiling the model

To measure the throughput, run:

python3 test_throughput.py <model-name>

For example, to measure the inference speed of ViP-Tiny on your device, run:

python3 test_throughput.py vip-tiny
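
A throughput measurement of this kind usually runs a few warm-up iterations and then times a fixed number of forward passes. The sketch below shows that pattern in plain Python; it is an assumption about the general approach, not the contents of test_throughput.py, and measure_throughput and its parameters are hypothetical names.

```python
import time


def measure_throughput(step, batch_size, warmup=5, iters=20):
    """Return samples processed per second by repeatedly calling step().

    step: zero-argument callable that processes one batch.
    batch_size: number of samples handled per call to step().
    """
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed
```

When timing a model on a GPU, step() should also wait for the device to finish (in PyTorch, by calling torch.cuda.synchronize()), since kernels are launched asynchronously and the timer would otherwise stop too early.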

To measure the FLOPS and number of parameters, run:

python3 test_flops.py <model-name>

Citing ViP

@article{vip,
  title={Visual Parser: Representing Part-whole Hierarchies with Transformers},
  author={Sun, Shuyang and Yue, Xiaoyu and Bai, Song and Torr, Philip},
  journal={arXiv preprint arXiv:2107.05790},
  year={2021}
}

Contact

If you have any questions, don't hesitate to contact Shuyang (Kevin) Sun. You can easily reach him by sending an email to [email protected].

Owner

Shuyang Sun
