DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Last update: Dec 21, 2022

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: see classification/README.md for detailed instructions and information.

Object Detection: see detection/README.md for detailed instructions and information.

The paper has been released on [arXiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. It learns to adaptively split the input images into patches with different positions and scales in a data-driven way, rather than using predefined fixed patches. In this way, our method can better preserve the semantics in patches.
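To make the idea of patches with learned positions and scales more concrete, here is a minimal pure-Python sketch. It is not the code in this repository: the function names `bilinear` and `sample_patch`, the per-patch parameters `dx`, `dy`, `sw`, `sh`, and the choice of sampling a `k` x `k` grid with bilinear interpolation are all assumptions made for illustration. In a real model the offset and scale would be predicted by a learned layer; here they are passed in by hand.

```python
import math


def bilinear(img, x, y):
    """Bilinearly sample a 2D list `img` (rows of floats) at continuous (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp so that samples falling outside the image reuse border values.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sample_patch(img, cx, cy, dx, dy, sw, sh, k=2):
    """Sample a k x k grid from a box centred at (cx + dx, cy + dy) of size (sw, sh).

    (cx, cy) is the centre of the fixed patch; (dx, dy) and (sw, sh) stand in
    for the offset and scale that a learned module would predict (hypothetical).
    """
    x_left = cx + dx - sw / 2.0
    y_top = cy + dy - sh / 2.0
    feats = []
    for i in range(k):
        for j in range(k):
            # Sample at the centre of each of the k x k cells of the box.
            px = x_left + (j + 0.5) * sw / k
            py = y_top + (i + 0.5) * sh / k
            feats.append(bilinear(img, px, py))
    return feats


# Toy 8x8 "image" whose value at (x, y) is x + 10 * y.
image = [[float(x + 10 * y) for x in range(8)] for y in range(8)]

# Fixed 4x4 patch centred at (3.5, 3.5), no offset.
fixed = sample_patch(image, 3.5, 3.5, dx=0.0, dy=0.0, sw=4.0, sh=4.0)

# Same patch shifted right by one pixel and shrunk to 2x2.
deformed = sample_patch(image, 3.5, 3.5, dx=1.0, dy=0.0, sw=2.0, sh=2.0)
```

Because the sampling locations are continuous functions of the offset and scale, gradients can flow back to whatever predicts them, which is what allows such parameters to be learned from data.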

This repository provides code and models for a Deformable Patch-based Transformer (DPT). As this field is developing rapidly, we look forward to seeing DePatch applied to other recent architectures to promote further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method	#Params (M)	FLOPs (G)	Acc@1
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer to these repositories for further information.

Owner

CASIA-IVA-Lab
