We will release the code of "ConTNet: Why not use convolution and transformer at the same time?" in this repo

Related tags

Deep LearningConTNet
Overview

ConTNet

Introduction

ConTNet (Convlution-Tranformer Network) is proposed mainly in response to the following two issues: (1) ConvNets lack a large receptive field, limiting the performance of ConvNets on downstream tasks. (2) Transformer-based model is not robust enough and requires special training settings or hundreds of millions of images as the pretrain dataset, thereby limiting their adoption. ConTNet combines convolution and transformer alternately, which is very robust and can be optimized like ResNet unlike the recently-proposed transformer-based models (e.g., ViT, DeiT) that are sensitive to hyper-parameters and need many tricks when trained from scratch on a midsize dataset (e.g., ImageNet).

Main Results on ImageNet

name resolution [email protected] #params(M) FLOPs(G) model
Res-18 224x224 71.5 11.7 1.8
ConT-S 224x224 74.9 10.1 1.5
Res-50 224x224 77.1 25.6 4.0
ConT-M 224x224 77.6 19.2 3.1
Res-101 224x224 78.2 44.5 7.6
ConT-B 224x224 77.9 39.6 6.4
DeiT-Ti* 224x224 72.2 5.7 1.3
ConT-Ti* 224x224 74.9 5.8 0.8
Res-18* 224x224 73.2 11.7 1.8
ConT-S* 224x224 76.5 10.1 1.5
Res-50* 224x224 78.6 25.6 4.0
DeiT-S* 224x224 79.8 22.1 4.6
ConT-M* 224x224 80.2 19.2 3.1
Res-101* 224x224 80.0 44.5 7.6
DeiT-B* 224x224 81.8 86.6 17.6
ConT-B* 224x224 81.8 39.6 6.4

Note: * indicates training with strong augmentations.

Main Results on Downstream Tasks

Object detection results on COCO.

method backbone #params(M) FLOPs(G) AP APs APm APl
RetinaNet Res-50
ConTNet-M
32.0
27.0
235.6
217.2
36.5
37.9
20.4
23.0
40.3
40.6
48.1
50.4
FCOS Res-50
ConTNet-M
32.2
27.2
242.9
228.4
38.7
40.8
22.9
25.1
42.5
44.6
50.1
53.0
faster rcnn Res-50
ConTNet-M
41.5
36.6
241.0
225.6
37.4
40.0
21.2
25.4
41.0
43.0
48.1
52.0

Instance segmentation results on Cityscapes based on Mask-RCNN.

backbone APbb APsbb APmbb APlbb APmk APsmk APmmk APlmk
Res-50
ConT-M
38.2
40.5
21.9
25.1
40.9
44.4
49.5
52.7
34.7
38.1
18.3
20.9
37.4
41.0
47.2
50.3

Semantic segmentation results on cityscapes.

model mIOU
PSP-Res50 77.12
PSP-ConTM 78.28

Bib Citing

@article{yan2021contnet,
    title={ConTNet: Why not use convolution and transformer at the same time?},
    author={Haotian Yan and Zhe Li and Weijian Li and Changhu Wang and Ming Wu and Chuang Zhang},
    year={2021},
    journal={arXiv preprint arXiv:2104.13497}
}
This is the official repository of XVFI (eXtreme Video Frame Interpolation)

XVFI This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206 Last Update: 20210607 We provide th

Jihyong Oh 195 Dec 29, 2022
Supplemental learning materials for "Fourier Feature Networks and Neural Volume Rendering"

Fourier Feature Networks and Neural Volume Rendering This repository is a companion to a lecture given at the University of Cambridge Engineering Depa

Matthew A Johnson 133 Dec 26, 2022
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Vide

Jonas Wu 232 Dec 29, 2022
This solves the autonomous driving issue which is supported by deep learning technology. Given a video, it splits into images and predicts the angle of turning for each frame.

Self Driving Car An autonomous car (also known as a driverless car, self-driving car, and robotic car) is a vehicle that is capable of sensing its env

Sagor Saha 4 Sep 04, 2021
Spherical CNNs

Spherical CNNs Equivariant CNNs for the sphere and SO(3) implemented in PyTorch Overview This library contains a PyTorch implementation of the rotatio

Jonas Köhler 893 Dec 28, 2022
Details about the wide minima density hypothesis and metrics to compute width of a minima

wide-minima-density-hypothesis Details about the wide minima density hypothesis and metrics to compute width of a minima This repo presents the wide m

Nikhil Iyer 9 Dec 27, 2022
A user-friendly research and development tool built to standardize RL competency assessment for custom agents and environments.

Built with ❤️ by Sam Showalter Contents Overview Installation Dependencies Usage Scripts Standard Execution Environment Development Environment Benchm

SRI-AIC 1 Nov 18, 2021
This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF).

VaxNeRF Paper | Google Colab This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF). This codebase is implemented using JAX, buildin

naruya 132 Nov 21, 2022
UFPR-ADMR-v2 Dataset

UFPR-ADMR-v2 Dataset The UFPR-ADMRv2 dataset contains 5,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), w

Gabriel Salomon 8 Sep 29, 2022
Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators

Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators. It's also a suite of learning algorithms to train agents to operate in these enviro

Google 1.5k Jan 02, 2023
Code implementation from my Medium blog post: [Transformers from Scratch in PyTorch]

transformer-from-scratch Code for my Medium blog post: Transformers from Scratch in PyTorch Note: This Transformer code does not include masked attent

Frank Odom 27 Dec 21, 2022
Riemannian Geometry for Molecular Surface Approximation (RGMolSA)

Riemannian Geometry for Molecular Surface Approximation (RGMolSA) Introduction Ligand-based virtual screening aims to reduce the cost and duration of

11 Nov 15, 2022
StackNet is a computational, scalable and analytical Meta modelling framework

StackNet This repository contains StackNet Meta modelling methodology (and software) which is part of my work as a PhD Student in the computer science

Marios Michailidis 1.3k Dec 15, 2022
naked is a Python tool which allows you to strip a model and only keep what matters for making predictions.

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions. The result is a pure Python function with no third-party dependencies that you can simply c

Max Halford 24 Dec 20, 2022
Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging, ICCV2021 [PyTorch Code]

Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging, ICCV2021 [PyTorch Code]

Jian Zhang 20 Oct 24, 2022
Revisiting Self-Training for Few-Shot Learning of Language Model.

SFLM This is the implementation of the paper Revisiting Self-Training for Few-Shot Learning of Language Model. SFLM is short for self-training for few

15 Nov 19, 2022
The official implementation of EIGNN: Efficient Infinite-Depth Graph Neural Networks (NeurIPS 2021)

EIGNN: Efficient Infinite-Depth Graph Neural Networks The official implementation of EIGNN: Efficient Infinite-Depth Graph Neural Networks (NeurIPS 20

Juncheng Liu 14 Nov 22, 2022
571 Dec 25, 2022
FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)

FCOS: Fully Convolutional One-Stage Object Detection This project hosts the code for implementing the FCOS algorithm for object detection, as presente

Tian Zhi 3.1k Jan 05, 2023
Code for the Shortformer model, from the paper by Ofir Press, Noah A. Smith and Mike Lewis.

Shortformer This repository contains the code and the final checkpoint of the Shortformer model. This file explains how to run our experiments on the

Ofir Press 138 Apr 15, 2022