DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Overview

Created by Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu.

This repository contains the PyTorch implementation of DenseCLIP.

DenseCLIP is a new framework for dense prediction that implicitly and explicitly leverages the pre-trained knowledge from CLIP. Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models. By further using contextual information from the image to prompt the language model, we help the model better exploit the pre-trained knowledge. Our method is model-agnostic: it can be applied to arbitrary dense prediction systems and to various pre-trained visual backbones, including both CLIP models and ImageNet pre-trained models.
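The pixel-text matching idea can be illustrated with a minimal, dependency-free sketch. It assumes the score between a pixel and a class is the cosine similarity of their embeddings, as in CLIP's image-text matching. The function names and the toy data are hypothetical and are not taken from this repository, which operates on PyTorch tensors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def pixel_text_score_map(pixel_feats, text_feats):
    """Cosine similarity between every pixel embedding and every text embedding.

    pixel_feats: H x W x C nested lists (one C-dim embedding per pixel).
    text_feats:  K x C nested lists (one C-dim embedding per class prompt).
    Returns a K x H x W nested list: one score map per class.
    """
    text_unit = [l2_normalize(t) for t in text_feats]
    pixel_unit = [[l2_normalize(p) for p in row] for row in pixel_feats]
    return [
        [[sum(a * b for a, b in zip(p, t)) for p in row] for row in pixel_unit]
        for t in text_unit
    ]


# Toy input (illustrative only): a 1 x 2 "image" with 2-dim embeddings, 2 classes.
pixels = [[[1.0, 0.0], [0.0, 2.0]]]
texts = [[3.0, 0.0], [0.0, 1.0]]
scores = pixel_text_score_map(pixels, texts)
```

In this toy case the first pixel aligns with the first class and the second pixel with the second class, so each class map peaks at a different location.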

Our code is based on mmsegmentation, mmdetection, and timm.

[Project Page] [arXiv]

Usage

Requirements

  • torch>=1.8.0
  • torchvision
  • timm
  • mmcv-full==1.3.17
  • mmseg==0.19.0
  • mmdet==2.17.0
  • fvcore

To use our code, please first install mmcv-full and mmseg/mmdet following the official guidelines (mmseg, mmdet) and prepare the datasets accordingly.
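Because the OpenMMLab packages are pinned to exact versions, it may help to confirm the environment before training. The following is a small sketch, not part of this repository. It assumes the pip distribution names are mmcv-full, mmsegmentation (imported as mmseg) and mmdet; the helper check_pins is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the requirements list above. The distribution names are an
# assumption: "mmseg" is published on PyPI as "mmsegmentation".
PINNED = {"mmcv-full": "1.3.17", "mmsegmentation": "0.19.0", "mmdet": "2.17.0"}


def check_pins(pins, lookup=version):
    """Return {name: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means all three pinned packages are installed at the expected versions.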

Pre-trained CLIP Models

Download the pre-trained CLIP models (RN50.pt, RN101.pt, VIT-B-16.pt) and save them to the pretrained folder.
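A quick way to verify this step is to check that the three files exist before launching a job. This is a hypothetical helper, not a script shipped with the repository; it assumes the pretrained folder sits in the current working directory and that the files keep the names listed above.

```python
from pathlib import Path

# File names as listed above; the folder location is an assumption.
CLIP_CHECKPOINTS = ("RN50.pt", "RN101.pt", "VIT-B-16.pt")


def missing_checkpoints(folder="pretrained", names=CLIP_CHECKPOINTS):
    """Return the checkpoint file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All CLIP checkpoints found." if not missing else f"Missing: {missing}")
```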

Segmentation

Model Zoo

We provide DenseCLIP models for the Semantic FPN framework.

Model            FLOPs (G)  Params (M)  mIoU (SS)  mIoU (MS)  config  url
RN50-CLIP        248.8      31.0        36.9       43.5       config  -
RN50-DenseCLIP   269.2      50.3        43.5       44.7       config  Tsinghua Cloud
RN101-CLIP       326.6      50.0        42.7       44.3       config  -
RN101-DenseCLIP  346.3      67.8        45.1       46.5       config  Tsinghua Cloud
ViT-B-CLIP       1037.4     100.8       49.4       50.3       config  -
ViT-B-DenseCLIP  1043.1     105.3       50.6       51.3       config  Tsinghua Cloud

Training & Evaluation on ADE20K

To train the DenseCLIP model based on CLIP ResNet-50, run:

bash dist_train.sh configs/denseclip_fpn_res50_512x512_80k.py 8

To evaluate the performance with multi-scale testing, run:

bash dist_test.sh configs/denseclip_fpn_res50_512x512_80k.py /path/to/checkpoint 8 --eval mIoU --aug-test
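For reference, the mIoU reported by the command above is the per-class intersection-over-union averaged over classes. The sketch below shows the standard definition computed from a confusion matrix. It is not mmsegmentation's implementation, and the function name and toy matrix are illustrative.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # class-c pixels predicted as something else
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # others predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example: IoU is 3/6 for class 0 and 4/7 for class 1.
toy_confusion = [[3, 1], [2, 4]]
toy_miou = mean_iou(toy_confusion)
```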

To better measure the complexity of the models, we provide a tool based on fvcore to accurately compute the FLOPs of torch.einsum and other operations:

python get_flops.py /path/to/config --fvcore

You can also remove the --fvcore flag to obtain the FLOPs measured by mmcv for comparisons.
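To see why torch.einsum matters for the FLOPs count, a back-of-the-envelope estimate is useful. The sketch below counts one multiply-accumulate per combination of index values, which is a naive estimate for a two-operand contraction and not the handler used by fvcore or mmcv. The equation string and tensor sizes are assumptions chosen for illustration (a 16 x 16 feature map, 1024 channels, 150 classes).

```python
from math import prod


def einsum_macs(equation, *shapes):
    """Naive multiply-accumulate count for an einsum expression.

    Every distinct index contributes its size once, so the count is the
    product of the sizes of all distinct indices in the input operands.
    """
    inputs = equation.split("->")[0].split(",")
    if len(inputs) != len(shapes):
        raise ValueError("one shape is required per input operand")
    sizes = {}
    for subscripts, shape in zip(inputs, shapes):
        if len(subscripts) != len(shape):
            raise ValueError(f"shape {shape} does not match subscripts '{subscripts}'")
        for index, size in zip(subscripts, shape):
            if sizes.setdefault(index, size) != size:
                raise ValueError(f"inconsistent size for index '{index}'")
    return prod(sizes.values())


# Hypothetical pixel-text score map: (B, C, H, W) features against (B, K, C) text embeddings.
score_map_macs = einsum_macs("bchw,bkc->bkhw", (1, 1024, 16, 16), (1, 150, 1024))
```

Under these assumed sizes the score map costs about 39.3 million multiply-accumulates per image, which a profiler that skips einsum would not report.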

Detection

Model Zoo

We provide models for both the RetinaNet and Mask R-CNN frameworks.

RetinaNet

Model            FLOPs (G)  Params (M)  box AP  config  url
RN50-CLIP        265        38          36.9    config  -
RN50-DenseCLIP   285        60          37.8    config  Tsinghua Cloud
RN101-CLIP       341        57          40.5    config  -
RN101-DenseCLIP  360        78          41.1    config  Tsinghua Cloud

Mask R-CNN

Model            FLOPs (G)  Params (M)  box AP  mask AP  config  url
RN50-CLIP        301        44          39.3    36.8     config  -
RN50-DenseCLIP   327        67          40.2    37.6     config  Tsinghua Cloud
RN101-CLIP       377        63          42.2    38.9     config  -
RN101-DenseCLIP  399        84          42.6    39.6     config  Tsinghua Cloud

Training & Evaluation on COCO

To train our DenseCLIP-RN50 using the RetinaNet framework, run:

bash dist_train.sh configs/retinanet_denseclip_r50_fpn_1x_coco.py 8

To evaluate the box AP of RN50-DenseCLIP (RetinaNet), run:

bash dist_test.sh configs/retinanet_denseclip_r50_fpn_1x_coco.py /path/to/checkpoint 8 --eval bbox

To evaluate both the box AP and the mask AP of RN50-DenseCLIP (Mask R-CNN), run:

bash dist_test.sh configs/mask_rcnn_denseclip_r50_fpn_1x_coco.py /path/to/checkpoint 8 --eval bbox segm

License

MIT License

Citation

If you find our work useful in your research, please consider citing:

@article{rao2021denseclip,
  title={DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting},
  author={Rao, Yongming and Zhao, Wenliang and Chen, Guangyi and Tang, Yansong and Zhu, Zheng and Huang, Guan and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2112.01518},
  year={2021}
}