DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Overview

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Created by Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu,

This repository contains PyTorch implementation for DenseCLIP.

DenseCLIP is a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP. Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models. By further using the contextual information from the image to prompt the language model, we are able to facilitate our model to better exploit the pre-trained knowledge. Our method is model-agnostic, which can be applied to arbitrary dense prediction systems and various pre-trained visual backbones including both CLIP models and ImageNet pre-trained models.

intro

Our code is based on mmsegmentation and mmdetection and timm.

[Project Page] [arXiv]

Usage

Requirements

  • torch>=1.8.0
  • torchvision
  • timm
  • mmcv-full==1.3.17
  • mmseg==0.19.0
  • mmdet==2.17.0
  • fvcore

To use our code, please first install the mmcv-full and mmseg/mmdet following the official guidelines (mmseg, mmdet) and prepare the datasets accordingly.

Pre-trained CLIP Models

Download the pre-trained CLIP models (RN50.pt, RN101.pt, VIT-B-16.pt) and save them to the pretrained folder.

Segmentation

Model Zoo

We provide DenseCLIP models for Semantic FPN framework.

Model FLOPs (G) Params (M) mIoU(SS) mIoU(MS) config url
RN50-CLIP 248.8 31.0 36.9 43.5 config -
RN50-DenseCLIP 269.2 50.3 43.5 44.7 config Tsinghua Cloud
RN101-CLIP 326.6 50.0 42.7 44.3 config -
RN101-DenseCLIP 346.3 67.8 45.1 46.5 config Tsinghua Cloud
ViT-B-CLIP 1037.4 100.8 49.4 50.3 config -
ViT-B-DenseCLIP 1043.1 105.3 50.6 51.3 config Tsinghua Cloud

Training & Evaluation on ADE20K

To train the DenseCLIP model based on CLIP ResNet-50, run:

bash dist_train.sh configs/denseclip_fpn_res50_512x512_80k.py 8

To evaluate the performance with multi-scale testing, run:

bash dist_test.sh configs/denseclip_fpn_res50_512x512_80k.py /path/to/checkpoint 8 --eval mIoU --aug-test

To better measure the complexity of the models, we provide a tool based on fvcore to accurately compute the FLOPs of torch.einsum and other operations:

python get_flops.py /path/to/config --fvcore

You can also remove the --fvcore flag to obtain the FLOPs measured by mmcv for comparisons.

Detection

Model Zoo

We provide models for both RetinaNet and Mask-RCNN framework.

RetinaNet
Model FLOPs (G) Params (M) box AP config url
RN50-CLIP 265 38 36.9 config -
RN50-DenseCLIP 285 60 37.8 config Tsinghua Cloud
RN101-CLIP 341 57 40.5 config -
RN101-DenseCLIP 360 78 41.1 config Tsinghua Cloud
Mask R-CNN
Model FLOPs (G) Params (M) box AP mask AP config url
RN50-CLIP 301 44 39.3 36.8 config -
RN50-DenseCLIP 327 67 40.2 37.6 config Tsinghua Cloud
RN101-CLIP 377 63 42.2 38.9 config -
RN101-DenseCLIP 399 84 42.6 39.6 config Tsinghua Cloud

Training & Evaluation on COCO

To train our DenseCLIP-RN50 using RetinaNet framework, run

 bash dist_train.sh configs/retinanet_denseclip_r50_fpn_1x_coco.py 8

To evaluate the box AP of RN50-DenseCLIP (RetinaNet), run

bash dist_test.sh configs/retinanet_denseclip_r50_fpn_1x_coco.py /path/to/checkpoint 8 --eval bbox

To evaluate both the box AP and the mask AP of RN50-DenseCLIP (Mask-RCNN), run

bash dist_test.sh configs/mask_rcnn_denseclip_r50_fpn_1x_coco.py /path/to/checkpoint 8 --eval bbox segm

License

MIT License

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{rao2021denseclip,
  title={DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting},
  author={Rao, Yongming and Zhao, Wenliang and Chen, Guangyi and Tang, Yansong and Zhu, Zheng and Huang, Guan and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2112.01518},
  year={2021}
}
Owner
Yongming Rao
Yongming Rao
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs

GAN Compression project | paper | videos | slides [NEW!] GAN Compression is accepted by T-PAMI! We released our T-PAMI version in the arXiv v4! [NEW!]

MIT HAN Lab 1k Jan 07, 2023
This repository contains code for the paper "Disentangling Label Distribution for Long-tailed Visual Recognition", published at CVPR' 2021

Disentangling Label Distribution for Long-tailed Visual Recognition (CVPR 2021) Arxiv link Blog post This codebase is built on Causal Norm. Install co

Hyperconnect 85 Oct 18, 2022
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.

Semi-supervised-learning-for-medical-image-segmentation. Recently, semi-supervised image segmentation has become a hot topic in medical image computin

Healthcare Intelligence Laboratory 1.3k Jan 03, 2023
Yet another video caption

Yet another video caption

Fan Zhimin 5 May 26, 2022
PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility

PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility Jae Yong Lee, Joseph DeGol, Chuhang Zou, Derek Hoiem Installation To install nece

31 Apr 19, 2022
Unsupervised Feature Ranking via Attribute Networks.

FRANe Unsupervised Feature Ranking via Attribute Networks (FRANe) converts a dataset into a network (graph) with nodes that correspond to the features

7 Sep 29, 2022
A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️

hf-hub-lightning A callback for pushing lightning models to the Hugging Face Hub. Note: I made this package for myself, mostly...if folks seem to be i

Nathan Raw 27 Dec 14, 2022
Tensor-based approaches for fMRI classification

tensor-fmri Using tensor-based approaches to classify fMRI data from StarPLUS. Citation If you use any code in this repository, please cite the follow

4 Sep 07, 2022
Self-supervised Multi-modal Hybrid Fusion Network for Brain Tumor Segmentation

JBHI-Pytorch This repository contains a reference implementation of the algorithms described in our paper "Self-supervised Multi-modal Hybrid Fusion N

FeiyiFANG 5 Dec 13, 2021
[ICLR 2021] Is Attention Better Than Matrix Decomposition?

Enjoy-Hamburger 🍔 Official implementation of Hamburger, Is Attention Better Than Matrix Decomposition? (ICLR 2021) Under construction. Introduction T

Gsunshine 271 Dec 29, 2022
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion. NÜWA is a unified multimodal p

Microsoft 2.6k Jan 06, 2023
Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'

Filtration Curves for Graph Representation This repository provides the code from the KDD'21 paper Filtration Curves for Graph Representation. Depende

Machine Learning and Computational Biology Lab 16 Oct 16, 2022
The official project of SimSwap (ACM MM 2020)

SimSwap: An Efficient Framework For High Fidelity Face Swapping Proceedings of the 28th ACM International Conference on Multimedia The official reposi

Six_God 2.6k Jan 08, 2023
시각 장애인을 위한 스마트 지팡이에 활용될 딥러닝 모델 (DL Model Repo)

SmartCane-DL-Model Smart Cane using semantic segmentation 참고한 Github repositoy 🔗 https://github.com/JunHyeok96/Road-Segmentation.git 데이터셋 🔗 https://

반드시 졸업한다 (Team Just Graduate) 4 Dec 03, 2021
Code for Generating Disentangled Arguments with Prompts: A Simple Event Extraction Framework that Works

GDAP Code for Generating Disentangled Arguments with Prompts: A Simple Event Extraction Framework that Works Environment Python (verified: v3.8) CUDA

45 Oct 29, 2022
This repo is about implementing different approaches of pose estimation and also is a sub-task of the smart hospital bed project :smile:

Pose-Estimation This repo is a sub-task of the smart hospital bed project which is about implementing the task of pose estimation 😄 Many thanks to th

Max 11 Oct 17, 2022
Transformers are Graph Neural Networks!

🚀 Gated Graph Transformers Gated Graph Transformers for graph-level property prediction, i.e. graph classification and regression. Associated article

Chaitanya Joshi 46 Jun 30, 2022
Distributed DataLoader For Pytorch Based On Ray

Dpex——用户无感知分布式数据预处理组件 一、前言 随着GPU与CPU的算力差距越来越大以及模型训练时的预处理Pipeline变得越来越复杂,CPU部分的数据预处理已经逐渐成为了模型训练的瓶颈所在,这导致单机的GPU配置的提升并不能带来期望的线性加速。预处理性能瓶颈的本质在于每个GPU能够使用的C

Dalong 23 Nov 02, 2022