Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

Related tags

Deep LearningDetic
Overview

Detecting Twenty-thousand Classes using Image-level Supervision

Detic: A Detector with image classes that can use image-level labels to easily train detectors.

Detecting Twenty-thousand Classes using Image-level Supervision,
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
arXiv technical report (arXiv 2201.02605)

Features

  • Detects any class given class names (using CLIP).

  • We train the detector on ImageNet-21K dataset with 21K classes.

  • Cross-dataset generalization to OpenImages and Objects365 without finetuning.

  • State-of-the-art results on Open-vocabulary LVIS and Open-vocabulary COCO.

  • Works for DETR-style detectors.

Installation

See installation instructions.

Demo

Integrated into Huggingface Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces

Run our demo using Colab (no GPU needed): Open In Colab

We use the default detectron2 demo interface. For example, to run our 21K model on a messy desk image (image credit David Fouhey) with the lvis vocabulary, run

mkdir models
wget https://dl.fbaipublicfiles.com/detic/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth -O models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
wget https://web.eecs.umich.edu/~fouhey/fun/desk/desk.jpg
python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out.jpg --vocabulary lvis --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth

If setup correctly, the output should look like:

The same model can run with other vocabularies (COCO, OpenImages, or Objects365), or a custom vocabulary. For example:

python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out2.jpg --vocabulary custom --custom_vocabulary headphone,webcam,paper,coffe --confidence-threshold 0.3 --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth

The output should look like:

Note that headphone, paper and coffe (typo intended) are not LVIS classes. Despite the misspelled class name, our detector can produce a reasonable detection for coffe.

Benchmark evaluation and training

Please first prepare datasets, then check our MODEL ZOO to reproduce results in our paper. We highlight key results below:

  • Open-vocabulary LVIS

    mask mAP mask mAP_novel
    Box-Supervised 30.2 16.4
    Detic 32.4 24.9
  • Standard LVIS

    Detector/ Backbone mask mAP mask mAP_rare
    Box-Supervised CenterNet2-ResNet50 31.5 25.6
    Detic CenterNet2-ResNet50 33.2 29.7
    Box-Supervised CenterNet2-SwinB 40.7 35.9
    Detic CenterNet2-SwinB 41.7 41.7
    Detector/ Backbone box mAP box mAP_rare
    Box-Supervised DeformableDETR-ResNet50 31.7 21.4
    Detic DeformableDETR-ResNet50 32.5 26.2
  • Cross-dataset generalization

    Backbone Objects365 box mAP OpenImages box mAP50
    Box-Supervised SwinB 19.1 46.2
    Detic SwinB 21.4 55.2

License

The majority of Detic is licensed under the Apache 2.0 license, however portions of the project are available under separate license terms: SWIN-Transformer, CLIP, and TensorFlow Object Detection API are licensed under the MIT license; UniDet is licensed under the Apache 2.0 license; and the LVIS API is licensed under a custom license (https://github.com/lvis-dataset/lvis-api/blob/master/LICENSE)” If you later add other third party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0

Ethical Considerations

Detic's wide range of detection capabilities may introduce similar challenges to many other visual recognition and open-set recognition methods. As the user can define arbitrary detection classes, class design and semantics may impact the model output.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2021detecting,
  title={Detecting Twenty-thousand Classes using Image-level Supervision},
  author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
  booktitle={arXiv preprint arXiv:2201.02605},
  year={2021}
}
Owner
Meta Research
Meta Research
🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval 🏆 The 1st Place Submission to AICity Challenge 2021 Natural

82 Dec 29, 2022
The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

ISC21-Descriptor-Track-1st The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track. You can check our solution

lyakaap 75 Jan 08, 2023
Solutions of Reinforcement Learning 2nd Edition

Solutions of Reinforcement Learning, An Introduction

YIFAN WANG 1.4k Dec 30, 2022
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift (ICCV 2021)

Π-NAS This repository provides the evaluation code of our submitted paper: Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training

Jiqi Zhang 18 Aug 18, 2022
I tried to apply the CAM algorithm to YOLOv4 and it worked.

YOLOV4:You Only Look Once目标检测模型在pytorch当中的实现 2021年2月7日更新: 加入letterbox_image的选项,关闭letterbox_image后网络的map得到大幅度提升。 目录 性能情况 Performance 实现的内容 Achievement

55 Dec 05, 2022
Prevent `CUDA error: out of memory` in just 1 line of code.

🐨 Koila Koila solves CUDA error: out of memory error painlessly. Fix it with just one line of code, and forget it. 🚀 Features 🙅 Prevents CUDA error

RenChu Wang 1.7k Jan 02, 2023
A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis

A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis This is the pytorch implementation for our MICCAI 2021 paper. A Mul

Jiarong Ye 7 Apr 04, 2022
The implementation our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction".

LEAR The implementation our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction". **The code is in the "master

杨攀 93 Jan 07, 2023
Single Image Deraining Using Bilateral Recurrent Network (TIP 2020)

Single Image Deraining Using Bilateral Recurrent Network Introduction Single image deraining has received considerable progress based on deep convolut

23 Aug 10, 2022
Accelerate Neural Net Training by Progressively Freezing Layers

FreezeOut A simple technique to accelerate neural net training by progressively freezing layers. This repository contains code for the extended abstra

Andy Brock 203 Jun 19, 2022
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX

Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX Foolbox is a Python li

Bethge Lab 2.4k Dec 25, 2022
NICE-GAN — Official PyTorch Implementation Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

NICE-GAN-pytorch - Official PyTorch implementation of NICE-GAN: Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Runfa Chen 208 Nov 25, 2022
An End-to-End Machine Learning Library to Optimize AUC (AUROC, AUPRC).

Logo by Zhuoning Yuan LibAUC: A Machine Learning Library for AUC Optimization Website | Updates | Installation | Tutorial | Research | Github LibAUC a

Optimization for AI 176 Jan 07, 2023
Cognition-aware Cognate Detection

Cognition-aware Cognate Detection The repository which contains our code for our EACL 2021 paper titled, "Cognition-aware Cognate Detection". This wor

Prashant K. Sharma 1 Feb 01, 2022
Strongly local p-norm-cut algorithms for semi-supervised learning and local graph clustering

Strongly local p-norm-cut algorithms for semi-supervised learning and local graph clustering

Meng Liu 2 Jul 19, 2022
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability

This is the official repository of the paper: CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability A private copy of the

Fadi Boutros 33 Dec 31, 2022
CL-Gym: Full-Featured PyTorch Library for Continual Learning

CL-Gym: Full-Featured PyTorch Library for Continual Learning CL-Gym is a small yet very flexible library for continual learning research and developme

Iman Mirzadeh 36 Dec 25, 2022
Disease Informed Neural Networks (DINNs) — neural networks capable of learning how diseases spread, forecasting their progression, and finding their unique parameters (e.g. death rate).

DINN We introduce Disease Informed Neural Networks (DINNs) — neural networks capable of learning how diseases spread, forecasting their progression, a

19 Dec 10, 2022
Resources for our AAAI 2022 paper: "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

LOREN Resources for our AAAI 2022 paper (pre-print): "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification". DEMO System Check out o

Jiangjie Chen 37 Dec 27, 2022
Hi Guys, here I am providing examples, which will help you in Lerarning Python

LearningPython Hi guys, here I am trying to include as many practice examples of Python Language, as i Myself learn, and hope these will help you in t

4 Feb 03, 2022