Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Overview

PWC PWC PWC PWC

Introdunction

This is the official implementation of the paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Abstract

This paper presents a simple and effective approach to solving the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embeddings as queries to probe and pool class-related features from a feature map computed by a vision backbone for subsequent binary classifications. Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective, consistently outperforming all previous works on five multi-label classification data sets, including MS-COCO, PASCAL VOC, NUS-WIDE, and Visual Genome. Particularly, we establish 91.3% mAP on MS-COCO. We hope its compact structure, simple implementation, and superior performance serve as a strong baseline for multi-label classification tasks and future studies.

fig

Results on MS-COCO:

fig

Quick start

  1. (optional) Star this repo.

  2. Clone this repo:

git clone [email protected]:SlongLiu/query2labels.git
cd query2labels
  1. Install cuda, PyTorch and torchvision.

Please make sure they are compatible. We test our models on two envs and other configs may also work:

cuda==11, torch==1.9.0, torchvision==0.10.0, python==3.7.3
or
cuda==10.2, torch==1.6.0, torchvision==0.7.0, python==3.7.3
  1. Install other needed packages.
pip install -r requirments.txt
  1. Data preparation.

Download MS-COCO 2014 and modify the path in lib/dataset/cocodataset.py: line 24, 25.

  1. Download pretrained models.

You could download pretrained models from this link. See more details below.

  1. Run!
python q2l_infer.py -a modelname --config /path/to/json/file --resume /path/to/pkl/file [other args]
e.g.
python q2l_infer.py -a 'Q2L-R101-448' --config "pretrained/Q2L-R101-448/config_new.json" -b 16 --resume 'pretrained/Q2L-R101-448/checkpoint.pkl'

pretrianed model

Modelname mAP link(Tsinghua-cloud)
Q2L-R101-448 84.9 this link
Q2L-R101-576 86.5 this link
Q2L-TResL-448 87.3 this link
Q2L-TResL_22k-448 89.2 this link
Q2L-SwinL-384 90.5 this link
Q2L-CvT_w24-384 91.3 this link

Training

Training scripts will be available later.

BibTex

@misc{liu2021query2label,
      title={Query2Label: A Simple Transformer Way to Multi-Label Classification}, 
      author={Shilong Liu and Lei Zhang and Xiao Yang and Hang Su and Jun Zhu},
      year={2021},
      eprint={2107.10834},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

We thank the authors of ASL, TResNet, detr, CvT, and Swin-Transformer for their great works and codes. Thanks to @mrT23 for sharing training tricks and providing a useful script for training.

Owner
Shilong Liu
A spicy chicken. www.lsl.zone
Shilong Liu
Implementation of PyTorch-based multi-task pre-trained models

mtdp Library containing implementation related to the research paper "Multi-task pre-training of deep neural networks for digital pathology" (Mormont

Romain Mormont 27 Oct 14, 2022
AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is developed based on D3M infrastructure, which describes machine learning with generic pipeline languages. Currently, it focuses on video actio

Data Analytics Lab at Texas A&M University 267 Dec 17, 2022
The aim of the game, as in the original one, is to find a specific image from a group of different images of a person's face

GUESS WHO Main Links: [Github] [App] Related Links: [CLIP] [Celeba] The aim of the game, as in the original one, is to find a specific image from a gr

Arnau - DIMAI 3 Jan 04, 2022
Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

Mohamed Ali Souibgui 74 Jan 07, 2023
Codebase for the Summary Loop paper at ACL2020

Summary Loop This repository contains the code for ACL2020 paper: The Summary Loop: Learning to Write Abstractive Summaries Without Examples. Training

Canny Lab @ The University of California, Berkeley 44 Nov 04, 2022
Rethinking Nearest Neighbors for Visual Classification

Rethinking Nearest Neighbors for Visual Classification arXiv Environment settings Check out scripts/env_setup.sh Setup data Download the following fin

Menglin Jia 29 Oct 11, 2022
Object detection on multiple datasets with an automatically learned unified label space.

Simple multi-dataset detection An object detector trained on multiple large-scale datasets with a unified label space; Winning solution of E

Xingyi Zhou 407 Dec 30, 2022
Implementation of the paper Recurrent Glimpse-based Decoder for Detection with Transformer.

REGO-Deformable DETR By Zhe Chen, Jing Zhang, and Dacheng Tao. This repository is the implementation of the paper Recurrent Glimpse-based Decoder for

Zhe Chen 33 Nov 30, 2022
AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Frank Liu 26 Oct 13, 2022
Whisper is a file-based time-series database format for Graphite.

Whisper Overview Whisper is one of three components within the Graphite project: Graphite-Web, a Django-based web application that renders graphs and

Graphite Project 1.2k Dec 25, 2022
SmoothGrad implementation in PyTorch

SmoothGrad implementation in PyTorch PyTorch implementation of SmoothGrad: removing noise by adding noise. Vanilla Gradients SmoothGrad Guided backpro

SSKH 143 Jan 05, 2023
Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition

Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition Introduction Run attack: SGADV.py Objective function: foolbox/attacks/gradi

1 Jul 18, 2022
This repository is the official implementation of Open Rule Induction. This paper has been accepted to NeurIPS 2021.

Open Rule Induction This repository is the official implementation of Open Rule Induction. This paper has been accepted to NeurIPS 2021. Abstract Rule

Xingran Chen 16 Nov 14, 2022
Set of methods to ensemble boxes from different object detection models, including implementation of "Weighted boxes fusion (WBF)" method.

Set of methods to ensemble boxes from different object detection models, including implementation of "Weighted boxes fusion (WBF)" method.

1.4k Jan 05, 2023
MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Main repo for ECCV 2020 paper MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images. visual.cs.brown.edu/matryodshka

Brown University Visual Computing Group 75 Dec 13, 2022
PyTorch implementation of InstaGAN: Instance-aware Image-to-Image Translation

InstaGAN: Instance-aware Image-to-Image Translation Warning: This repo contains a model which has potential ethical concerns. Remark that the task of

Sangwoo Mo 827 Dec 29, 2022
Learning from Synthetic Humans, CVPR 2017

Learning from Synthetic Humans (SURREAL) Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev and Cordelia Schmid,

Gul Varol 538 Dec 18, 2022
Keras community contributions

keras-contrib : Keras community contributions Keras-contrib is deprecated. Use TensorFlow Addons. The future of Keras-contrib: We're migrating to tens

Keras 1.6k Dec 21, 2022
PanopticBEV - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images This r

63 Dec 16, 2022
MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモ

Tokyo2020-Pictogram-using-MediaPipe MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモです。 Tokyo2020Pictgram02.mp4 Requirement mediapipe 0.8.6 or later O

KazuhitoTakahashi 295 Dec 26, 2022