VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Related tags

Deep LearningVL-LTR
Overview

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+ and other required packages as follows:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14

Data preparation

ImageNet-LT

Download and extract ImageNet train and val images from here. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val/ folder respectively.

Then download and extract the wiki text into the same directory, and the directory tree of data is expected to be like this:

./data/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
  	desc_1.txt
  ImageNet_LT_test.txt
  ImageNet_LT_train.txt
  ImageNet_LT_val.txt
  labels.txt

After that, download the CLIP's pretrained weight RN50.pt and ViT-B-16.pt into the pretrained directory from https://github.com/openai/CLIP.

Places-LT

Download the places365_standard data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this (almost the same as ImageNet-LT):

./data/places/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
  	desc_1.txt
  Places_LT_test.txt
  Places_LT_train.txt
  Places_LT_val.txt
  labels.txt

iNaturalist 2018

Download the iNaturalist 2018 data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/iNat/
  train_val2018/
  wiki/
  	desc_1.txt
  categories.json
  test2018.json
  train2018.json
  val.json

Evaluation

To evaluate VL-LTR with a single GPU run:

  • Pre-training stage
bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain
  • Fine-tuning stage:
bash eval.sh ${CONFIG_PATH} 1

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.

Training

To train VL-LTR on a single node with 8 GPUs for:

  • Pre-training stage, run:
bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8
  • Fine-tuning stage:

    • First, calculate the $\mathcal L_{\text{lin}}$ of each sentence for AnSS method by running this:
    bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain --select
    • then, running this:
    bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.

Results

Below list our model's performance on ImageNet-LT, Places-LT, and iNaturalist 2018.

Dataset Backbone Top-1 Accuracy Download
ImageNet-LT ResNet-50 70.1 Weights
ImageNet-LT ViT-Base-16 77.2 Weights
Places-LT ResNet-50 48.0 Weights
Places-LT ViT-Base-16 50.1 Weights
iNaturalist 2018 ResNet-50 74.6 Weights
iNaturalist 2018 ViT-Base-16 76.8 Weights

For more detailed information, please refer to our paper directly.

Citation

If you are interested in our work, please cite as follows:

@article{tian2021vl,
  title={VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition},
  author={Tian, Changyao and Wang, Wenhai and Zhu, Xizhou and Wang, Xiaogang and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2111.13579},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

You might also like...
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks We provide the code (in PyTorch) and datasets for our paper "On Size-Orient

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

APSIPA-SER-with-A-and-T This code is the implementation of Speech Emotion Recognition (SER) with acoustic and linguistic features. The network model i

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Implementation of
Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Official codes for the paper
Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ResDAVEnet-VQ Official PyTorch implementation of Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech What is in this repo? M

[ICCV2021] Official code for
[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

CTR-GCN This repo is the official implementation for Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. The pap

Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection
Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection This repo is the official implementation for CVPR 2021 paper: Adaptive Class Suppressio

Comments
  • Problem about running eval.sh

    Problem about running eval.sh

    """ #!/usr/bin/env bash set -x

    export NCCL_LL_THRESHOLD=0

    CONFIG=$1 GPUS=$1 CPUS=$[GPUS*2] PORT=${PORT:-8886}

    CONFIG_NAME=${CONFIG##/} CONFIG_NAME=${CONFIG_NAME%.}

    OUTPUT_DIR="./checkpoints/eval" if [ ! -d $OUTPUT_DIR ]; then mkdir ${OUTPUT_DIR} fi

    python -u main.py
    --port=$PORT
    --num_workers 4
    --resume "./checkpoints/${CONFIG_NAME}/checkpoint.pth"
    --output-dir ${OUTPUT_DIR}
    --config $CONFIG ${@:3}
    --eval
    2>&1 | tee -a ${OUTPUT_DIR}/train.log """ I have two A100, so set GPUS is 2. All other settings according to ReadME.md but I got a problem when running eval.sh """ File "eval.sh", line 4 export NCCL_LL_THRESHOLD=0 ^ SyntaxError: invalid syntax

    """

    opened by euminds 2
  • Mismatch between code and diagram in paper for the fine-tuning phase

    Mismatch between code and diagram in paper for the fine-tuning phase

    In fig 3, stage 2 from the paper, it looks like value for the attention is calculated based on Vision and language (Q is vision, K is language) and then applied to the language (V). But in the code, the attention is applied to the visual features. Can you verify which one is the correct way? @ChangyaoTian

    opened by rahulvigneswaran 0
  • pre-trained weights with TorchScript?

    pre-trained weights with TorchScript?

    Hello, Thanks for the great work! May I ask if it's possible for you to also provide the checkpoint weight in a TorchScript version?

    It's something like:

    import torch
    import torchvision.models as models
    
    model = models.resnet50()
    traced = torch.jit.trace(model, (torch.rand(4, 3, 224, 224),))
    torch.jit.save(traced, "test.pt")
    
    # load model
    model = torch.jit.load("test.pt")
    
    opened by xinleihe 0
Releases(ECCV-2022-video)
MMGeneration is a powerful toolkit for generative models, based on PyTorch and MMCV.

Documentation: https://mmgeneration.readthedocs.io/ Introduction English | 简体中文 MMGeneration is a powerful toolkit for generative models, especially f

OpenMMLab 1.3k Dec 29, 2022
Implementation of Google Brain's WaveGrad high-fidelity vocoder

WaveGrad Implementation (PyTorch) of Google Brain's high-fidelity WaveGrad vocoder (paper). First implementation on GitHub with high-quality generatio

Ivan Vovk 363 Dec 27, 2022
Spherical Confidence Learning for Face Recognition, accepted to CVPR2021.

Sphere Confidence Face (SCF) This repository contains the PyTorch implementation of Sphere Confidence Face (SCF) proposed in the CVPR2021 paper: Shen

Maths 70 Dec 09, 2022
TensorFlow 2 implementation of the Yahoo Open-NSFW model

TensorFlow 2 implementation of the Yahoo Open-NSFW model

Bosco Yung 101 Jan 01, 2023
Personal project about genus-0 meshes, spherical harmonics and a cow

How to transform a cow into spherical harmonics ? Spot the cow, from Keenan Crane's blog Context In the field of Deep Learning, training on images or

3 Aug 22, 2022
[ICCV2021] IICNet: A Generic Framework for Reversible Image Conversion

IICNet - Invertible Image Conversion Net Official PyTorch Implementation for IICNet: A Generic Framework for Reversible Image Conversion (ICCV2021). D

felixcheng97 55 Dec 06, 2022
SeMask: Semantically Masked Transformers for Semantic Segmentation.

SeMask: Semantically Masked Transformers Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi This repo co

Picsart AI Research (PAIR) 186 Dec 30, 2022
NeuroFind - A solution to the to the Task given by the Oberseminar of Messtechnik Institute of TU Dresden in 2021

NeuroFind A solution to the to the Task given by the Oberseminar of Messtechnik

1 Jan 20, 2022
OOD Generalization and Detection (ACL 2020)

Pretrained Transformers Improve Out-of-Distribution Robustness How does pretraining affect out-of-distribution robustness? We create an OOD benchmark

littleRound 57 Jan 09, 2023
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Overview PyTorch 0.4.1 | Python 3.6.5 Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein g

Shayne O'Brien 471 Dec 16, 2022
[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

Counterfactual Attention Learning Created by Yongming Rao*, Guangyi Chen*, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for ICCV

Yongming Rao 90 Dec 31, 2022
Official PyTorch code for the paper: "Point-Based Modeling of Human Clothing" (ICCV 2021)

Point-Based Modeling of Human Clothing Paper | Project page | Video This is an official PyTorch code repository of the paper "Point-Based Modeling of

Visual Understanding Lab @ Samsung AI Center Moscow 64 Nov 22, 2022
Diverse Object-Scene Compositions For Zero-Shot Action Recognition

Diverse Object-Scene Compositions For Zero-Shot Action Recognition This repository contains the source code for the use of object-scene compositions f

7 Sep 21, 2022
The code of paper 'Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection'

Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection Pytorch implemetation of paper 'Learning to Aggregate and Personalize

Tencent YouTu Research 136 Dec 29, 2022
Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution"

RealBasicVSR [Paper] This is the official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution, arXiv". This repository contain

Kelvin C.K. Chan 566 Dec 28, 2022
Unet network with mean teacher for altrasound image segmentation

Unet network with mean teacher for altrasound image segmentation

5 Nov 21, 2022
Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.

Video Representation Learning by Recognizing Temporal Transformations [Project Page] Simon Jenni, Givi Meishvili, and Paolo Favaro. In ECCV, 2020. Thi

Simon Jenni 46 Nov 14, 2022
Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21)

AdvRush Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21) Environmental Set-up Python == 3.6.12, PyTorch =

11 Dec 10, 2022
iBOT: Image BERT Pre-Training with Online Tokenizer

Image BERT Pre-Training with iBOT Official PyTorch implementation and pretrained models for paper iBOT: Image BERT Pre-Training with Online Tokenizer.

Bytedance Inc. 435 Jan 06, 2023
Predicts an answer in yes or no.

Oui-ou-non-prediction Predicts an answer in 'yes' or 'no'. It is based on the game 'effeuiller la marguerite' in which the person plucks flower petals

Ananya Gupta 1 Jan 15, 2022