VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Related tags

Deep LearningVL-LTR
Overview

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+ and other required packages as follows:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14

Data preparation

ImageNet-LT

Download and extract ImageNet train and val images from here. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val/ folder respectively.

Then download and extract the wiki text into the same directory, and the directory tree of data is expected to be like this:

./data/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
  	desc_1.txt
  ImageNet_LT_test.txt
  ImageNet_LT_train.txt
  ImageNet_LT_val.txt
  labels.txt

After that, download the CLIP's pretrained weight RN50.pt and ViT-B-16.pt into the pretrained directory from https://github.com/openai/CLIP.

Places-LT

Download the places365_standard data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this (almost the same as ImageNet-LT):

./data/places/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
  	desc_1.txt
  Places_LT_test.txt
  Places_LT_train.txt
  Places_LT_val.txt
  labels.txt

iNaturalist 2018

Download the iNaturalist 2018 data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/iNat/
  train_val2018/
  wiki/
  	desc_1.txt
  categories.json
  test2018.json
  train2018.json
  val.json

Evaluation

To evaluate VL-LTR with a single GPU run:

  • Pre-training stage
bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain
  • Fine-tuning stage:
bash eval.sh ${CONFIG_PATH} 1

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.

Training

To train VL-LTR on a single node with 8 GPUs for:

  • Pre-training stage, run:
bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8
  • Fine-tuning stage:

    • First, calculate the $\mathcal L_{\text{lin}}$ of each sentence for AnSS method by running this:
    bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain --select
    • then, running this:
    bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.

Results

Below list our model's performance on ImageNet-LT, Places-LT, and iNaturalist 2018.

Dataset Backbone Top-1 Accuracy Download
ImageNet-LT ResNet-50 70.1 Weights
ImageNet-LT ViT-Base-16 77.2 Weights
Places-LT ResNet-50 48.0 Weights
Places-LT ViT-Base-16 50.1 Weights
iNaturalist 2018 ResNet-50 74.6 Weights
iNaturalist 2018 ViT-Base-16 76.8 Weights

For more detailed information, please refer to our paper directly.

Citation

If you are interested in our work, please cite as follows:

@article{tian2021vl,
  title={VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition},
  author={Tian, Changyao and Wang, Wenhai and Zhu, Xizhou and Wang, Xiaogang and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2111.13579},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

You might also like...
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks We provide the code (in PyTorch) and datasets for our paper "On Size-Orient

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

APSIPA-SER-with-A-and-T This code is the implementation of Speech Emotion Recognition (SER) with acoustic and linguistic features. The network model i

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Implementation of
Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Official codes for the paper
Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ResDAVEnet-VQ Official PyTorch implementation of Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech What is in this repo? M

[ICCV2021] Official code for
[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

CTR-GCN This repo is the official implementation for Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. The pap

Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection
Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection This repo is the official implementation for CVPR 2021 paper: Adaptive Class Suppressio

Comments
  • Problem about running eval.sh

    Problem about running eval.sh

    """ #!/usr/bin/env bash set -x

    export NCCL_LL_THRESHOLD=0

    CONFIG=$1 GPUS=$1 CPUS=$[GPUS*2] PORT=${PORT:-8886}

    CONFIG_NAME=${CONFIG##/} CONFIG_NAME=${CONFIG_NAME%.}

    OUTPUT_DIR="./checkpoints/eval" if [ ! -d $OUTPUT_DIR ]; then mkdir ${OUTPUT_DIR} fi

    python -u main.py
    --port=$PORT
    --num_workers 4
    --resume "./checkpoints/${CONFIG_NAME}/checkpoint.pth"
    --output-dir ${OUTPUT_DIR}
    --config $CONFIG ${@:3}
    --eval
    2>&1 | tee -a ${OUTPUT_DIR}/train.log """ I have two A100, so set GPUS is 2. All other settings according to ReadME.md but I got a problem when running eval.sh """ File "eval.sh", line 4 export NCCL_LL_THRESHOLD=0 ^ SyntaxError: invalid syntax

    """

    opened by euminds 2
  • Mismatch between code and diagram in paper for the fine-tuning phase

    Mismatch between code and diagram in paper for the fine-tuning phase

    In fig 3, stage 2 from the paper, it looks like value for the attention is calculated based on Vision and language (Q is vision, K is language) and then applied to the language (V). But in the code, the attention is applied to the visual features. Can you verify which one is the correct way? @ChangyaoTian

    opened by rahulvigneswaran 0
  • pre-trained weights with TorchScript?

    pre-trained weights with TorchScript?

    Hello, Thanks for the great work! May I ask if it's possible for you to also provide the checkpoint weight in a TorchScript version?

    It's something like:

    import torch
    import torchvision.models as models
    
    model = models.resnet50()
    traced = torch.jit.trace(model, (torch.rand(4, 3, 224, 224),))
    torch.jit.save(traced, "test.pt")
    
    # load model
    model = torch.jit.load("test.pt")
    
    opened by xinleihe 0
Releases(ECCV-2022-video)
Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation in PyTorch

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Ima

Xuanchi Ren 86 Dec 07, 2022
Source code for our EMNLP'21 paper 《Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning》

Child-Tuning Source code for EMNLP 2021 Long paper: Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning. 1. Environ

46 Dec 12, 2022
某学校选课系统GIF验证码数据集 + Baseline模型 + 上下游相关工具

elective-dataset-2021spring 某学校2021春季选课系统GIF验证码数据集(29338张) + 准确率98.4%的Baseline模型 + 上下游相关工具。 数据集采用 知识共享署名-非商业性使用 4.0 国际许可协议 进行许可。 Baseline模型和上下游相关工具采用

xmcp 27 Sep 17, 2021
Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

ON-LSTM This repository contains the code used for word-level language model and unsupervised parsing experiments in Ordered Neurons: Integrating Tree

Yikang Shen 572 Nov 21, 2022
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

MusCaps: Generating Captions for Music Audio Ilaria Manco1 2, Emmanouil Benetos1, Elio Quinton2, Gyorgy Fazekas1 1 Queen Mary University of London, 2

Ilaria Manco 57 Dec 07, 2022
Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning"

Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning" This is the code for the paper Solving Graph-based Public Goo

Victor-Alexandru Darvariu 3 Dec 05, 2022
An official implementation of MobileStyleGAN in PyTorch

MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis Official PyTorch Implementation The accompanying videos c

Sergei Belousov 602 Jan 07, 2023
Deep Learning agent of Starcraft2, similar to AlphaStar of DeepMind except size of network.

Introduction This repository is for Deep Learning agent of Starcraft2. It is very similar to AlphaStar of DeepMind except size of network. I only test

Dohyeong Kim 136 Jan 04, 2023
S2s2net - Sentinel-2 Super-Resolution Segmentation Network

S2S2Net Sentinel-2 Super-Resolution Segmentation Network Getting started Install

Wei Ji 10 Nov 10, 2022
Logsig-RNN: a novel network for robust and efficient skeleton-based action recognition

GCN_LogsigRNN This repository holds the codebase for the paper: Logsig-RNN: a novel network for robust and efficient skeleton-based action recognition

7 Oct 14, 2022
A PaddlePaddle implementation of STGCN with a few modifications in the model architecture in order to forecast traffic jam.

About This repository contains the code of a PaddlePaddle implementation of STGCN based on the paper Spatio-Temporal Graph Convolutional Networks: A D

Tianjian Li 1 Jan 11, 2022
WORD: Revisiting Organs Segmentation in the Whole Abdominal Region

WORD: Revisiting Organs Segmentation in the Whole Abdominal Region. This repository provides the codebase and dataset for our work WORD: Revisiting Or

Healthcare Intelligence Laboratory 71 Jan 07, 2023
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Antoine Caillon 589 Jan 02, 2023
A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Duplicate Image Detection Getting Started Install dependencies pip install -r requirements.txt Run service python main.py Testing Test with pytest How

Matthew Podolak 21 Nov 11, 2022
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification (ICCV2021)

CM-NAS Official Pytorch code of paper CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification in ICCV2021. Vis

JDAI-CV 40 Nov 25, 2022
Storchastic is a PyTorch library for stochastic gradient estimation in Deep Learning

Storchastic is a PyTorch library for stochastic gradient estimation in Deep Learning

Emile van Krieken 140 Dec 30, 2022
Pytorch implementation of face attention network

Face Attention Network Pytorch implementation of face attention network as described in Face Attention Network: An Effective Face Detector for the Occ

Hooks 312 Dec 09, 2022
Romanian Automatic Speech Recognition from the ROBIN project

RobinASR This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, tog

RACAI 10 Jan 01, 2023
Improving the robustness and performance of biomedical NLP models through adversarial training

RobustBioNLP Improving the robustness and performance of biomedical NLP models through adversarial training In this repository you can find suppliment

Milad Moradi 3 Sep 20, 2022
This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR

This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR,which is an open-source toolbox based on PyTorch. The overall architecture will be sh

Jianquan Ye 82 Nov 17, 2022