VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Last update: Dec 12, 2022

Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+, and the other required packages as follows:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14
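
After installation, it can help to confirm that the environment meets the stated minimums (PyTorch 1.7.1+, torchvision 0.8.2+). The following is a minimal sketch, not part of the repository; the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the parser only handles plain `major.minor.patch` prefixes.

```python
import re


def parse_version(text):
    """Turn a string such as '1.10.0+cu113' into (1, 10, 0), ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Compare the installed torch/torchvision versions with the README minimums."""
    import torch  # imported lazily so the helpers above work without PyTorch
    import torchvision

    return {
        "torch": meets_minimum(torch.__version__, "1.7.1"),
        "torchvision": meets_minimum(torchvision.__version__, "0.8.2"),
    }
```

Calling `check_environment()` in the target environment returns a dictionary of booleans, one per package.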

Data preparation

ImageNet-LT

Download and extract the ImageNet train and val images from here. The directory structure follows the standard layout for torchvision's datasets.ImageFolder; the training and validation data are expected in the train/ and val/ folders, respectively.

Then download and extract the wiki text into the same directory, and the directory tree of data is expected to be like this:

./data/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
    desc_1.txt
  ImageNet_LT_test.txt
  ImageNet_LT_train.txt
  ImageNet_LT_val.txt
  labels.txt
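
Before launching a run, the layout above can be checked programmatically. This is a minimal sketch under the assumption that the top-level names in the tree are required exactly as listed; `IMAGENET_LT_ENTRIES` and `missing_entries` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Top-level entries taken from the directory tree shown above.
IMAGENET_LT_ENTRIES = [
    "train",
    "val",
    "wiki",
    "ImageNet_LT_test.txt",
    "ImageNet_LT_train.txt",
    "ImageNet_LT_val.txt",
    "labels.txt",
]


def missing_entries(root, expected):
    """Return the expected names that do not exist directly under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]
```

For example, `missing_entries("./data/imagenet", IMAGENET_LT_ENTRIES)` returns an empty list when every entry is in place. The same function can be reused for the other datasets by passing their own list of names.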

After that, download CLIP's pretrained weights RN50.pt and ViT-B-16.pt from https://github.com/openai/CLIP into the pretrained directory.
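
One possible way to fetch these files is through the `clip` package installed earlier. The sketch below assumes that `clip.load` with `download_root` stores each checkpoint under the basename of its download URL (RN50.pt and ViT-B-16.pt, to the best of my understanding); `CLIP_WEIGHTS` and `fetch_clip_weights` are hypothetical names. Note that `clip.load` also builds the model in memory, which is more work than a plain download.

```python
import os

# Model names accepted by clip.load, mapped to the file names the README expects.
CLIP_WEIGHTS = {"RN50": "RN50.pt", "ViT-B/16": "ViT-B-16.pt"}


def fetch_clip_weights(download_root="pretrained"):
    """Download both CLIP checkpoints into download_root and return their paths."""
    import clip  # imported lazily; requires the openai/CLIP package and network access

    for name in CLIP_WEIGHTS:
        # Loading on the CPU avoids needing a GPU just to trigger the download.
        clip.load(name, device="cpu", download_root=download_root)
    return [os.path.join(download_root, fname) for fname in CLIP_WEIGHTS.values()]
```

If the files end up under different names on your system, downloading them manually into the pretrained directory remains the documented route.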

Places-LT

Download the places365_standard data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this (almost the same as ImageNet-LT):

./data/places/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
    desc_1.txt
  Places_LT_test.txt
  Places_LT_train.txt
  Places_LT_val.txt
  labels.txt
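
To get a feel for how long-tailed a split is, the per-class image counts can be read from a split file such as Places_LT_train.txt. This is a sketch under an assumption the README does not state: that each line holds a relative image path followed by an integer class label, separated by whitespace. The function names are hypothetical.

```python
from collections import Counter


def count_images_per_class(lines):
    """Count images per label, assuming each line is '<relative/path> <int label>'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _path, label = line.rsplit(maxsplit=1)
        counts[int(label)] += 1
    return counts


def imbalance_ratio(counts):
    """Ratio between the most and the least frequent class."""
    return max(counts.values()) / min(counts.values())
```

A typical use would be `with open("./data/places/Places_LT_train.txt") as f: counts = count_images_per_class(f)`, followed by `imbalance_ratio(counts)`.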

iNaturalist 2018

Download the iNaturalist 2018 data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/iNat/
  train_val2018/
  wiki/
    desc_1.txt
  categories.json
  test2018.json
  train2018.json
  val.json

Evaluation

To evaluate VL-LTR on a single GPU, run:

Pre-training stage:

bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain

Fine-tuning stage:

bash eval.sh ${CONFIG_PATH} 1

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
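
When evaluating several configurations, the two commands above can be assembled from Python. This is a minimal sketch; `build_eval_command` is a hypothetical helper, and the example config path is a placeholder rather than a file known to exist in the repository.

```python
import shlex


def build_eval_command(config_path, gpus=1, pretrain=False):
    """Build the argument list for eval.sh as documented above."""
    command = ["bash", "eval.sh", config_path, str(gpus)]
    if pretrain:
        # Evaluate the pre-training stage instead of the fine-tuning stage.
        command.append("--eval-pretrain")
    return command


def as_shell(command):
    """Render an argument list as a single shell-quoted string for logging."""
    return shlex.join(command)
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, for example with `build_eval_command("path/to/config.py", pretrain=True)`.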

Training

To train VL-LTR on a single node with 8 GPUs:

Pre-training stage:

bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8

Fine-tuning stage:
- First, calculate the $\mathcal{L}_{\text{lin}}$ of each sentence for the AnSS method by running:
```
bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain --select
```
- Then run:
```
bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8
```

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
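
The fine-tuning stage is a two-step sequence: sentence selection first, then distributed training. The sketch below lists the commands in order. The README uses the same `${CONFIG_PATH}` placeholder for both steps without saying whether they share a file, so this hypothetical helper takes the two paths separately; all argument values are placeholders.

```python
def finetune_commands(partition, select_config, train_config, gpus=8):
    """Return the fine-tuning commands in the order they should be run."""
    # Step 1: compute the per-sentence loss used by AnSS (single GPU).
    select = ["bash", "eval.sh", select_config, "1", "--eval-pretrain", "--select"]
    # Step 2: launch distributed fine-tuning on the given partition.
    train = ["bash", "dist_train_arun.sh", partition, train_config, str(gpus)]
    return [select, train]
```

Each returned list can be executed in turn with `subprocess.run(command, check=True)`, stopping at the first failure so that training never starts without the selection output.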

Results

The table below lists our model's performance on ImageNet-LT, Places-LT, and iNaturalist 2018.

| Dataset | Backbone | Top-1 Accuracy (%) | Download |
|---|---|---|---|
| ImageNet-LT | ResNet-50 | 70.1 | Weights |
| ImageNet-LT | ViT-Base-16 | 77.2 | Weights |
| Places-LT | ResNet-50 | 48.0 | Weights |
| Places-LT | ViT-Base-16 | 50.1 | Weights |
| iNaturalist 2018 | ResNet-50 | 74.6 | Weights |
| iNaturalist 2018 | ViT-Base-16 | 76.8 | Weights |
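
As a quick reading aid, the gap between the two backbones can be computed directly from the reported numbers. The values below are copied from the table; the `vit_gain` helper is hypothetical.

```python
# Top-1 accuracy (%) copied from the results table.
RESULTS = {
    ("ImageNet-LT", "ResNet-50"): 70.1,
    ("ImageNet-LT", "ViT-Base-16"): 77.2,
    ("Places-LT", "ResNet-50"): 48.0,
    ("Places-LT", "ViT-Base-16"): 50.1,
    ("iNaturalist 2018", "ResNet-50"): 74.6,
    ("iNaturalist 2018", "ViT-Base-16"): 76.8,
}


def vit_gain(dataset):
    """Accuracy gain (percentage points) of ViT-Base-16 over ResNet-50."""
    gain = RESULTS[(dataset, "ViT-Base-16")] - RESULTS[(dataset, "ResNet-50")]
    return round(gain, 1)  # one decimal, matching the table's precision
```

With these numbers the gain is 7.1 points on ImageNet-LT, 2.1 on Places-LT, and 2.2 on iNaturalist 2018.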

For more detailed information, please refer to our paper directly.

Citation

If you are interested in our work, please cite as follows:

@article{tian2021vl,
  title={VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition},
  author={Tian, Changyao and Wang, Wenhai and Zhu, Xizhou and Wang, Xiaogang and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2111.13579},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.


Comments

Problem about running eval.sh

"""
#!/usr/bin/env bash
set -x

export NCCL_LL_THRESHOLD=0

CONFIG=$1
GPUS=$1
CPUS=$[GPUS*2]
PORT=${PORT:-8886}

CONFIG_NAME=${CONFIG##*/}
CONFIG_NAME=${CONFIG_NAME%.*}

OUTPUT_DIR="./checkpoints/eval"
if [ ! -d $OUTPUT_DIR ]; then
    mkdir ${OUTPUT_DIR}
fi

python -u main.py \
    --port=$PORT \
    --num_workers 4 \
    --resume "./checkpoints/${CONFIG_NAME}/checkpoint.pth" \
    --output-dir ${OUTPUT_DIR} \
    --config $CONFIG ${@:3} \
    --eval \
    2>&1 | tee -a ${OUTPUT_DIR}/train.log
"""

I have two A100 GPUs, so I set GPUS to 2. All other settings follow README.md, but I got an error when running eval.sh:

"""
File "eval.sh", line 4
    export NCCL_LL_THRESHOLD=0
    ^
SyntaxError: invalid syntax
"""

opened by euminds 2
Mismatch between code and diagram in paper for the fine-tuning phase

In fig 3, stage 2 from the paper, it looks like value for the attention is calculated based on Vision and language (Q is vision, K is language) and then applied to the language (V). But in the code, the attention is applied to the visual features. Can you verify which one is the correct way? @ChangyaoTian

opened by rahulvigneswaran 0

pre-trained weights with TorchScript?

Hello, Thanks for the great work! May I ask if it's possible for you to also provide the checkpoint weight in a TorchScript version?

It's something like:

import torch
import torchvision.models as models

model = models.resnet50()
traced = torch.jit.trace(model, (torch.rand(4, 3, 224, 224),))
torch.jit.save(traced, "test.pt")

# load model
model = torch.jit.load("test.pt")

opened by xinleihe 0

Releases(ECCV-2022-video)

ECCV-2022-video(Oct 1, 2022)

Source code(tar.gz)
Source code(zip)
4849.mp4(15.24 MB)
ECCV-2022(Oct 1, 2022)

Source code(tar.gz)
Source code(zip)
4849.pdf(2.66 MB)
text-corpus(Jan 12, 2022)

The text corpus for ImageNet-LT, Places-LT, and iNaturalist2018, including both wiki data and the class labels.
Source code(tar.gz)
Source code(zip)
imagenet.zip(7.29 MB)
iNat.zip(13.12 MB)
places.zip(2.67 MB)
checkpoints(Jan 12, 2022)

Model weights for the 3 datasets, covering both training stages, with ViT-B-16 and ResNet-50 backbones.
Source code(tar.gz)
Source code(zip)
imageNet-LT_r50.zip(1382.52 MB)
imageNet-LT_vit16.zip(1920.94 MB)
inat_finetune_r50.zip(1754.00 MB)
inat_finetune_vit16.zip(1566.75 MB)
inat_pretrain_r50.zip(1008.10 MB)
inat_pretrain_vit16.zip(1512.80 MB)
places_r50.zip(1252.33 MB)
places_vit16.zip(1843.92 MB)
