"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

Last update: Dec 06, 2022

Overview

FGVC8

The paper "Exploring Vision Transformers for Fine-grained Classification" was presented at the Eighth Workshop on Fine-Grained Visual Categorization (FGVC8) at CVPR 2021 on June 25th.

Abstract

Existing computer vision research in categorization struggles with fine-grained attribute recognition due to the inherently high intra-class variance and low inter-class variance. State-of-the-art (SOTA) methods tackle this challenge by locating the most informative image regions and relying on them to classify the complete image. The most recent work, Vision Transformer (ViT), shows strong performance in both traditional and fine-grained classification tasks.
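To make the "locate, then classify" idea concrete, here is a minimal sketch, assuming the informative regions are given as a 2D grid of non-negative per-patch attention scores. It keeps every patch whose score is at least a fraction of the peak, takes the smallest box enclosing those patches, and crops the image to that box so a second stage can classify the zoomed-in view. The function names, the 0.5 threshold, and the 16-pixel patch size are illustrative assumptions, not settings taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def attention_bbox(attn: Grid, threshold: float = 0.5,
                   patch_size: int = 16) -> Tuple[int, int, int, int]:
    """Hypothetical helper: return a pixel box (x0, y0, x1, y1), exclusive
    at x1/y1, enclosing every patch whose score is >= threshold * max.
    Assumes non-negative scores and 0 < threshold <= 1."""
    cutoff = threshold * max(max(row) for row in attn)
    hits = [(x, y) for y, row in enumerate(attn)
            for x, v in enumerate(row) if v >= cutoff]
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    # Convert inclusive patch indices to pixel coordinates.
    return (min(xs) * patch_size, min(ys) * patch_size,
            (max(xs) + 1) * patch_size, (max(ys) + 1) * patch_size)


def crop_to_box(image: List[list], box: Tuple[int, int, int, int]) -> List[list]:
    """Hypothetical helper: crop an image stored as rows (image[y][x])."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# Toy 4 x 4 attention map over 16 x 16-pixel patches (a 64 x 64 image).
attn = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.6, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]
image = [[(x, y) for x in range(64)] for y in range(64)]
box = attention_bbox(attn)
zoomed = crop_to_box(image, box)
print(box, len(zoomed), len(zoomed[0]))  # (16, 16, 48, 48) 32 32
```

Raising the threshold tightens the box around the strongest patches; with `threshold=0.95` only the single peak patch survives and the crop shrinks to one 16 x 16 patch.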

In this work, we propose a multi-stage ViT framework for fine-grained image classification tasks, which uses the inherent multi-head self-attention mechanism to localize the informative image regions without requiring architectural changes. We also introduce attention-guided augmentations to improve the model's capabilities.
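The abstract does not spell out how the per-head attention maps are combined into a single localization map. One standard option is attention rollout (Abnar & Zuidema, 2020), sketched below under that assumption: average the heads of each layer, add the identity to account for the residual connections, re-normalize each row, and multiply the layers together; the CLS row of the product then scores every patch. The nested-list input format and the names `attention_rollout` and `to_grid` are hypothetical; a real implementation would read the attention tensors out of the ViT's transformer blocks.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_rollout(layers: List[List[Matrix]]) -> List[float]:
    """Hypothetical helper: layers[l][h] is the N x N softmax attention of
    head h in layer l, with token 0 being the CLS token.
    Returns one relevance score per patch token (length N - 1)."""
    n = len(layers[0][0])
    # Start from the identity so the first product is just layer 1.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for heads in layers:
        # Average over the heads of this layer.
        avg = [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)]
               for i in range(n)]
        # Mix in the residual (identity) path, then re-normalize each row.
        mixed = [[0.5 * avg[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # Later layers multiply on the left.
        result = matmul(mixed, result)
    # Row 0 is the CLS token; skip its entry for itself.
    return result[0][1:]


def to_grid(scores: List[float], side: int) -> Matrix:
    """Reshape a flat list of patch scores into a side x side grid."""
    return [scores[r * side:(r + 1) * side] for r in range(side)]


# Toy example: CLS + 4 patches (2 x 2 grid), one layer, two identical heads
# in which the CLS token attends only to the last patch.
cls_to_last = [[0.0, 0.0, 0.0, 0.0, 1.0]] + [
    [1.0 if i == j else 0.0 for j in range(5)] for i in range(1, 5)]
scores = attention_rollout([[cls_to_last, cls_to_last]])
print(to_grid(scores, 2))  # [[0.0, 0.0], [0.0, 0.5]]
```

For a ViT with 16 x 16 patches on a 224 x 224 input (an assumed configuration), there are 1 + 14 × 14 = 197 tokens, so the returned list has 196 scores that can be reshaped into a 14 × 14 map before locating the informative region.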

We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also illustrate our model's interpretability through qualitative results.

Instructions

Upcoming

Citation

If you find our results interesting, or you use our code or ideas, please consider citing our work:

@misc{conde2021exploring,
      title={Exploring Vision Transformers for Fine-grained Classification}, 
      author={Marcos V. Conde and Kerem Turgutlu},
      year={2021},
      eprint={2106.10587},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Marcos V. Conde