Are Convolutional Neural Networks or Transformers more like human vision?

This repository contains the code and fine-tuned models for popular Convolutional Neural Networks (CNNs) and the recently proposed Vision Transformer (ViT), fine-tuned on the augmented ImageNet dataset, together with the shape/texture bias tests run on the Stylized ImageNet dataset.
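To make the shape/texture bias test concrete, here is a minimal sketch of how a shape-bias score is commonly computed on cue-conflict images, where each image has one label for its shape and a different label for its texture. It is not taken from this repository's code: the function name `shape_bias` and the label lists are hypothetical, and the definition assumed is the usual one (the fraction of shape decisions among all decisions that match either the shape or the texture label).

```python
import math


def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of shape decisions among shape-or-texture decisions.

    All three arguments are equal-length sequences of class labels.
    Images whose shape and texture labels agree carry no cue conflict
    and are skipped. Returns NaN if no prediction matches either cue.
    """
    if not (len(predictions) == len(shape_labels) == len(texture_labels)):
        raise ValueError("all inputs must have the same length")

    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # no conflict between cues, so the image is uninformative
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # predictions matching neither cue are ignored

    total = shape_hits + texture_hits
    if total == 0:
        return math.nan
    return shape_hits / total


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    preds = ["cat", "elephant", "dog", "car"]
    shapes = ["cat", "cat", "dog", "bird"]
    textures = ["elephant", "elephant", "cat", "boat"]
    print(shape_bias(preds, shapes, textures))
```

A score near 1 means the classifier mostly follows shape, and a score near 0 means it mostly follows texture.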

This work goes beyond traditional metrics and compares CNNs and the ViT against humans in terms of error consistency. Through these tests, we were able to show that recently proposed self-attention-based Transformer models make more human-like errors than traditional CNNs.
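As an illustration of what error consistency measures, the sketch below computes it as Cohen's kappa over per-trial correctness, which is a common formulation of the metric. It is an assumption that this matches the exact computation used in this repository. The function name `error_consistency` and the toy inputs are hypothetical.

```python
import math


def error_consistency(correct_a, correct_b):
    """Cohen's kappa between two observers' per-trial correctness.

    correct_a and correct_b are equal-length sequences of booleans,
    one entry per trial (True if that observer answered correctly).
    Returns NaN when the chance-level agreement is 1, where kappa
    is undefined.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct_a)
    acc_a = sum(bool(x) for x in correct_a) / n
    acc_b = sum(bool(x) for x in correct_b) / n

    # Observed agreement: trials where both are right or both are wrong.
    c_obs = sum(bool(a) == bool(b) for a, b in zip(correct_a, correct_b)) / n

    # Agreement expected by chance from the two accuracies alone.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if math.isclose(c_exp, 1.0):
        return math.nan
    return (c_obs - c_exp) / (1 - c_exp)


if __name__ == "__main__":
    # Hypothetical toy data: a human and a model on four trials.
    human = [True, True, False, False]
    model = [True, False, True, False]
    print(error_consistency(human, model))
```

A value of 0 means the two observers agree no more often than their accuracies alone would predict. Positive values mean they tend to err on the same trials, and negative values mean they tend to err on different ones.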

Colab

You can run tests on the results directly in Google Colaboratory without installing anything on your local machine. Click "Open in Colab" below:

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

If you use our experimental results or fine-tuned models, please cite:

@article{tuli2021cogsci,
      title={Are Convolutional Neural Networks or Transformers more like human vision?}, 
      author={Shikhar Tuli and Ishita Dasgupta and Erin Grant and Thomas L. Griffiths},
      year={2021},
      eprint={2105.07197},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Study of human inductive biases in CNNs and Transformers.

