modelvshuman is a Python library to benchmark the gap between human and machine vision

Overview

modelvshuman: Does your model generalise better than humans?

modelvshuman is a Python library to benchmark the gap between human and machine vision. Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with high-quality human comparison data.

🏆 Benchmark

The top-10 models are listed here; training dataset size is indicated in brackets. Additionally, standard ResNet-50 is included as the last entry of the table for comparison. Model ranks are calculated across the full range of 52 models that we tested. If your model scores better than some (or even all) of the models here, please open a pull request and we'll be happy to include it here!
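To make the ranking idea concrete, here is a minimal sketch of how a mean rank across several metrics could be computed. It uses four rows copied from the "Most human-like behaviour" table; because the published ranks are computed over all 52 models, the values it produces will not match the table. The function name `mean_ranks` and the lack of tie handling are assumptions made for illustration, not the library's implementation.

```python
# Four rows from the "Most human-like behaviour" table:
# (accuracy difference, observed consistency, error consistency)
scores = {
    "CLIP: ViT-B (400M)": (0.023, 0.758, 0.281),
    "SWSL: ResNeXt-101 (940M)": (0.028, 0.752, 0.237),
    "BiT-M: ResNet-101x1 (14M)": (0.034, 0.733, 0.252),
    "standard ResNet-50 (1M)": (0.087, 0.665, 0.208),
}

# Accuracy difference is better when lower; both consistencies when higher.
HIGHER_IS_BETTER = (False, True, True)


def mean_ranks(scores, higher_is_better):
    """Rank models per metric (1 = best) and average the ranks.

    Hypothetical helper: ties are not treated specially.
    """
    names = list(scores)
    totals = {name: 0 for name in names}
    for col, higher in enumerate(higher_is_better):
        ordered = sorted(names, key=lambda n: scores[n][col], reverse=higher)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / len(higher_is_better) for name in names}


ranks = mean_ranks(scores, HIGHER_IS_BETTER)
for name, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(f"{rank:.2f}  {name}")
```

Within this four-model subset, CLIP: ViT-B is best on all three metrics, so its mean rank is 1.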

Most human-like behaviour

| winner | model | accuracy difference ↓ | observed consistency ↑ | error consistency ↑ | mean rank ↓ |
| --- | --- | --- | --- | --- | --- |
| 🥇 | CLIP: ViT-B (400M) | .023 | .758 | .281 | 1 |
| 🥈 | SWSL: ResNeXt-101 (940M) | .028 | .752 | .237 | 3.67 |
| 🥉 | BiT-M: ResNet-101x1 (14M) | .034 | .733 | .252 | 4 |
| 👏 | BiT-M: ResNet-152x2 (14M) | .035 | .737 | .243 | 4.67 |
| 👏 | ViT-L (1M) | .033 | .738 | .222 | 6.67 |
| 👏 | BiT-M: ResNet-152x4 (14M) | .035 | .732 | .233 | 7.33 |
| 👏 | BiT-M: ResNet-50x1 (14M) | .042 | .718 | .240 | 9 |
| 👏 | BiT-M: ResNet-50x3 (14M) | .040 | .726 | .228 | 9 |
| 👏 | ViT-L (14M) | .035 | .744 | .206 | 9.67 |
| 👏 | SWSL: ResNet-50 (940M) | .041 | .727 | .211 | 11.33 |
| ... | standard ResNet-50 (1M) | .087 | .665 | .208 | 29 |
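The two consistency columns can be illustrated with a small sketch. It assumes the usual definitions: observed consistency is the fraction of trials on which two observers are both right or both wrong, and error consistency is Cohen's kappa, which corrects that fraction for the agreement expected by chance given the two accuracies. The function name and the toy data are hypothetical, and the library's own implementation may differ in detail.

```python
def consistency(correct_a, correct_b):
    """Return (observed consistency, error consistency) for two observers.

    Each argument is a sequence of 0/1 flags, one per trial, marking whether
    that observer classified the trial correctly. Hypothetical helper.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(correct_a)
    # Fraction of trials where both are right or both are wrong.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    p_a = sum(correct_a) / n
    p_b = sum(correct_b) / n
    # Agreement expected by chance for independent observers.
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if c_exp == 1:
        # Both observers are always right (or always wrong): kappa is undefined.
        return c_obs, float("nan")
    kappa = (c_obs - c_exp) / (1 - c_exp)
    return c_obs, kappa


# Toy data: 1 = correct, 0 = incorrect, same eight trials for both observers.
human = [1, 1, 1, 0, 0, 1, 0, 1]
model = [1, 1, 0, 0, 1, 1, 0, 1]
c_obs, kappa = consistency(human, model)
print(f"observed consistency: {c_obs:.3f}, error consistency: {kappa:.3f}")
```

For this toy data the observed consistency is 0.75 and the error consistency is about 0.47: both observers are 62.5% accurate, so roughly 53% agreement would be expected by chance alone.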

Highest out-of-distribution robustness

| winner | model | OOD accuracy ↑ | rank ↓ |
| --- | --- | --- | --- |
| 🥇 | ViT-L (14M) | .733 | 1 |
| 🥈 | CLIP: ViT-B (400M) | .708 | 2 |
| 🥉 | ViT-L (1M) | .706 | 3 |
| 👏 | SWSL: ResNeXt-101 (940M) | .698 | 4 |
| 👏 | BiT-M: ResNet-152x2 (14M) | .694 | 5 |
| 👏 | BiT-M: ResNet-152x4 (14M) | .688 | 6 |
| 👏 | BiT-M: ResNet-101x3 (14M) | .682 | 7 |
| 👏 | BiT-M: ResNet-50x3 (14M) | .679 | 8 |
| 👏 | SimCLR: ResNet-50x4 (1M) | .677 | 9 |
| 👏 | SWSL: ResNet-50 (940M) | .677 | 10 |
| ... | standard ResNet-50 (1M) | .559 | 31 |

🔧 Installation

Simply clone the repository to a location of your choice and follow these steps:

  1. Set the repository home path by running the following from the command line:

    export MODELVSHUMANDIR=/absolute/path/to/this/repository/
    
  2. Install the package (omit the -e option if you don't intend to add your own model or make any other changes):

    pip install -e .
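If something goes wrong after these steps, it can help to confirm that the environment variable from step 1 is visible to Python. The following is a minimal sketch of such a check; the helper name `check_repo_home` is hypothetical and not part of modelvshuman.

```python
import os
from pathlib import Path


def check_repo_home(env=None):
    """Return (ok, message) describing whether MODELVSHUMANDIR looks usable.

    Hypothetical helper; `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    value = env.get("MODELVSHUMANDIR", "")
    if not value:
        return False, "MODELVSHUMANDIR is not set"
    path = Path(value)
    if not path.is_absolute():
        return False, f"MODELVSHUMANDIR is not an absolute path: {value}"
    if not path.is_dir():
        return False, f"MODELVSHUMANDIR is not an existing directory: {value}"
    return True, f"MODELVSHUMANDIR points to {path}"


if __name__ == "__main__":
    ok, message = check_repo_home()
    print(("OK: " if ok else "Problem: ") + message)
```

Note that `export` only affects the current shell session, so the check should be run from the same shell (or the variable added to your shell profile).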
    

🔬 User experience

Simply edit examples/evaluate.py as desired and run it (python examples/evaluate.py). This tests a list of models on the out-of-distribution datasets and generates plots. If you then compile latex-report/report.tex, all the plots will be included in one convenient PDF report.

🐫 Model zoo

The following models are currently implemented:

If you add or implement your own model, please make sure to compute its ImageNet accuracy as a sanity check.

How to load a model

If you just want to load a model from the model zoo, this is what you can do:

    # loading a PyTorch model from the zoo
    from modelvshuman.models.pytorch.model_zoo import InfoMin
    model = InfoMin("InfoMin")

    # loading a TensorFlow model from the zoo
    from modelvshuman.models.tensorflow.model_zoo import efficientnet_b0
    model = efficientnet_b0("efficientnet_b0")

How to list all available models

All implemented models are registered in the model registry, which can be used to list all available models of a given framework:

    from modelvshuman import models
    
    print(models.list_models("pytorch"))
    print(models.list_models("tensorflow"))

How to add a new model

Adding a new model is possible for standard PyTorch and TensorFlow models. Depending on the framework (pytorch / tensorflow), open modelvshuman/models/<framework>/model_zoo.py. Here, you can add your own model with a few lines of code, similar to how you would usually load it. If your model has a custom model definition, create a new subdirectory such as modelvshuman/models/<framework>/my_fancy_model/ containing fancy_model.py, which you can then import from model_zoo.py via from .my_fancy_model import fancy_model.
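For readers unfamiliar with the registry idea, the following self-contained sketch shows how a decorator-based registry can make a newly added loader function show up in a per-framework model list. It is a toy illustration of the general pattern only: the names `register_model`, `list_models` and `my_fancy_model` as written here, and the dictionary returned by the loader, are assumptions and do not reproduce the library's actual code.

```python
# Toy registry: framework name -> {model name -> loader function}.
_REGISTRY = {}


def register_model(framework):
    """Decorator that records a loader function under the given framework."""
    def decorator(loader):
        _REGISTRY.setdefault(framework, {})[loader.__name__] = loader
        return loader
    return decorator


def list_models(framework):
    """Return the sorted names of all loaders registered for a framework."""
    return sorted(_REGISTRY.get(framework, {}))


@register_model("pytorch")
def my_fancy_model(model_name):
    # A real loader would build the network and load its weights here;
    # this stand-in just returns a description.
    return {"name": model_name, "framework": "pytorch"}


print(list_models("pytorch"))
print(list_models("tensorflow"))
```

With this pattern, adding a model amounts to writing one decorated loader function, and the registry picks it up when the module is imported.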

📁 Datasets

In total, 17 datasets with human comparison data collected under highly controlled laboratory conditions are available.

Twelve datasets correspond to parametric or binary image distortions. Top row: colour/grayscale, contrast, high-pass, low-pass (blurring), phase noise, power equalisation. Bottom row: opponent colour, rotation, Eidolon I, II and III, uniform noise. [Figure: noise-stimuli]

The remaining five datasets correspond to the following nonparametric image manipulations: sketch, stylized, edge, silhouette, texture-shape cue conflict. [Figure: nonparametric-stimuli]
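A metric commonly reported on texture-shape cue conflict images is shape bias: among the trials where a model picks either the shape category or the texture category, the fraction in which it picks the shape category. The sketch below follows that definition; the function name, the trial format and the toy data are assumptions for illustration, and the library's own computation may filter trials differently.

```python
def shape_bias(trials):
    """Fraction of shape decisions among all shape-or-texture decisions.

    `trials` is an iterable of (predicted, shape_category, texture_category)
    tuples. Trials without a cue conflict, and trials where the prediction
    matches neither cue, are ignored. Hypothetical helper.
    """
    shape_hits = 0
    texture_hits = 0
    for predicted, shape_category, texture_category in trials:
        if shape_category == texture_category:
            continue  # no conflict between the two cues
        if predicted == shape_category:
            shape_hits += 1
        elif predicted == texture_category:
            texture_hits += 1
    decided = shape_hits + texture_hits
    return shape_hits / decided if decided else float("nan")


toy_trials = [
    ("cat", "cat", "elephant"),       # shape decision
    ("elephant", "cat", "elephant"),  # texture decision
    ("dog", "cat", "elephant"),       # neither cue: ignored
    ("car", "car", "clock"),          # shape decision
]
bias = shape_bias(toy_trials)
print(f"shape bias: {bias:.3f}")
```

A value of 1 means every such decision followed shape, and 0 means every one followed texture.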

How to load a dataset

Similarly, if you're interested in just loading a dataset, you can do this via:

    from modelvshuman.datasets import sketch
    dataset = sketch(batch_size=16, num_workers=4)

How to list all available datasets

    from modelvshuman import datasets
    
    print(list(datasets.list_datasets().keys()))
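If loading fails because image folders are missing, a quick look at the directory layout can help. The sketch below assumes the layout suggested by the error message quoted in the comments further down (a `datasets/<name>/dnn/` folder under the repository home). That layout, the helper name `missing_dataset_dirs` and the choice of dataset names in the demo are assumptions, so adapt them to what your checkout actually contains.

```python
import tempfile
from pathlib import Path


def missing_dataset_dirs(repo_home, dataset_names):
    """Return the names whose <repo_home>/datasets/<name>/dnn/ folder is absent.

    Hypothetical helper based on an assumed directory layout.
    """
    root = Path(repo_home) / "datasets"
    return [name for name in dataset_names if not (root / name / "dnn").is_dir()]


# Demo on a throwaway directory, so the sketch runs without the real repository.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "datasets" / "sketch" / "dnn").mkdir(parents=True)
    demo_missing = missing_dataset_dirs(tmp, ["sketch", "colour"])

print(demo_missing)
```

In practice you would pass the value of MODELVSHUMANDIR as `repo_home` instead of a temporary directory.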

💳 Credit

We collected the psychophysical data ourselves, but we used existing image dataset sources. 12 datasets were obtained from "Generalisation in humans and deep neural networks". 3 datasets were obtained from "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness". Additionally, we used 1 dataset from "Learning Robust Global Representations by Penalizing Local Predictive Power" (sketch images from ImageNet-Sketch) and 1 dataset from "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" (stylized images from Stylized-ImageNet).

We thank all model authors and repository maintainers for providing the models described above.

Comments
  • Question about self-supervised models

    Hello, thanks for the great toolbox~

    I'm a little confused about the results for self-supervised models such as SimCLR. These models don't have a class-specific classifier for ImageNet-1k, so how do you load the classifier weights? Do you load a separately trained classifier (as in the linear-probing protocol) or a fully trained network (as in the end-to-end fine-tuning protocol)?

    And another question: If I just want to test the shape & texture accuracy, mentioned in Intriguing Properties of Vision Transformers, which dataset type should I choose?

    Thank you very much~

    opened by pengzhiliang 5
  • dataset path?

    Hi --- first thank you a lot for this useful dataset and API!

    I'm trying to load the datasets using the following code, but got an error msg saying dataset sketch path not found: model-vs-human/datasets/sketch/dnn/

    Does this mean that I should download the sketch image dataset myself from the original source and put it under this path? A little more documentation on how the image datasets should be structured would be very useful!

    from modelvshuman.datasets import sketch
    dataset = sketch(batch_size=16, num_workers=4)
    

    Thank you!

    opened by ahnchive 2
  • Question Regarding Human Raw Dataset

    Hi! Thank you for providing a valuable dataset.

    As I dug into the raw data, which contains the human annotations, some questions came to mind. Below is how I analyzed the raw dataset for the contrast experiment.

    For an image named 0580_cop_dnn_c05_bicycle_10_n03792782_10129.png, I extracted the rows in which image_id matches 'c05_bicycle_10_n03792782_10129.png'. This gave the following 4 rows. Here, human_annotation is a simple concatenation of all csv files in the contrast experiment. [image]

    Below is the visualization of image 0580_cop_dnn_c05_bicycle_10_n03792782_10129.png: [image]

    Although the image is not very clear, I do not fully agree with the human predictions. (They predicted 'clock', 'oven', 'bear' and 'keyboard', but it is very clear that the figure is not a keyboard.) Is there anything wrong with the analysis, or anything I missed?

    Thanks,

    opened by jiyounglee-0523 2
  • Request for data to reproduce figures

    Very interesting work and thanks very much for releasing it publicly. We are working on extending some of your results/studies, could you please provide more information related to Figure 2?

    Figure 2 compares several models, but they are not all labeled. I am interested in finding out the accuracy-distortion tradeoff for each model. The figure shows this information only at a coarse level, e.g. comparing all self-supervised models, all adversarially trained models, etc. It would be perfect if you could provide each model name (corresponding to your model zoo) and its performance at the various distortion levels for the 12 distortion types that are considered.

    Thanks again, looking forward to hearing from you!

    opened by vihari 2
  • not able to reproduce the results

    Firstly thanks for sharing the interesting paper and code!

    I followed the installation steps, but running python examples/evaluate.py (without editing anything) results in the following strange error. Could the authors give some insight into the reasons?

        Plotting accuracy for dataset colour
        The following model(s) were not found: alexnet
        List of possible models in this dataset: ['bagnet33' 'resnet50' 'simclr_resnet50x1']
        The following model(s) were not found: subject-*
        List of possible models in this dataset: ['bagnet33' 'resnet50' 'simclr_resnet50x1']
        Traceback (most recent call last):
          File "examples/evaluate.py", line 28, in <module>
            run_plotting()
          File "examples/evaluate.py", line 18, in run_plotting
            figure_directory_name = figure_dirname)
          File "/home/eric/model_vs_human/modelvshuman/plotting/plot.py", line 108, in plot
            result_dir=result_dir)
          File "/home/eric/model_vs_human/modelvshuman/plotting/plot.py", line 744, in plot_accuracy
            result_dir=result_dir, plot_type="accuracy")
          File "/home/eric/model_vs_human/modelvshuman/plotting/plot.py", line 772, in plot_general_analyses
            experiment=e)
          File "/home/eric/model_vs_human/modelvshuman/plotting/analyses.py", line 254, in get_result_df
            r = self.analysis(subdat)
          File "/home/eric/model_vs_human/modelvshuman/plotting/analyses.py", line 284, in analysis
            self._check_dataframe(df)
          File "/home/eric/model_vs_human/modelvshuman/plotting/analyses.py", line 24, in _check_dataframe
            assert len(df) > 0, "empty dataframe"
        AssertionError: empty dataframe

    opened by largenn 2
  • Question about shape bias

    Could you please provide the formula used to compute the shape bias? I can successfully run this repo with my own model, but I am confused about how the shape bias is obtained. I would be very grateful if you could elaborate. Thanks!

    opened by YuanLiuuuuuu 1
  • Difficulties loading adversarially trained models

    I didn't succeed in loading the adversarially trained models. run_evaluation results in the following error:

    HTTP Error 403: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.

    Any idea how to fix this?

    opened by lukasShuber 1
  • Update license info

    In https://github.com/bethgelab/model-vs-human/blob/master/setup.cfg#L18 we state LGPLv3 but we should instead refer to https://github.com/bethgelab/model-vs-human/tree/master/licenses

    documentation 
    opened by rgeirhos 1
  • Clip Improvements

    Two changes:

    1. Explicitly cast the images to the device (the previous code caused problems when training on Google Colab);
    2. Compute the zero shot weights needed by CLIP only once and recycle them for subsequent batches. Since the classes remain fixed, computing this for every batch is not necessary and greatly slows down experiments.
    opened by yurigalindo 1
  • Feature request: Machine readable results, e.g. CSV

    Currently, the toolbox saves LaTeX tables and plots with the resulting accuracies and error consistencies. For custom plotting routines and similar uses, it would be great to additionally have these numbers in an easy-to-parse format, e.g. CSV or JSON.

    enhancement 
    opened by dekuenstle 0
  • Feature request: Simpler loading of custom models

    Using your toolbox with the built-in models is straightforward, but we would like to compare some custom pytorch models. It would be great to have a routine to add these models (i.e. subclasses of nn.Module) to the toolbox registry from your own script. If this is already possible, it would be great if you could share an example.

    Currently, we add the model inside the toolbox's files which makes extensions complicated and redundant (e.g. name of model in the path, the function name, the plotting routine).

    Thanks David

    enhancement 
    opened by dekuenstle 2
  • BiT models via timm?

    I had difficulties obtaining the BiT models via PyTorch Image Models (timm). I then used, e.g.,

        import timm
        m = timm.create_model('resnetv2_152x12_bitm', pretrained=True)

    in the PyTorch model_zoo.py.

    This worked perfectly.

    opened by lukasShuber 0
Owner
Bethge Lab
Perceiving Neural Networks