modelvshuman: Benchmarking Model Generalization Against Human Vision
Introduction
"modelvshuman" is a Python toolbox designed to evaluate the generalization capabilities of machine vision models against human vision. It serves as a platform for comparing various models by using 17 unique datasets that are out-of-distribution (OOD), accompanied by high-quality human comparison data. This toolkit is compatible with both PyTorch and TensorFlow frameworks, providing an expansive scope for analyzing model performance.
Benchmark
Most Human-like Behavior
The benchmark ranks models by how closely their behaviour matches human vision, using metrics such as accuracy difference, observed consistency, and error consistency. Models like ViT-22B and CLIP score particularly well here, indicating strong, human-aligned generalization.
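For intuition, error consistency measures trial-by-trial agreement between a model and a human observer beyond the agreement their accuracies alone would predict. A minimal sketch of that computation, assuming binary correct/incorrect decision vectors on the same trials (the toolbox's own implementation may differ in detail), could look like this:

```python
import numpy as np

def error_consistency(model_correct, human_correct):
    """Cohen's kappa between two binary correct/incorrect vectors.

    Observed agreement is the fraction of trials on which both decision
    makers are jointly right or jointly wrong; expected agreement is what
    two independent observers with the same accuracies would produce by chance.
    """
    model_correct = np.asarray(model_correct, dtype=bool)
    human_correct = np.asarray(human_correct, dtype=bool)

    c_obs = np.mean(model_correct == human_correct)        # observed overlap
    p_m, p_h = model_correct.mean(), human_correct.mean()  # accuracies
    c_exp = p_m * p_h + (1 - p_m) * (1 - p_h)              # expected overlap
    return (c_obs - c_exp) / (1 - c_exp)                   # kappa

# Example: agreement well above chance yields a positive kappa.
print(error_consistency([1, 1, 0, 0, 1], [1, 1, 0, 1, 1]))
```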
Highest OOD Distortion Robustness
Another part of the benchmark measures how models respond to distorted images that differ from their training data. ViT-22B stands out in this category, showing strong robustness to OOD distortions it was never trained on.
Installation
Installing "modelvshuman" is straightforward:
- Clone the repository and set the home path via command line.
- Use pip to install the package within the cloned repository.
Because the package is installed from the cloned repository, local changes to the code, such as adding your own models, are picked up without any further setup.
User Experience
The toolbox makes testing model performance straightforward: users can adapt examples/evaluate.py to choose models and datasets, run the evaluation, and compile the results into a PDF report for comprehensive analysis.
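As a rough illustration of that workflow, a script in the spirit of examples/evaluate.py might look like the sketch below; the Evaluate and Plot entry points and the constants module are recalled from the repository's example and should be verified against the current code:

```python
# Sketch of an evaluation script in the spirit of examples/evaluate.py.
# The Evaluate/Plot entry points and the constants module are assumptions
# based on the repository's example; check the actual file for exact arguments.
from modelvshuman import Evaluate, Plot
from modelvshuman import constants as c

def run_evaluation():
    models = ["resnet50", "bagnet33", "simclr_resnet50x1"]
    datasets = c.DEFAULT_DATASETS          # or a custom list of dataset names
    params = {"batch_size": 64, "print_predictions": True, "num_workers": 8}
    Evaluate()(models, datasets, **params)

def run_plotting():
    # Report/figure generation follows a similar pattern via the Plot class;
    # see the repository example for the exact plotting-definition arguments.
    ...

if __name__ == "__main__":
    run_evaluation()
    run_plotting()
```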
Model Zoo
"modelvshuman" includes a diverse array of models:
- Over 20 standard supervised models.
- Several self-supervised contrastive models.
- Vision transformer variants.
- Adversarially robust models.
- ResNet-50 models with varied stylized training.
All of these models can be loaded by name and evaluated on existing or custom datasets.
Loading and Managing Models
Models are loaded with a few lines of Python, for either PyTorch or TensorFlow. The toolbox can also list all available models and register new ones, making it easy to extend the model set.
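A hypothetical loading snippet is sketched below; the module path and helper names (list_models, load_model) are illustrative assumptions rather than the verified interface, so consult the repository's documentation for the exact calls:

```python
# Hypothetical sketch: the module path and helper names used here are
# illustrative assumptions about the model-zoo API, not a verified interface.
from modelvshuman import models

# List every model registered for a given framework (assumed helper).
print(models.list_models("pytorch"))

# Load a model by name; the returned wrapper is expected to bundle the
# network with its preprocessing so it can be evaluated directly.
resnet = models.load_model("resnet50")
```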
Datasets
A core feature of "modelvshuman" is its collection of 17 OOD datasets, each paired with human comparison data collected under controlled laboratory conditions. Together they cover a wide range of parametric and nonparametric image distortions for testing model robustness.
Loading Datasets
Datasets are loaded with a single command; the first time a model is evaluated on a dataset, the required data are downloaded automatically.
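As a purely illustrative example, assuming each dataset is exposed as a loader function named after it (the real entry points should be checked in the repository), loading one of the OOD datasets could look like this:

```python
# Hypothetical sketch: the datasets module and per-dataset loader functions
# are assumptions, not a verified interface.
from modelvshuman import datasets

# Request the "sketch" OOD dataset; if the images are not present locally,
# they would be downloaded automatically on first use, as described above.
sketch_data = datasets.sketch(batch_size=16, num_workers=4)
```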
Credit & Citation
The psychophysical data used in this project were collected in the Wichmannlab and originate from several published studies, providing a high-quality reference for evaluating model behaviour.
If you use the toolbox, citation details crediting the authors and the underlying publications are provided in the repository.
This project gives researchers an extensive suite of tools for evaluating and comparing models, helping to bridge the gap between machine vision models and human visual perception.