TensorFlow Image Models
Introduction
TensorFlow Image Models (`tfimm`) is a comprehensive collection of image models with pretrained weights. The project ports a variety of architectures from the `timm` library, originally developed for PyTorch, to TensorFlow. These include Vision Transformers such as ViT, DeiT, and Swin Transformer; MLP-based models such as MLP-Mixer, ResMLP, and PoolFormer; various ResNet variants; and more recent architectures such as ConvNeXt and Segment Anything. As `tfimm` evolves, it is expected to cover an even broader range of models.
Porting these models is made possible by Ross Wightman's `timm` library and by the PyTorch–TensorFlow interoperability work in Hugging Face's `transformers` repository. The project creators have made a dedicated effort to credit all original sources and are open to updating acknowledgments if any contribution was inadvertently overlooked.
Usage
The `tfimm` package can be installed with `pip`. To load pretrained weights, `timm` must be installed separately. Once installed, users can load pretrained models and adapt them to new tasks by configuring the classifier layer: setting the number of classes to zero removes the classification layer entirely, while setting a different number of classes replaces it with a freshly initialized one.
Here's a brief example of how to use it:

```python
import tfimm

# Create a pretrained model
model = tfimm.create_model("vit_tiny_patch16_224", pretrained="timm")

# List available models with pretrained weights
print(tfimm.list_models(pretrained="timm"))
```
All models in `tfimm` are subclasses of `tf.keras.Model`, and although they are not functional models, they can be saved and loaded using TensorFlow's SavedModel format. It is crucial to import the `tfimm` library before loading a saved model so the custom model classes can be resolved.
Models
The collection includes:
- Vision Transformers like CaiT, DeiT, and ViT. These models utilize attention mechanisms for image recognition tasks.
- Swin Transformer, a hierarchical transformer that makes use of shifted windows.
- MLP-Mixer and related models, which replace convolutions and attention with multi-layer perceptrons.
- ConvMixer, an effort to blend convolutional techniques with mixer-style architectures.
- EfficientNet family, optimized for performance and speed, accommodating adversarial examples and self-training methods.
- MobileNet-V2, emphasizing efficiency and portability in mobile and embedded applications.
- Pyramid Vision Transformer, a convolution-free backbone suitable for dense image prediction.
- ResNet and its variations, longstanding architectures renowned for their deep residual networks.
Profiling
`tfimm` includes profiling results for different GPUs (such as the K80 and V100) to help users gauge model scalability and performance. The results cover the maximum batch size that fits in GPU memory and the throughput in images per second, for both inference and backpropagation.
License
The `tfimm` project is open source, distributed under the Apache 2.0 license, supporting collaborative development and innovation.
Contact
For discussions, updates, and questions about `tfimm`, users can connect via the dedicated Slack channel.
This introduction provides an overview of the `tfimm` project's goals, capabilities, and applications, setting the stage for users to leverage its image models in their TensorFlow projects.