TensorFlow Image Models
Introduction
TensorFlow Image Models (`tfimm`) is a comprehensive collection of image models with pretrained weights. The project ports a variety of architectures from the `timm` library, originally developed for PyTorch, to TensorFlow. These include Vision Transformers such as ViT, DeiT, and Swin Transformer; MLP-based models such as MLP-Mixer, ResMLP, and PoolFormer; various ResNet variants; and more recent architectures such as ConvNeXt and Segment Anything. As `tfimm` evolves, it is expected to cover an even broader range of models.
Porting these models is made possible by Ross Wightman's `timm` library and by the PyTorch–TensorFlow interoperability work in Hugging Face's `transformers` repository. The project creators have made a dedicated effort to credit all original sources and are open to updating acknowledgments if any contribution was inadvertently overlooked.
Usage
The `tfimm` package can be installed with `pip`. To load pretrained weights, `timm` must be installed separately. Once installed, users can load pretrained models and adapt them to new tasks by configuring the classifier layer: setting the number of classes to zero removes the classification layer entirely, while setting a different number of classes replaces it with a freshly initialized one.
Here's a brief example of how to use it:

```python
import tfimm

# Create a pretrained model
model = tfimm.create_model("vit_tiny_patch16_224", pretrained="timm")

# List available models with pretrained weights
print(tfimm.list_models(pretrained="timm"))
```
All models in `tfimm` are subclasses of `tf.keras.Model`, and although they are not functional models, they can be saved and loaded using TensorFlow's SavedModel format. It is crucial to import the `tfimm` library before loading a saved model so the custom model classes can be resolved.
Models
The collection includes:
- Vision Transformers like CaiT, DeiT, and ViT. These models utilize attention mechanisms for image recognition tasks.
- Swin Transformer, a hierarchical transformer that makes use of shifted windows.
- MLP-Mixer and related models, which replace convolutions and attention with multi-layer perceptrons.
- ConvMixer, an effort to blend convolutional techniques with mixer-style architectures.
- EfficientNet family, optimized for performance and speed, accommodating adversarial examples and self-training methods.
- MobileNet-V2, emphasizing efficiency and portability in mobile and embedded applications.
- Pyramid Vision Transformer, a convolution-free backbone suitable for dense image prediction.
- ResNet and its variations, longstanding architectures renowned for their deep residual networks.
Profiling
`tfimm` includes profiling results for different GPUs (such as the K80 and V100) to help users gauge model scalability and performance. The results cover the maximum batch size that fits in GPU memory and the throughput in images per second, for both inference and backpropagation.
License
The `tfimm` project is open source, distributed under the Apache 2.0 license, supporting collaborative development and innovation.
Contact
For discussions, updates, and questions about `tfimm`, users can connect via the dedicated Slack channel.
This introduction provides an overview of the `tfimm` project's goals, capabilities, and applications, setting the stage for users to leverage its image models in their TensorFlow projects.