metaformer

Models for Vision Tasks with High Accuracy Using MetaFormer Architectures

Product Description

This overview covers several MetaFormer baseline models, implemented in PyTorch, that reach high accuracy on ImageNet-1K. IdentityFormer, RandFormer, ConvFormer, and CAFormer differ mainly in their token mixers: IdentityFormer uses identity mapping and RandFormer uses global random mixing. CAFormer reaches 85.5% top-1 accuracy at 224x224 resolution under a standard training setup. The models are also integrated into the timm library for use in vision pipelines.
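To make the two simplest token mixers concrete, here is a minimal PyTorch sketch of a MetaFormer-style block with pluggable mixers. It is an illustration under stated assumptions, not the project's actual code: the class names `IdentityMixing`, `RandomMixing`, and `MetaFormerBlock` are chosen for this example, the block uses plain LayerNorm and a GELU MLP, and the random matrix is assumed to be a frozen, softmax-normalized random initialization.

```python
import torch
import torch.nn as nn


class IdentityMixing(nn.Module):
    """Token mixer that returns its input unchanged (no token interaction)."""

    def forward(self, x):
        return x


class RandomMixing(nn.Module):
    """Token mixer that blends all tokens with a fixed random matrix."""

    def __init__(self, num_tokens):
        super().__init__()
        # Assumption: rows are softmax-normalized and never trained.
        weight = torch.softmax(torch.rand(num_tokens, num_tokens), dim=-1)
        self.register_buffer("random_matrix", weight)

    def forward(self, x):
        # x has shape (batch, tokens, channels); mix along the token axis.
        return torch.einsum("mn,bnc->bmc", self.random_matrix, x)


class MetaFormerBlock(nn.Module):
    """Simplified block: norm -> token mixer -> residual, norm -> MLP -> residual."""

    def __init__(self, dim, token_mixer, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    tokens = torch.randn(2, 16, 32)  # (batch, tokens, channels)
    for mixer in (IdentityMixing(), RandomMixing(num_tokens=16)):
        block = MetaFormerBlock(dim=32, token_mixer=mixer)
        print(type(mixer).__name__, tuple(block(tokens).shape))
```

Swapping the `token_mixer` argument is the only change between the two variants, which reflects the MetaFormer idea that the overall block structure stays fixed while the mixer varies.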
Project Details