# ImageNet
mtt-distillation
The 'mtt-distillation' project implements dataset distillation by matching training trajectories: synthetic images are optimized so that networks trained on them follow the parameter trajectories of expert networks trained on the real data, yielding similar test performance. The project supports tasks such as distilling ImageNet subsets and synthesizing textures, including tileable textures applicable to areas like fashion and other targeted image sets. Highlighted features include class-based texture generation, solid training frameworks, and integration with various datasets, improving the practicality of synthetic datasets.
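A minimal sketch of the trajectory-matching objective behind this style of distillation, assuming flattened parameter vectors and a hypothetical `student_step` that performs one differentiable SGD update on the synthetic data:

```python
import torch

def trajectory_matching_loss(theta_start, theta_expert_end, syn_images,
                             syn_labels, student_step, n_steps=10):
    """Match the student's short trajectory on synthetic data to the expert's."""
    theta = theta_start.clone()
    for _ in range(n_steps):
        # One differentiable SGD step on the synthetic data; gradients of the
        # final loss flow back into the synthetic images themselves.
        theta = student_step(theta, syn_images, syn_labels)
    num = (theta - theta_expert_end).pow(2).sum()
    den = (theta_start - theta_expert_end).pow(2).sum()
    return num / den  # normalized parameter distance, minimized w.r.t. syn_images
```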
diffusion-classifier
Discover how the Diffusion Classifier turns text-to-image diffusion models into zero-shot classifiers: it estimates class-conditional densities from denoising errors under different class prompts, with no extra training. Built on large-scale models such as Stable Diffusion, it outperforms comparable zero-shot approaches and suits researchers and developers who want to apply generative models to image classification and multimodal reasoning.
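A minimal sketch of the underlying idea, using a generic epsilon-prediction network `eps_model(x_t, t, class_idx)` and a cosine noise schedule as stand-ins for the project's Stable Diffusion text conditioning:

```python
import torch

def classify(x0, eps_model, num_classes, num_timesteps=1000, n_trials=32):
    """Pick the class whose conditional denoising error is lowest."""
    errors = torch.zeros(num_classes)
    for c in range(num_classes):
        for _ in range(n_trials):
            t = torch.randint(0, num_timesteps, (1,))
            noise = torch.randn_like(x0)
            # DDPM forward process: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps
            a_bar = torch.cos(t.float() / num_timesteps * torch.pi / 2) ** 2
            x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
            errors[c] += (eps_model(x_t, t, c) - noise).pow(2).mean()
    return int(errors.argmin())  # Monte Carlo estimate of argmax_c p(x0 | c)
```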
RADIO
AM-RADIO distills multiple vision foundation models, including CLIP, DINOv2, and SAM, into a single backbone that inherits capabilities such as text grounding and segmentation. It improves zero-shot image classification and handles non-square images. E-RADIO, an efficiency-oriented variant, runs 6-10 times faster, improving throughput on vision-language tasks.
fast-DiT
The project provides an improved PyTorch implementation of scalable diffusion models with transformers (DiT), focused on training speed and memory efficiency. It ships pre-trained class-conditional models on ImageNet (512x512 and 256x256) plus tools for both sampling and training. Enhancements such as gradient checkpointing and mixed-precision training yield notable performance gains. Resources including a Hugging Face Space and Colab notebooks ease deployment and model training, and evaluation tools compute metrics such as FID and Inception Score for thorough analysis.
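The two training optimizations called out above are standard PyTorch features; a minimal sketch on a generic block stack (`blocks`, `x`, and `opt` are placeholders, not the repo's API):

```python
import torch
from torch.utils.checkpoint import checkpoint

scaler = torch.cuda.amp.GradScaler()

def train_step(blocks, x, opt):
    opt.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        for block in blocks:
            # Recompute activations during backward instead of storing them.
            x = checkpoint(block, x, use_reentrant=False)
        loss = x.float().pow(2).mean()  # stand-in for the diffusion objective
    scaler.scale(loss).backward()       # scale the loss to avoid fp16 underflow
    scaler.step(opt)                    # unscale gradients, then optimizer step
    scaler.update()
    return loss.detach()
```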
Open-MAGVIT2
Open-MAGVIT2 is an open-source family of auto-regressive image generation models built around a replication of Google's MAGVIT-v2 tokenizer and its very large vocabulary. It introduces asymmetric token factorization and improved sub-token interaction to raise image quality. The project provides models of up to 1.5B parameters and achieves strong reconstruction performance on 256x256 ImageNet images. By releasing its code and models, it aims to foster innovation and creativity in visual generation.
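A minimal sketch of lookup-free quantization with token factorization, assuming an 18-bit latent split into two 9-bit sub-tokens; the widths are illustrative, not necessarily the repo's configuration:

```python
import torch

def lfq_tokenize(z):                      # z: (..., 18) continuous latent
    bits = (z > 0).long()                 # sign quantization, one bit per dim
    weights = 2 ** torch.arange(9)
    low = (bits[..., :9] * weights).sum(-1)   # first 9-bit sub-token
    high = (bits[..., 9:] * weights).sum(-1)  # second 9-bit sub-token
    return low, high                      # two small vocabularies of 512 each
```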
Stylized-ImageNet
Investigate how stylizing ImageNet increases shape bias in CNNs, improving robustness and generalization. This project offers detailed tools for dataset transformation and model evaluation, supporting both PyTorch and TensorFlow frameworks. Generate in-depth reports on model behavior with stylized data and out-of-distribution performance. With a convenient Docker setup, this resource aids in creating Stylized-ImageNet and includes pre-trained CNNs for practical demonstration. Ideal for research focused on enhancing generalization in machine learning.
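The stylization pipeline is built on AdaIN-style transfer; a minimal sketch of the core AdaIN operation (the full pipeline also involves a pretrained encoder and decoder):

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Align channel-wise mean/std of content features to the style's."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```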
HorNet
HorNet leverages Recursive Gated Convolution for advanced spatial interactions in vision backbones. These models deliver leading results on ImageNet-1K and ImageNet-22K, supporting various tasks such as image and 3D object classification. The PyTorch-based implementation provides detailed setups and training methods, ensuring seamless integration and scalability across machine learning projects.
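A simplified sketch of the recursive gated convolution idea (project, depthwise-convolve, then gate via element-wise products); the actual gnConv uses channel widths that grow with the order plus additional projections:

```python
import torch
import torch.nn as nn

class SimpleGnConv(nn.Module):
    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        self.proj_in = nn.Conv2d(dim, dim * (order + 1), 1)
        self.dwconv = nn.Conv2d(dim * order, dim * order, 7, padding=3,
                                groups=dim * order)  # depthwise spatial mixing
        self.proj_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        parts = self.proj_in(x).chunk(self.order + 1, dim=1)
        gate = parts[0]
        convs = self.dwconv(torch.cat(parts[1:], dim=1)).chunk(self.order, dim=1)
        for c in convs:          # higher-order spatial interactions via gating
            gate = gate * c
        return self.proj_out(gate)
```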
MambaOut
Delve into the MambaOut PyTorch models, which probe whether the Mamba state-space mechanism is actually needed for vision by removing it from Gated CNN blocks. The compact MambaOut-Kobe variant reaches about 80% ImageNet top-1 accuracy with minimal resources, and the code integrates with pytorch-image-models. The repository documents the architecture, contrasts causal attention with RNN-like models, and provides training and validation recipes plus a Gradio demo on Hugging Face Spaces.
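A minimal sketch of the Gated CNN block that MambaOut stacks, with simplified channel handling and norm placement relative to the paper:

```python
import torch
import torch.nn as nn

class GatedCNNBlock(nn.Module):
    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, hidden * 2)   # produces value and gate
        self.dwconv = nn.Conv2d(hidden, hidden, 7, padding=3, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):                        # x: (B, H, W, C)
        shortcut = x
        v, g = self.fc1(self.norm(x)).chunk(2, dim=-1)
        v = self.dwconv(v.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)
        return shortcut + self.fc2(v * torch.sigmoid(g))  # gated token mixing
```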
VanillaNet
VanillaNet presents a minimalist approach to neural networks, enhancing efficiency without sacrificing performance. Its architecture avoids deep layer stacks, shortcuts, and attention mechanisms, which results in faster inference. Achieving 81% top-1 accuracy at 3.59ms latency with 11 layers, VanillaNet outperforms models like ResNet-50 and Swin-S in the speed-accuracy trade-off. The design rebalances speed, accuracy, and simplicity across tasks such as detection and segmentation.
SparK
SparK applies BERT-style self-supervised pretraining to any convolutional neural network. Compatible with standard CNN architectures such as ResNet, it uses sparse masked modeling so that pretrained CNNs can surpass larger untrained models and rival Swin Transformers on image classification. The pretraining also scales well, improving every model size tested. For analysis of why generative self-supervised pretraining helps, see the ICLR 2023 Spotlight paper; Colab demos illustrate reconstruction by pretrained models and masking inside convolutional layers.
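A minimal sketch of the masked-image objective, with a hypothetical `encoder_decoder`; SparK's actual contribution, treating the visible patches as a sparse set inside the CNN, is elided here:

```python
import torch

def masked_recon_loss(img, encoder_decoder, patch=32, mask_ratio=0.6):
    """Mask random patches, reconstruct, and score only the masked pixels."""
    B, C, H, W = img.shape
    gh, gw = H // patch, W // patch
    keep = torch.rand(B, 1, gh, gw, device=img.device) > mask_ratio
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    pred = encoder_decoder(img * mask)          # only visible patches remain
    hidden = (~mask).expand_as(img)             # complement: the masked pixels
    return (pred - img).pow(2)[hidden].mean()   # MSE on masked regions only
```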
model-vs-human
Discover 'modelvshuman,' a Python toolkit for benchmarking the gap between human and machine vision. It evaluates PyTorch and TensorFlow models on 17 out-of-distribution datasets with corresponding human comparison data. Key features include an extensive model zoo of over 20 standard supervised models, self-supervised contrastive models, vision transformers, and adversarially robust models. Straightforward to install and manage, it is ideal for researchers and developers assessing whether models achieve human-like behavior and OOD robustness.
inceptionnext
InceptionNeXt is a deep learning model that combines the strengths of the Inception and ConvNeXt architectures, gaining speed by decomposing large-kernel depthwise convolutions into cheaper parallel branches. Models like InceptionNeXt-T pair the speed of ResNet-50 with the accuracy of ConvNeXt-T. Trained on ImageNet-1K with PyTorch, the repository provides variants across parameter budgets and performance levels, supports NVIDIA CUDA, and targets efficient training and inference for image recognition tasks.
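A sketch of that decomposition: channels are split into four groups, mixed by a small square depthwise kernel, two band kernels, and an identity branch; kernel sizes and the split ratio follow the paper's defaults but should be checked against the repo:

```python
import torch
import torch.nn as nn

class InceptionDWConv2d(nn.Module):
    def __init__(self, dim, square=3, band=11, branch_ratio=0.125):
        super().__init__()
        g = int(dim * branch_ratio)             # channels per conv branch
        self.dw_sq = nn.Conv2d(g, g, square, padding=square // 2, groups=g)
        self.dw_h = nn.Conv2d(g, g, (1, band), padding=(0, band // 2), groups=g)
        self.dw_v = nn.Conv2d(g, g, (band, 1), padding=(band // 2, 0), groups=g)
        self.split = (dim - 3 * g, g, g, g)     # identity branch gets the rest

    def forward(self, x):
        x_id, x_sq, x_h, x_v = torch.split(x, self.split, dim=1)
        return torch.cat(
            (x_id, self.dw_sq(x_sq), self.dw_h(x_h), self.dw_v(x_v)), dim=1)
```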
SiT
Scalable Interpolant Transformers (SiT) advance flow- and diffusion-based generative modeling. Built on the Diffusion Transformers (DiT) backbone, SiT connects two distributions through stochastic interpolants, exposing modular design choices such as the interpolant, the prediction target, and the sampler. The repository includes PyTorch models, pre-trained weights, and a sampling script, and performs well on the ImageNet 256x256 benchmark. It is suitable for professionals exploring generative model technologies.
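A minimal sketch of an interpolant-based training objective with a linear interpolant (alpha_t = 1 - t, sigma_t = t) and a velocity target, one instance of the design choices SiT exposes:

```python
import torch

def sit_loss(model, x0, y):
    """Velocity-matching loss under a linear data->noise interpolant."""
    t = torch.rand(x0.shape[0], device=x0.device).view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * eps           # interpolate data toward noise
    v_target = eps - x0                    # time-derivative of the interpolant
    v_pred = model(x_t, t.flatten(), y)    # class-conditional velocity field
    return (v_pred - v_target).pow(2).mean()
```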
mar
This repository presents a method for autoregressive image generation without vector quantization. It includes a PyTorch implementation, class-conditional MAR models trained on ImageNet 256x256, and the Diffusion Loss (DiffLoss) that enables prediction of continuous tokens. Scripts support training and evaluation with PyTorch DDP, alongside detailed documentation and a Colab notebook demonstration. Ideal for developers and researchers in image generation, it offers tools, models, and visual demos for exploring this image synthesis technique.
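A minimal sketch of the DiffLoss idea: each continuous token x is supervised by a small denoising network conditioned on the autoregressive model's output z, instead of a categorical loss over a codebook; the cosine noise schedule here is a simplification:

```python
import torch

def diff_loss(denoiser, x, z, num_timesteps=1000):
    """Per-token denoising loss, conditioned on the AR model's output z."""
    t = torch.randint(0, num_timesteps, (x.shape[0],), device=x.device)
    a_bar = torch.cos(t.float() / num_timesteps * torch.pi / 2).pow(2).view(-1, 1)
    eps = torch.randn_like(x)
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * eps
    return (denoiser(x_t, t, z) - eps).pow(2).mean()
```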
keras_cv_attention_models
Discover keras_cv_attention_models, a library offering a wide selection of attention-based architectures, including CoAtNet, EfficientNet, and YOLO, for image recognition, detection, and segmentation. It eases model customization and weight conversion, integrates seamlessly with TensorFlow, and also offers a PyTorch backend, making it well suited to ImageNet training and evaluation. Suitable for researchers and developers aiming to enhance their AI initiatives with leading-edge architectures.
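A usage sketch following the package's documented pattern of instantiating models from submodules; the exact model name and pretrained tag are assumptions, so check the repo's model zoo for available weights:

```python
import tensorflow as tf
from keras_cv_attention_models import coatnet

# Hypothetical example: instantiate a CoAtNet variant with ImageNet weights.
model = coatnet.CoAtNet0(pretrained="imagenet")
logits = model(tf.random.uniform([1, 224, 224, 3]))  # NHWC input
print(logits.shape)                                  # (1, 1000) class logits
```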
hiera
Hiera is a streamlined hierarchical vision transformer offering strong performance on image and video tasks through fast inference and MAE pretraining. The models are available on Torch Hub and the Hugging Face Hub, enabling seamless integration into various projects.
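A loading sketch via Torch Hub, following the pattern the repo's README describes; the entry-point and checkpoint names here are assumptions, so consult the README for the exact identifiers:

```python
import torch

# Hypothetical identifiers; verify against the facebookresearch/hiera README.
model = torch.hub.load("facebookresearch/hiera", model="hiera_base_224",
                       pretrained=True, checkpoint="mae_in1k_ft_in1k")
model.eval()
logits = model(torch.randn(1, 3, 224, 224))  # standard 224x224 image input
```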
poolformer
Discover the capabilities of the MetaFormer architecture in vision tasks through PoolFormer, which leverages simple pooling for token mixing to outperform advanced transformers and MLP models. This project emphasizes straightforward design while achieving high accuracy on datasets such as ImageNet-1K. Find comprehensive resources including implementations, training scripts, model evaluations, and downloadable pretrained models, along with visualization tools to explore activation patterns in models like PoolFormer, DeiT, and ResNet. Ideal for those interested in simplifying computer vision models without sacrificing performance.
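The token mixer is genuinely minimal; a sketch of a pooling mixer in the spirit of PoolFormer (average pooling with the identity subtracted, so the residual connection outside the mixer is not double-counted):

```python
import torch.nn as nn

class Pooling(nn.Module):
    def __init__(self, pool_size=3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x):          # x: (B, C, H, W)
        return self.pool(x) - x    # subtract input; residual add lives outside
```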
rcg
This PyTorch-based self-supervised framework (RCG) generates unconditional images at 256x256 resolution on ImageNet, narrowing the long-standing gap between unconditional and class-conditional generation by conditioning the pixel generator on self-supervised representations. Recent updates add FID evaluation via the ADM suite and new training scripts for DiT-XL with RCG. The framework trains efficiently on GPUs, offers pre-trained weights, and supports several pixel generators, including MAGE, DiT, ADM, and LDM. Visit the project's repository for detailed setup and evaluation guidance.
Diffusion_models_from_scratch
Explore diffusion models built from scratch with DDPM and DDIM samplers and classifier-free guidance on ImageNet 64x64. Gain practical insight into algorithmic improvements, fast inference, and environment setup for effective model training. Pre-trained models are available for experimenting with low-FID image generation, with guidance on GPU training and data management. Ideal for advancing AI image-generation skills.
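A minimal sketch of classifier-free guidance at sampling time, assuming an epsilon-prediction network that accepts a class index and a null index for the unconditional branch:

```python
import torch

def cfg_eps(eps_model, x_t, t, class_idx, null_idx, w=4.0):
    """Blend conditional and unconditional predictions with guidance scale w."""
    eps_cond = eps_model(x_t, t, class_idx)    # class-conditional prediction
    eps_uncond = eps_model(x_t, t, null_idx)   # "null" (unconditional) branch
    return eps_uncond + w * (eps_cond - eps_uncond)
```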