TorchConv KAN: A Comprehensive Overview
Overview
The TorchConv KAN project is a collection of convolutional Kolmogorov-Arnold Networks (KANs). It aims to provide a robust framework for training, validating, and quantizing convolutional KAN models in PyTorch, with CUDA acceleration for performance. The models are evaluated on widely used datasets including MNIST, CIFAR, TinyImagenet, and Imagenet1k.
Current Status
The project is currently under development and has seen a series of updates:
- As of May 2024, a range of convolutional KAN layers (including WavKAN) had been introduced, along with ResNet-like, U-Net-like, VGG-like, and DenseNet-like models, accelerate-based training code, and new optimizers such as Lion.
- By June 2024, additional layers such as JacobiKAN and BernsteinKAN had been implemented, along with support for the LBFGS optimizer. Pretrained models, such as VGG11 with Bottleneck Gram Convolutions, now come with validated accuracy results on Imagenet1k.
- The TorchConv KAN paper has been released, presenting detailed insights on convolutional design principles and empirical studies.
Key Features
Convolutional KAN Layers
Kolmogorov-Arnold Networks (KANs) leverage the Kolmogorov-Arnold representation theorem to construct convolutional networks in a new way. Unlike traditional models, which apply fixed non-linearities after linear filters, KANs place learnable activation functions on the edges of the network: each convolution is defined by a set of learnable univariate non-linear functions rather than fixed scalar weights. This allows for greater flexibility and potential performance improvements.
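To make the idea concrete, here is a minimal, illustrative sketch of a KAN-style convolution. It is not the repository's implementation: the class name `SimpleKANConv2d` is hypothetical, and it uses a plain monomial basis where the project offers spline, polynomial (Gram, Chebyshev, Jacobi, Bernstein), and wavelet bases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleKANConv2d(nn.Module):
    """Toy KAN-style convolution: each kernel element applies a learnable
    univariate polynomial to its input instead of a fixed scalar weight."""

    def __init__(self, in_ch, out_ch, kernel_size, degree=3, padding=0):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = padding
        self.degree = degree
        # Polynomial coefficients per (output channel, patch element, basis term).
        n_patch = in_ch * kernel_size * kernel_size
        self.coeffs = nn.Parameter(torch.randn(out_ch, n_patch, degree + 1) * 0.1)

    def forward(self, x):
        b, _, h, w = x.shape
        # Extract sliding patches: (B, C*k*k, L), one column per output position.
        patches = F.unfold(x, self.kernel_size, padding=self.padding)
        t = torch.tanh(patches)  # squash inputs so higher powers stay bounded
        # Monomial basis 1, t, t^2, ..., evaluated per patch element.
        basis = torch.stack([t ** k for k in range(self.degree + 1)], dim=-1)
        # Sum the learnable univariate functions over each patch.
        out = torch.einsum('bple,ope->bol', basis, self.coeffs)
        h_out = h + 2 * self.padding - self.kernel_size + 1
        w_out = w + 2 * self.padding - self.kernel_size + 1
        return out.view(b, -1, h_out, w_out)
```

For a 32x32 RGB input, `SimpleKANConv2d(3, 16, 3, padding=1)` maps a `(2, 3, 32, 32)` batch to `(2, 16, 32, 32)`, just like a standard `Conv2d`, but with `degree + 1` coefficients per kernel element instead of a single weight.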
Bottleneck Convolutional KAN Layers
The project introduces bottleneck layers to curb the parameter growth that KAN convolutions incur on high-dimensional inputs. These layers apply 1x1 convolutions to reduce the channel dimensionality before the KAN convolution and expand it again afterwards, keeping the parameter count manageable while preserving the learnable univariate functions.
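Below is a minimal sketch of this squeeze-then-expand pattern, reusing the hypothetical `SimpleKANConv2d` from above; the repository's Bottleneck Gram Convolutions follow the same idea, though the details differ.

```python
import torch.nn as nn

class BottleneckKANConv2d(nn.Module):
    """Squeeze channels with a cheap 1x1 conv, apply the parameter-hungry
    KAN convolution at reduced width, then expand back with another 1x1 conv."""

    def __init__(self, in_ch, out_ch, kernel_size=3, reduction=4, padding=1):
        super().__init__()
        mid = max(in_ch // reduction, 1)
        self.squeeze = nn.Conv2d(in_ch, mid, kernel_size=1)
        self.kan_conv = SimpleKANConv2d(mid, mid, kernel_size, padding=padding)
        self.expand = nn.Conv2d(mid, out_ch, kernel_size=1)

    def forward(self, x):
        return self.expand(self.kan_conv(self.squeeze(x)))
```

Since the KAN convolution stores `degree + 1` coefficients per kernel element, running it at the reduced width shrinks its share of the parameters roughly quadratically in the reduction factor.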
Model Collection
ResKANets and DenseKANets
These models adapt the ResNet and DenseNet architectures by replacing traditional convolutional layers with KAN-based ones. Across numerous experiments, they have shown promising results on datasets like CIFAR10 and Tiny Imagenet, though further exploration and optimization remain.
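As a sketch of the idea, a ResNet-style basic block with KAN convolutions swapped in might look like the following (again using the toy `SimpleKANConv2d`; the actual ResKANet blocks in the repository may be structured differently):

```python
import torch.nn as nn

class KANBasicBlock(nn.Module):
    """ResNet-style basic block with KAN convolutions in place of Conv2d."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv1 = SimpleKANConv2d(channels, channels, kernel_size, padding=1)
        self.conv2 = SimpleKANConv2d(channels, channels, kernel_size, padding=1)
        self.norm1 = nn.BatchNorm2d(channels)
        self.norm2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.norm1(self.conv1(x))
        out = self.norm2(self.conv2(out))
        # KAN convolutions carry their own learned non-linearities, so no
        # explicit ReLU is inserted; the identity shortcut is kept as usual.
        return out + x
```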
VGGKAN
Similarly, VGGKAN models incorporate KAN convolutions within a VGG-like framework. These models have been pretrained on Imagenet1k, achieving noteworthy accuracy and AUC scores.
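A hypothetical loading sketch for such pretrained checkpoints is shown below. The repo id, filename, and model constructor are placeholders, not the project's actual artifact names; consult the repository for the published weights.

```python
import torch
from huggingface_hub import hf_hub_download  # assumes weights are hosted on the Hub

# Placeholders: substitute the real repo id and filename from the project docs.
checkpoint_path = hf_hub_download(repo_id="<org>/<vggkan-checkpoint>",
                                  filename="model.pth")
state_dict = torch.load(checkpoint_path, map_location="cpu")
# model = VGGKAN(...)               # illustrative constructor name
# model.load_state_dict(state_dict)
```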
Performance Metrics
Performance evaluations indicate that Kolmogorov-Arnold convolutional networks outperform traditional architectures on simpler datasets like MNIST but still face challenges on more complex ones such as CIFAR. Ongoing work aims to close this gap.
Future Directions
The project aims to further refine its existing models while exploring applications across additional datasets and domains. Key future tasks include fine-tuning experiments, PEFT (Parameter-Efficient Fine-Tuning) methods, and the development of pruning and visualization techniques.
Usage and Setup
To use TorchConv KAN, ensure you have Python 3.9+, CUDA, and cuDNN installed. The repository provides scripts for training and testing on common datasets like MNIST and CIFAR, along with accelerate-based training scripts and Weights & Biases (wandb) configurations for experiment monitoring.
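The repository ships its own accelerate-based scripts; the sketch below only illustrates the general pattern (the `train` function and its arguments are illustrative, not the project's API):

```python
import torch
from accelerate import Accelerator

def train(model, loader, epochs=1, lr=1e-3):
    accelerator = Accelerator()  # handles device placement and mixed precision
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    # prepare() wraps the model, optimizer, and dataloader for the current setup.
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            accelerator.backward(loss)  # replaces loss.backward()
            optimizer.step()
```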
For those interested in contributing, the project remains open to community input, with clear instructions for replicating and extending current findings.
In sum, TorchConv KAN represents a significant step forward in the field of neural network architectures, combining novel theoretical underpinnings with practical implementations aimed at tackling complex machine learning tasks.