# Flash Attention

## x-transformers
x-transformers offers an efficient, configurable implementation of transformers with features such as Flash Attention and memory augmentation. It is well suited to NLP and computer vision tasks, optimizing both performance and resource use.
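For illustration, here is a minimal sketch of how Flash Attention is typically enabled in x-transformers; the `attn_flash` keyword follows the project's README, but exact argument names and defaults may vary between versions.

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# Decoder-only transformer with Flash Attention enabled
# (requires PyTorch 2.0+, which provides the fused attention kernels).
model = TransformerWrapper(
    num_tokens=20000,          # vocabulary size
    max_seq_len=1024,          # maximum sequence length
    attn_layers=Decoder(
        dim=512,
        depth=6,
        heads=8,
        attn_flash=True        # route attention through Flash Attention
    )
)

tokens = torch.randint(0, 20000, (1, 1024))
logits = model(tokens)         # shape: (1, 1024, 20000)
```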
## ThunderKittens
ThunderKittens streamlines writing high-performance deep learning kernels in CUDA, with MPS and ROCm support planned. It emphasizes simplicity, extensibility, and performance, organizing computation around the tile-based primitives of modern GPU architectures. Key features include tensor core utilization, asynchronous copies to hide memory latency, and distributed shared memory for more efficient use of the GPU. Built on CUDA 12.3+ and C++20, ThunderKittens is powerful yet straightforward to adopt, offering pre-built PyTorch kernels and an active developer community.
## PaLM-rlhf-pytorch
The project implements Reinforcement Learning from Human Feedback (RLHF) on top of the PaLM architecture, enabling researchers to explore an open-source system similar to ChatGPT. It provides guidance on using the PaLM model, training a reward model from human feedback, and running RLHF fine-tuning for improved performance. Contributions from CarperAI and support from Hugging Face are acknowledged, along with potential enhancements such as Direct Preference Optimization.
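As a rough illustration of the reward-modeling step described above (not the repository's actual API), the sketch below shows the pairwise preference loss commonly used to train a reward model from human comparisons.

```python
import torch
import torch.nn.functional as F

def reward_preference_loss(reward_chosen: torch.Tensor,
                           reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push the reward of the human-preferred
    response above the reward of the rejected response."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with scalar rewards produced by a reward model.
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.1, 0.5, -0.2])
loss = reward_preference_loss(chosen, rejected)
```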
## PointTransformerV3
PointTransformerV3 offers an efficient approach to 3D point cloud segmentation, delivering improved speed and accuracy in semantic segmentation tasks on benchmarks such as nuScenes and ScanNet. The project is maintained and updated as part of Pointcept v1.5, providing resources such as model weights and experiment records. Selected for oral presentation at CVPR'24, it uses Flash Attention to improve computational efficiency and to support scalable multi-dataset 3D representation learning.
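The efficiency gain comes from running attention over point groups through a fused Flash Attention kernel instead of materializing the full attention matrix. The sketch below shows that substitution generically via PyTorch's `scaled_dot_product_attention`; it is illustrative only and not PointTransformerV3's actual serialized-attention code.

```python
import torch
import torch.nn.functional as F

# Toy point features grouped for attention: (batch, heads, num_points, head_dim).
q = torch.randn(1, 8, 4096, 64)
k, v = torch.randn_like(q), torch.randn_like(q)

# scaled_dot_product_attention dispatches to a fused Flash Attention kernel
# when run on a supported GPU with half-precision inputs, avoiding the
# O(num_points^2) attention matrix in memory.
out = F.scaled_dot_product_attention(q, k, v)   # (1, 8, 4096, 64)
```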
## contrastors
Explore a comprehensive toolkit for contrastive learning that enables efficient training with Flash Attention and multi-GPU support. Use GradCache to train with large batch sizes under limited memory, and experiment with Masked Language Modeling pretraining. The toolkit includes Matryoshka Representation Learning for adaptable embedding sizes and supports CLIP and LiT models as well as Vision Transformers. Aimed at researchers with access to the 'nomic-embed-text-v1' dataset and pretrained models, it enables effective training and fine-tuning of vision-text models. Engage with the Nomic Community for additional collaboration and insights.
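As a hedged sketch of the core idea (not the contrastors API itself), the snippet below implements a symmetric InfoNCE loss of the kind used for CLIP-style contrastive training; applying the same loss to truncated prefixes of the embeddings is, roughly, what Matryoshka Representation Learning exploits for adaptable embedding sizes.

```python
import torch
import torch.nn.functional as F

def infonce_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of paired embeddings."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature                  # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage; a Matryoshka-style variant would also apply the same loss to
# truncated embeddings, e.g. emb[:, :256].
text_emb = torch.randn(32, 768)
image_emb = torch.randn(32, 768)
loss = infonce_loss(text_emb, image_emb)
```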
## GPT-2
Delve into the details of GPT-2, including its architecture and configuration choices. This overview examines key elements such as model files, reproducibility challenges, embedding details, and layer normalization. Learn about core training concepts like weight decay, gradient accumulation, and data parallelism, along with common pitfalls and debugging strategies. Useful for AI researchers and developers aiming to improve training effectiveness and understand language model internals.
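To make the training concepts concrete, here is a minimal, hypothetical PyTorch loop showing gradient accumulation with decoupled weight decay (AdamW); the tiny stand-in model is invented for illustration and is not tied to any particular GPT-2 codebase.

```python
import torch
import torch.nn.functional as F

# Tiny stand-in language model (hypothetical, for illustration only).
vocab_size, dim = 100, 32
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, dim),
    torch.nn.Linear(dim, vocab_size),
)

# AdamW applies decoupled weight decay; GPT-2-style setups usually exclude
# biases and LayerNorm weights via separate parameter groups.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

accum_steps = 4   # micro-batches accumulated per optimizer update
optimizer.zero_grad(set_to_none=True)
for step in range(8):
    tokens = torch.randint(0, vocab_size, (2, 16))          # toy micro-batch
    logits = model(tokens)
    loss = F.cross_entropy(logits.view(-1, vocab_size), tokens.view(-1))
    (loss / accum_steps).backward()   # scale so accumulated gradients average
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```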