cutlass

Highly Efficient CUDA Framework for Matrix Multiplication with Comprehensive Mixed-Precision Capabilities

Product Description

CUTLASS 3.6.0 is a framework for CUDA matrix operations, built from modular templates and the CuTe library for efficient tensor manipulation. It supports mixed-precision computation with data types such as FP16, BF16, and TF32, and is optimized for NVIDIA architectures from Volta to Hopper. This release improves structured sparse GEMM, refines the convolution API, and extends support to additional data types and architectures.
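To make "mixed precision" concrete, the following is a minimal CPU reference sketch in plain C++, not CUTLASS code and not its API. It assumes BF16 inputs with FP32 accumulation, one common mixed-precision pairing. The helper names (`float_to_bf16`, `bf16_to_float`, `gemm_bf16_fp32`) are hypothetical, and the BF16 conversion uses simple truncation rather than the rounding a real library would typically apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: emulate BF16 storage by keeping the upper 16 bits
// of an FP32 value (truncation, no rounding).
inline std::uint16_t float_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen a BF16 bit pattern back to FP32.
inline float bf16_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Reference GEMM, C = A * B, all matrices row-major.
// Inputs are stored as BF16; products are accumulated and returned in FP32.
std::vector<float> gemm_bf16_fp32(const std::vector<std::uint16_t>& A,
                                  const std::vector<std::uint16_t>& B,
                                  int M, int N, int K) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);

    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;  // FP32 accumulator limits rounding error growth
            for (int k = 0; k < K; ++k) {
                acc += bf16_to_float(A[m * K + k]) * bf16_to_float(B[k * N + n]);
            }
            C[m * N + n] = acc;
        }
    }
    return C;
}
```

Storing inputs in 16 bits halves the memory traffic relative to FP32, while the wider accumulator keeps the sum of many small products from losing precision. A GPU framework applies the same idea on Tensor Cores instead of scalar loops.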
Project Details