rwkv.cpp
The project ports the RWKV language model architecture to ggml, supporting FP32, FP16, and quantized (INT4, INT5, and INT8) inference. It is primarily CPU-focused and ships both a C library and a Python wrapper, with optional cuBLAS support for GPU acceleration. Supported model versions are RWKV 5 and 6, which are competitive alternatives to Transformer models, especially at long context lengths. The project also supports merging LoRA checkpoints into a model and reports detailed per-token performance metrics.
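To make the quantized-inference idea concrete, here is a minimal, illustrative sketch of block-wise INT8 quantization in the spirit of ggml's 8-bit block formats (a per-block scale plus 8-bit integers). This is not the actual rwkv.cpp/ggml implementation; the block size and function names are assumptions chosen for clarity.

```python
# Illustrative block-wise INT8 quantization sketch (NOT the real ggml code).
# Each block of 32 floats is stored as one float scale plus 32 signed bytes,
# roughly a 4x size reduction versus FP32 at a small accuracy cost.

BLOCK_SIZE = 32  # assumed block size for this sketch

def quantize_q8(values):
    """Quantize a list of floats into (scale, int8-range list) blocks."""
    blocks = []
    for i in range(0, len(values), BLOCK_SIZE):
        block = values[i:i + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0 if amax > 0 else 1.0
        q = [max(-127, min(127, round(v / scale))) for v in block]
        blocks.append((scale, q))
    return blocks

def dequantize_q8(blocks):
    """Reconstruct approximate float values from quantized blocks."""
    out = []
    for scale, q in blocks:
        out.extend(v * scale for v in q)
    return out
```

The per-block scale bounds the reconstruction error by about half a quantization step within each block, which is why quantized inference can stay close to FP32 quality while using far less memory.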