rwkv.cpp
The project ports the RWKV language model architecture to ggml, supporting FP32, FP16, and quantized (INT4, INT5, and INT8) inference. It is primarily CPU-focused and ships both a C library and a Python wrapper, with optional cuBLAS support for GPU acceleration. Supported model versions are RWKV 5 and 6, which are competitive alternatives to Transformer models, especially at long context lengths. The project also supports merging LoRA checkpoints into a model and reports detailed per-token performance metrics.
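To make the quantized-inference idea concrete, here is a minimal, illustrative sketch of block-wise INT8 quantization in the spirit of ggml's 8-bit block formats (a per-block scale plus 8-bit integers). This is not the actual rwkv.cpp/ggml implementation; the block size and function names are assumptions chosen for clarity.

```python
# Illustrative block-wise INT8 quantization sketch (NOT the real ggml code).
# Each block of 32 floats is stored as one float scale plus 32 signed bytes,
# roughly a 4x size reduction versus FP32 at a small accuracy cost.

BLOCK_SIZE = 32  # assumed block size for this sketch

def quantize_q8(values):
    """Quantize a list of floats into (scale, int8-range list) blocks."""
    blocks = []
    for i in range(0, len(values), BLOCK_SIZE):
        block = values[i:i + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0 if amax > 0 else 1.0
        q = [max(-127, min(127, round(v / scale))) for v in block]
        blocks.append((scale, q))
    return blocks

def dequantize_q8(blocks):
    """Reconstruct approximate float values from quantized blocks."""
    out = []
    for scale, q in blocks:
        out.extend(v * scale for v in q)
    return out
```

The per-block scale bounds the reconstruction error by about half a quantization step within each block, which is why quantized inference can stay close to FP32 quality while using far less memory.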