
rwkv.cpp

Optimized RWKV language model inference on CPU with FP32, FP16, and various quantized formats

The project ports the RWKV language model architecture to ggml, supporting FP32, FP16, and quantized inference formats such as INT4, INT5, and INT8. Primarily CPU-focused, it provides both a C library and a Python wrapper, with optional cuBLAS support. It supports RWKV versions 5 and 6, which offer a competitive alternative to Transformer models, especially for long contexts, accommodates LoRA checkpoint integration, and reports detailed performance metrics for efficient computation.
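To make the C library concrete, here is a minimal sketch of single-token inference. It assumes the API names exposed by rwkv.h (rwkv_init_from_file, rwkv_eval, rwkv_get_state_len, rwkv_get_logits_len, rwkv_free); exact signatures may differ between library versions, and the model filename and token IDs are placeholders.

```c
// Minimal sketch of inference with the rwkv.cpp C API.
// Function names assume rwkv.h; the model path and token IDs are
// placeholders, and signatures may vary between versions.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include "rwkv.h"

int main(void) {
    // Load a ggml-format RWKV checkpoint, using 4 CPU threads.
    struct rwkv_context * ctx = rwkv_init_from_file("model-Q5_1.bin", 4);
    if (!ctx) {
        fprintf(stderr, "Failed to load model\n");
        return 1;
    }

    // State and logits buffer sizes are determined by the model.
    size_t state_len = rwkv_get_state_len(ctx);
    size_t logits_len = rwkv_get_logits_len(ctx);
    float * state = calloc(state_len, sizeof(float));
    float * logits = calloc(logits_len, sizeof(float));

    // Passing NULL as the input state starts from the initial state;
    // afterwards the updated state is fed back in, one token per call.
    uint32_t prompt[] = { 510, 3158 }; // placeholder token IDs
    rwkv_eval(ctx, prompt[0], NULL, state, logits);
    rwkv_eval(ctx, prompt[1], state, state, logits);

    // Greedy pick of the next token from the final logits.
    uint32_t best = 0;
    for (size_t i = 1; i < logits_len; i++)
        if (logits[i] > logits[best]) best = (uint32_t) i;
    printf("Next token id: %u\n", best);

    free(state);
    free(logits);
    rwkv_free(ctx);
    return 0;
}
```

Because RWKV is recurrent, the fixed-size state carries the entire context, so per-token memory and compute stay constant regardless of context length; this is what makes the architecture attractive for long contexts on CPU.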