AI00 RWKV Server Project Introduction
The AI00 RWKV Server is an inference API server for the RWKV language model, built on the web-rwkv inference engine. It supports parallel and concurrent batched inference through Vulkan, so it runs on any GPU with Vulkan support rather than requiring Nvidia hardware; users with AMD cards or even integrated graphics can benefit from hardware acceleration.
Key Features
- High Performance and Accuracy: Built on the RWKV model, the server ensures exceptional performance and precision.
- GPU Acceleration Without CUDA: It supports Vulkan inference acceleration, enabling GPU speed-up without relying on CUDA. This makes it accessible for a wide range of graphics cards, including AMD and integrated ones.
- Compact and Ready to Use: There is no need for bulky runtimes like PyTorch or CUDA; the server is lightweight and ready for immediate deployment.
- Compatibility with OpenAI's ChatGPT: The server's interface aligns perfectly with OpenAI's ChatGPT API, ensuring seamless integration and usage.
Use Cases
The AI00 RWKV Server is versatile and can be implemented in various applications such as:
- Chatbots: Develop intelligent conversational agents.
- Text Generation: Create content more efficiently.
- Translation: Convert languages accurately and quickly.
- Question & Answer Systems: Implement systems that provide precise responses to queries.
Installation and Usage
To get started, users can download pre-built executables directly from the project's release page. After downloading and placing the appropriate model in the 'assets/models' directory, users can run the server using a simple command line operation. For those who prefer building from source, the project is written in Rust, and instructions for compilation are provided.
Model Conversion
AI00 RWKV Server currently supports models in the Safetensors format (.st extension). Models saved in Torch's .pth format must be converted first, either with a provided Python script or with an executable available in the release section.
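For orientation, the .st container itself has a simple on-disk layout: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor data. The sketch below is a minimal pure-Python illustration of that layout only; it is not the project's converter script, and converting a real .pth checkpoint additionally requires Torch to load the weights.

```python
import json
import os
import struct
import tempfile

def write_safetensors(path, tensors):
    """Write tensors (name -> (dtype_str, shape, raw_bytes)) in Safetensors layout."""
    header, blob, offset = {}, b"", 0
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(data)]}
        blob += data
        offset += len(data)
    hjson = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hjson)))  # u64 little-endian header size
        f.write(hjson)                          # JSON header
        f.write(blob)                           # concatenated tensor data

def read_safetensors_header(path):
    """Read back only the JSON header of a Safetensors file."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Demo: one dummy 2-element F32 tensor (8 bytes of zeros).
demo_path = os.path.join(tempfile.gettempdir(), "demo.st")
write_safetensors(demo_path, {"w": ("F32", [2], bytes(8))})
header = read_safetensors_header(demo_path)
```

In practice, users should rely on the project's own conversion script or executable; the sketch only shows why the format is cheap to parse without a heavy runtime.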
API Support
The server starts its API service on port 65530 and follows OpenAI's API specifications for data input and output. Available endpoints include models, chat completions, and embeddings.
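Because the interface follows the OpenAI specification, a request can be built with nothing but the standard library. The sketch below constructs an OpenAI-style chat completion body and shows how it would be sent; the exact endpoint path (`/v1/chat/completions` here) is an assumption for illustration, so check the server's own documentation for the route it exposes.

```python
import json
from urllib import request

API_BASE = "http://localhost:65530"  # default port mentioned above

def build_chat_payload(messages, max_tokens=256):
    """Build an OpenAI-style chat completion request body."""
    return {"messages": messages, "max_tokens": max_tokens, "stream": False}

def chat(messages):
    """Send the payload to the server (path is an assumption, not confirmed)."""
    req = request.Request(
        API_BASE + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Build (but do not send) a payload for a simple user message.
payload = build_chat_payload([{"role": "user", "content": "Hello!"}])
```

Because the body matches the OpenAI schema, existing OpenAI client libraries can usually be pointed at the server simply by overriding their base URL.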
BNF Sampling
Introduced in version 0.5, BNF sampling is a distinctive feature of the AI00 RWKV Server. It constrains generated output to a user-specified grammar, so responses conform to formats such as JSON, ensuring consistency and structure in the generated data.
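Conceptually, the grammar is supplied alongside the usual sampling parameters in the request. The example below is a hypothetical sketch: both the grammar syntax shown and the request field name (`bnf_schema`) are assumptions for illustration, so consult the server's documentation for the real parameter name and grammar dialect.

```python
import json

# A tiny BNF-style grammar constraining output to a one-key JSON object.
# Grammar syntax here is illustrative only, not the server's exact dialect.
GRAMMAR = r'''
<start> ::= "{" <pair> "}"
<pair> ::= "\"answer\"" ":" <string>
<string> ::= "\"" <chars> "\""
'''

payload = {
    "messages": [{"role": "user", "content": "Reply as JSON."}],
    "bnf_schema": GRAMMAR,  # hypothetical field name for the grammar
}
body = json.dumps(payload)
```

With a constraint like this in place, the sampler can only emit tokens that keep the partial output derivable from the grammar, which is why downstream parsers never see malformed JSON.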
WebUI and Future Enhancements
The AI00 RWKV Server includes a WebUI accessible through a browser, providing a user-friendly interface for interaction and demonstration. The team is continuously working on enhancements, including support for various quantization methods and model tuning.
Join the Community
The project invites contributors and enthusiasts to join their community. Whether through writing code, providing feedback, testing features, or translating documentation, every contribution helps further the project's development.
In summary, the AI00 RWKV Server is a robust, easy-to-use API server offering enhanced GPU support, high accuracy, and a wide range of applications, making it an ideal choice for developers in the AI and machine learning fields.