llama.onnx
Run LLaMA and RWKV models in ONNX format for efficient inference on memory-constrained devices. This project removes the dependency on torch and transformers at inference time, supports memory pooling, and targets FPGA/NPU/GPGPU hardware; exported models can also be converted to fp16 or compiled with TVM.
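As a rough illustration of why fp16 conversion matters on memory-constrained devices, the sketch below (using plain numpy, purely for demonstration; it is not part of this project's API) shows that casting fp32 weights to fp16 halves their memory footprint:

```python
import numpy as np

# A mock fp32 weight tensor standing in for a real model layer.
weights_fp32 = np.random.randn(1024, 1024).astype(np.float32)

# Casting to fp16 halves the storage per element (4 bytes -> 2 bytes).
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # bytes at fp32
print(weights_fp16.nbytes)  # half as many bytes at fp16
```

The trade-off is reduced precision, which is usually acceptable for LLM inference and is one reason fp16 export is a common target for edge deployment.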