llama.onnx

Streamlined Model Inference Using ONNX for LLaMa and RWKV Across Various Devices

Product Description

Access LLaMa and RWKV models in ONNX format to improve inference efficiency on devices with limited memory. This project removes the dependency on torch and transformers, supports memory pooling, and is compatible with FPGA/NPU/GPGPU hardware, enabling streamlined export to fp16 or TVM.
Project Details