llama.onnx
Run LLaMA and RWKV models in ONNX format for efficient inference on memory-constrained devices. This project removes the dependency on torch and transformers at inference time, supports memory pooling, and targets FPGA/NPU/GPGPU hardware; exported models can also be converted to fp16 or compiled with TVM.
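As a rough illustration of why fp16 conversion matters on memory-constrained devices, the sketch below (using plain numpy, purely for demonstration; it is not part of this project's API) shows that casting fp32 weights to fp16 halves their memory footprint:

```python
import numpy as np

# A mock fp32 weight tensor standing in for a real model layer.
weights_fp32 = np.random.randn(1024, 1024).astype(np.float32)

# Casting to fp16 halves the storage per element (4 bytes -> 2 bytes).
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # bytes at fp32
print(weights_fp16.nbytes)  # half as many bytes at fp16
```

The trade-off is reduced precision, which is usually acceptable for LLM inference and is one reason fp16 export is a common target for edge deployment.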