llama3.java
llama3.java implements Llama 3 inference in a single Java file. It includes a GGUF parser and uses Java's Vector API to speed up inference. The project supports Llama 3.1 and 3.2, with optimized tokenization and support for models quantized as Q8_0 and Q4_0. It also serves as a workload for testing compiler optimizations on the JVM. Setup is straightforward, native image support enables fast execution, and the CLI offers several modes of use.
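
To give a feel for what GGUF parsing involves, here is a minimal sketch of reading the fixed-size header at the start of a GGUF file. It is not taken from llama3.java: the class name `GgufHeaderSketch`, the `Header` record, and the `parse` method are hypothetical names chosen for illustration. It assumes the GGUF version 2 or later layout, where the header is a 4-byte magic, a 32-bit version, and two 64-bit counts, all little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class GgufHeaderSketch {
    // The ASCII bytes "GGUF" read as one little-endian 32-bit integer.
    static final int GGUF_MAGIC = 0x46554747;
    // magic (4) + version (4) + tensor count (8) + metadata key/value count (8)
    static final int HEADER_BYTES = 24;

    // Hypothetical holder for the three header fields that follow the magic.
    record Header(int version, long tensorCount, long metadataKvCount) {}

    static Header parse(ByteBuffer buf) {
        // duplicate() leaves the caller's position untouched; a duplicate
        // always starts big-endian, so the byte order is set explicitly.
        ByteBuffer le = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        if (le.remaining() < HEADER_BYTES) {
            throw new IllegalArgumentException("Truncated GGUF header");
        }
        int magic = le.getInt();
        if (magic != GGUF_MAGIC) {
            throw new IllegalArgumentException(
                    "Not a GGUF file, magic = 0x" + Integer.toHexString(magic));
        }
        int version = le.getInt();
        if (version < 2) {
            // Assumption: only the layout with 64-bit counts is handled here.
            throw new IllegalArgumentException("Unsupported GGUF version " + version);
        }
        long tensorCount = le.getLong();
        long metadataKvCount = le.getLong();
        return new Header(version, tensorCount, metadataKvCount);
    }

    public static void main(String[] args) {
        // Build a synthetic header in memory instead of reading a real model file.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(GGUF_MAGIC).putInt(3).putLong(291L).putLong(26L).flip();
        System.out.println(parse(buf));
    }
}
```

In a real loader the buffer would typically come from a memory-mapped model file, and the metadata key/value pairs and tensor descriptors would be read after these 24 bytes. Those parts are omitted here.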