wllama
This project brings llama.cpp to WebAssembly, enabling AI model inference directly in the browser with no backend server or GPU required. It ships with TypeScript support and offers both high-level APIs for completions and embeddings and low-level APIs for fine-grained control over inference. Recent updates add offline model caching and custom logger support. Models can be loaded in parallel, and inference runs inside a worker so it never blocks the UI thread. Demo applications on Hugging Face Spaces showcase these features.
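As a rough sketch of how the high-level completion API is typically used: the snippet below follows the package's documented `Wllama` class with `loadModelFromUrl` and `createCompletion`, but the wasm config-path keys, the model URL, and the exact option names are illustrative and may differ between versions.

```ts
import { Wllama } from '@wllama/wllama';

// Paths to the wasm binaries bundled with the package. In a real app these
// are usually imported as asset URLs via your bundler; the key names here
// follow the package docs but may vary by version.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/esm/multi-thread/wllama.wasm',
};

async function main(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);

  // Load a GGUF model over HTTP (placeholder URL). The download is cached,
  // so subsequent loads can work offline.
  await wllama.loadModelFromUrl(
    'https://huggingface.co/<user>/<repo>/resolve/main/model.gguf'
  );

  // High-level completion call; sampling options are illustrative.
  const completion = await wllama.createCompletion('Once upon a time,', {
    nPredict: 32,
    sampling: { temp: 0.7, top_p: 0.9 },
  });
  console.log(completion);
}

main();
```

Because the heavy lifting happens inside a worker, a call like `createCompletion` resolves asynchronously without freezing the page while tokens are generated.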