LLamaSharp
LLamaSharp is a versatile library offering efficient inference of LLaMA and LLaVA models across platforms on local devices, leveraging CPU and GPU capabilities. Its high-level APIs and RAG support facilitate seamless integration of large language models. With a variety of backends such as CUDA and Vulkan, LLamaSharp eases deployment without requiring native library compilation. It integrates well with libraries like semantic-kernel, and its comprehensive documentation assists in developing AI solutions.