Llama-2-Open-Source-LLM-CPU-Inference
Learn how to deploy open-source LLMs such as Llama 2 on CPUs for effective, privacy-compliant document Q&A. The project uses tools such as C Transformers, GGML, and LangChain to manage resources efficiently, minimizing reliance on expensive GPUs. It provides step-by-step guidance on local CPU inference, from setup to query execution, offering a solution that respects data privacy and avoids third-party dependencies.
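
As a quick illustration of the stack described above, below is a minimal sketch (not the repo's exact scripts): it loads a quantized GGML Llama 2 model on CPU through LangChain's CTransformers wrapper, builds a small in-memory FAISS index with local embeddings, and runs a retrieval-augmented query. The model path, embedding model, and sample text are assumptions for illustration; in practice you would download a GGML checkpoint (for example, a quantized llama-2-7b-chat GGML file) and index your own documents.

```python
from langchain.llms import CTransformers
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Load a quantized GGML Llama 2 model for CPU-only inference.
# The path and filename below are assumptions; adjust them to the
# GGML weights you downloaded locally.
llm = CTransformers(
    model="models/llama-2-7b-chat.ggmlv3.q8_0.bin",  # local GGML weights (assumed path)
    model_type="llama",                              # tells GGML which architecture to load
    config={"max_new_tokens": 256, "temperature": 0.01},
)

# Embed text locally with a small sentence-transformers model, so no
# data leaves the machine and no third-party API is involved.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)

# Build an in-memory FAISS index from a sample text; in a real setup
# you would load, split, and index your own documents here.
vectordb = FAISS.from_texts(
    ["Llama 2 is an open-source LLM released by Meta in July 2023."],
    embeddings,
)

# Retrieval-augmented Q&A: fetch the most relevant chunk(s), then let
# the local LLM answer based on the retrieved context.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 1}),
)

print(qa.run("When was Llama 2 released?"))
```

Everything in this sketch runs on CPU: the GGML-quantized weights keep memory use modest, and FAISS plus a compact embedding model keep retrieval fast without any GPU.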