distributed-llama
The project runs inference for large language models (LLMs) across multiple devices, splitting both the compute workload and the model's memory footprint between them. This makes it possible to run large models locally even on low-power hardware: nodes synchronize state over plain TCP sockets, so an ordinary home router is enough to connect a cluster. It supports model architectures including Llama and Grok, is optimized for ARM and x86_64 CPUs, and has GPU support planned. With setup guides for Raspberry Pi clusters and common computers, it offers a practical framework for tensor parallelism on commodity hardware.
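To illustrate the core idea behind tensor parallelism (this is a conceptual sketch, not the project's actual C++ implementation), the snippet below partitions a layer's weight matrix row-wise, so each node stores and multiplies only its own slice, and the root gathers the partial outputs. The helper names (`split_rows`, `worker_matmul`) are hypothetical; in Distributed Llama the gather step happens over TCP sockets.

```python
import numpy as np

def split_rows(weight: np.ndarray, n_workers: int) -> list[np.ndarray]:
    """Partition a weight matrix row-wise, one slice per worker (hypothetical helper)."""
    return np.array_split(weight, n_workers, axis=0)

def worker_matmul(weight_slice: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Per-node compute step: each worker multiplies only the rows it holds."""
    return weight_slice @ x

# Toy dimensions: an 8x4 layer served by 2 workers.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
x = rng.standard_normal(4)

slices = split_rows(W, n_workers=2)                # each node holds only its rows of W
partials = [worker_matmul(s, x) for s in slices]   # runs in parallel, one per node
y = np.concatenate(partials)                       # root node gathers partial outputs

assert np.allclose(y, W @ x)  # matches the single-device result
```

Because each node holds only its slice of the weights, per-device RAM usage shrinks roughly in proportion to the number of nodes; the trade-off is the network round trip needed to gather partial results after each split layer.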