distributed-llama

Enable Scalable LLM Execution by Leveraging Tensor Parallelism Across Various Devices

Product Description

The project distributes large language model (LLM) inference across multiple devices, splitting both the workload and the memory so that large models can run locally even on low-power hardware. Nodes synchronize state over TCP sockets, so a basic home router is enough to link a cluster. It supports model architectures including Llama and Grok, is optimized for ARM and x86_64 CPUs, and plans GPU support in the future. Detailed setup guides for Raspberry Pi boards and common computers make it an accessible framework for harnessing tensor parallelism.
Project Details