distributed-llama
The project runs inference for large language models (LLMs) across multiple devices, splitting both the compute workload and the model's memory footprint between them. This makes it possible to run large models locally even on low-power hardware: nodes synchronize state over plain TCP sockets, so an ordinary home router is enough to connect a cluster. It supports model architectures including Llama and Grok, is optimized for ARM and x86_64 CPUs, and has GPU support planned. With setup guides for Raspberry Pi clusters and common computers, it offers a practical framework for tensor parallelism on commodity hardware.
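To illustrate the core idea behind tensor parallelism (this is a conceptual sketch, not the project's actual C++ implementation), the snippet below partitions a layer's weight matrix row-wise, so each node stores and multiplies only its own slice, and the root gathers the partial outputs. The helper names (`split_rows`, `worker_matmul`) are hypothetical; in Distributed Llama the gather step happens over TCP sockets.

```python
import numpy as np

def split_rows(weight: np.ndarray, n_workers: int) -> list[np.ndarray]:
    """Partition a weight matrix row-wise, one slice per worker (hypothetical helper)."""
    return np.array_split(weight, n_workers, axis=0)

def worker_matmul(weight_slice: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Per-node compute step: each worker multiplies only the rows it holds."""
    return weight_slice @ x

# Toy dimensions: an 8x4 layer served by 2 workers.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
x = rng.standard_normal(4)

slices = split_rows(W, n_workers=2)                # each node holds only its rows of W
partials = [worker_matmul(s, x) for s in slices]   # runs in parallel, one per node
y = np.concatenate(partials)                       # root node gathers partial outputs

assert np.allclose(y, W @ x)  # matches the single-device result
```

Because each node holds only its slice of the weights, per-device RAM usage shrinks roughly in proportion to the number of nodes; the trade-off is the network round trip needed to gather partial results after each split layer.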