Medusa

Medusa Framework for Accelerating LLM Generation Using Multiple Decoding Heads

Product Description

Medusa is a framework for accelerating LLM generation with multiple decoding heads, achieving up to 3.6x speedup. It leaves the original model unchanged and uses a tree-based attention mechanism to verify several candidate continuations in parallel, avoiding the separate draft model that makes speculative decoding inefficient. Training the heads is parameter-efficient, making the approach accessible on limited GPU setups. Recent updates add Medusa-2, which trains the full model together with the heads, and a self-distillation recipe that allows integration with various fine-tuned LLMs without access to their original training data.
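The core idea can be sketched in a few lines: in addition to the base LM head, each Medusa head is a small projection of the last hidden state that predicts the token one extra step ahead, so a single forward pass proposes several tokens at once. The sketch below is a toy illustration with NumPy; the sizes, weight initialization, and the `propose` helper are assumptions for demonstration, not the framework's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, NUM_HEADS = 32, 16, 3  # toy sizes, not real model dimensions

# Base LM head plus K Medusa heads: each is a linear projection of the
# last hidden state; head k predicts the token k+1 steps ahead.
base_head = rng.normal(size=(DIM, VOCAB))
medusa_heads = [rng.normal(size=(DIM, VOCAB)) for _ in range(NUM_HEADS)]

def propose(hidden):
    """One forward pass yields 1 + NUM_HEADS token proposals (greedy top-1)."""
    logits = [hidden @ base_head] + [hidden @ w for w in medusa_heads]
    return [int(np.argmax(l)) for l in logits]

hidden = rng.normal(size=DIM)  # stand-in for the model's last hidden state
candidates = propose(hidden)
# In the real framework, top-k candidates from each head form a tree of
# continuations that the base model verifies in one batched step using
# tree-based attention; the longest accepted prefix is kept.
print(candidates)
```

The speedup comes from amortization: one expensive forward pass of the backbone produces multiple token proposals, and verification of the candidate tree is batched rather than sequential.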
Project Details