Sequoia
Sequoia is a scalable and robust framework for speculative decoding, designed with hardware efficiency in mind. This document walks through setting up the environment, running evaluations, and working with the supported models. It gives the exact commands and configurations for generating acceptance-rate vectors and growmaps, supports multiple Llama models for adaptability, and covers offloading and sequence-length settings for tuning decoding performance.
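The acceptance-rate vectors mentioned above measure how often a target model keeps tokens proposed by a smaller draft model. The sketch below illustrates the standard speculative-sampling acceptance rule (keep a draft token x with probability min(1, p(x)/q(x))); it is a minimal, self-contained illustration with assumed names, not Sequoia's actual code or API.

```python
import random

def speculative_accept(draft_probs, target_probs, proposed, rng=None):
    """Count how many proposed draft tokens are accepted in sequence.

    draft_probs / target_probs map token id -> probability (q and p).
    Each token is kept with probability min(1, p(x)/q(x)); the first
    rejection stops verification, matching speculative decoding.
    """
    rng = rng or random.Random(0)
    accepted = 0
    for tok in proposed:
        q = draft_probs[tok]
        p = target_probs[tok]
        if rng.random() < min(1.0, p / q):
            accepted += 1
        else:
            break
    return accepted

# When draft and target distributions agree, the ratio is 1 and every
# proposed token is accepted.
print(speculative_accept({0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}, [0, 1, 0]))  # 3
```

Averaging the accepted-prefix length over many verification rounds gives an empirical acceptance rate, which is the kind of statistic the acceptance-rate vectors summarize per tree position.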