Awesome-LLM-Inference
A curated collection of leading-edge papers and code on large language model (LLM) inference. Topics include distributed inference, caching methods, and quantization techniques such as FP8 and WINT8/4, alongside serving strategies like Continuous Batching and Parallel Decoding and recent systems such as Mooncake and LLM-Viewer. Intended for researchers and practitioners who want to make LLM deployment more efficient and stay current on the latest inference innovations.
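To give a flavor of the weight-only quantization techniques (e.g., WINT8) this collection tracks, here is a minimal sketch of symmetric per-tensor INT8 weight quantization. This is an illustrative example, not code from any of the listed papers; the function names and the per-tensor scaling scheme are assumptions for the sketch.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127].

    Illustrative sketch only; real systems typically use per-channel or
    per-group scales for better accuracy.
    """
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from INT8 values and a scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Per-element reconstruction error is bounded by roughly scale / 2.
```

Weight-only schemes like this shrink memory traffic at inference time, which is why WINT8/4 variants feature prominently in the papers gathered here.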