LLMLingua
LLMLingua compresses prompts for large language models, achieving up to 20x compression with minimal loss in downstream task performance. It integrates with frameworks such as LangChain and LlamaIndex, reducing API costs and improving retrieval-augmented generation. LLMLingua-2 extends this with task-agnostic compression and runs 3x-6x faster than the original LLMLingua. The latest release also includes MInference, which reduces inference latency by up to 10x in long-context applications.
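LLMLingua's core idea is to prune tokens that a small language model judges to be low-information, while keeping the compressed prompt understandable to the target LLM. The toy sketch below only illustrates that pruning idea: it substitutes a simple word-frequency heuristic for the library's actual LM-based importance scoring, and the function name and parameters are illustrative, not part of LLMLingua's API.

```python
from collections import Counter

def compress_prompt_sketch(prompt: str, keep_ratio: float = 0.5) -> str:
    """Toy prompt compression: keep only the most informative words.

    Stand-in for LLMLingua's LM-based token pruning -- here, rarer words
    are simply assumed to carry more information than frequent ones.
    """
    words = prompt.split()
    counts = Counter(w.lower() for w in words)
    # Rank positions by rarity: rarer words rank as more informative.
    ranked = sorted(range(len(words)), key=lambda i: counts[words[i].lower()])
    keep_n = max(1, int(len(words) * keep_ratio))
    keep = set(ranked[:keep_n])
    # Preserve original word order so the result stays readable.
    return " ".join(words[i] for i in range(len(words)) if i in keep)
```

The real library instead exposes a `PromptCompressor` class whose `compress_prompt` method uses a small causal LM to score and drop tokens, which is what enables high compression ratios with little task degradation.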