intel-extension-for-transformers
Intel Extension for Transformers is a toolkit for making Transformer models more efficient on platforms such as Intel Gaudi2, CPU, and GPU. It integrates with the Hugging Face APIs for model compression and adds software optimizations that speed up inference for models such as GPT-J, BLOOM, and T5. The toolkit also includes a flexible chatbot framework and extended low-bit inference support for developers working with GenAI/LLM technologies.
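
To give a feel for what "low-bit inference" means, the following is a minimal, library-independent sketch of symmetric 4-bit weight quantization in plain Python. It is not the toolkit's implementation or API. The function names `quantize_int4` and `dequantize_int4`, the single per-group scale, and the sample weights are all assumptions made for illustration.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Illustrative symmetric 4-bit quantization of one weight group.

    Returns integers in [-8, 7] plus the float scale needed to restore them.
    (Hypothetical helper, not part of any library.)
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero group.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in quantized]


# Made-up sample weights for the sketch.
weights = [0.70, -0.42, 0.10, 0.0]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)

print(q)
print(restored)
```

Each weight is stored in 4 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per weight. Production low-bit schemes typically add refinements such as finer-grained groups and calibrated scales.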