Optimum Intel: Overview
Optimum Intel serves as a bridge between the Hugging Face Transformers and Diffusers libraries and Intel's tools and libraries, accelerating machine learning pipelines on Intel hardware. The project builds on several Intel technologies to enhance performance, including the Intel Extension for PyTorch, the Intel Neural Compressor, and the OpenVINO toolkit.
Key Components
Intel Extension for PyTorch
The Intel Extension for PyTorch is an open-source library designed to enhance the performance of PyTorch models. It provides optimizations for both eager and graph modes, with the graph mode generally offering better performance through techniques like operation fusion.
Intel Neural Compressor
This open-source library enables users to apply compression techniques such as quantization, pruning, and knowledge distillation to their models. It supports several quantization approaches, including static and dynamic post-training quantization as well as quantization-aware training, with accuracy-driven tuning that keeps the compressed model within a user-specified accuracy loss. Different pruning strategies are also supported for reaching a target sparsity.
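As a minimal sketch of the Python API, a post-training dynamic quantization run with the INCQuantizer class might look like the following (the model ID and save directory are illustrative, and exact APIs can vary between versions):

from transformers import AutoModelForSequenceClassification
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
# Dynamic quantization: weights are quantized ahead of time,
# activations are quantized on the fly during inference.
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(quantization_config=quantization_config, save_directory="quantized_model")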
OpenVINO
OpenVINO is a toolkit designed to improve inference performance across Intel hardware, including CPUs and GPUs. It provides tools for model optimization such as quantization and pruning. With Optimum Intel, users can optimize their models and convert them into the OpenVINO Intermediate Representation (IR) format for deployment with the OpenVINO runtime.
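As a minimal sketch, loading a Transformers model and converting it to the IR format on the fly might look like the following (the model ID is illustrative; export=True triggers the conversion):

from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to the OpenVINO IR format.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum Intel makes OpenVINO easy to use."))
# The converted model can be saved in IR format for later reuse.
model.save_pretrained("ov_model")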
Installation
Optimum Intel is installed with pip, Python's package manager, using a different command depending on the desired accelerator:
- For Intel Neural Compressor:
pip install --upgrade --upgrade-strategy eager "optimum[neural-compressor]"
- For OpenVINO:
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
- For Intel Extension for PyTorch:
pip install --upgrade --upgrade-strategy eager "optimum[ipex]"
The --upgrade-strategy eager option ensures that optimum and its dependencies are upgraded to their latest versions. It's recommended to use a virtual environment and to upgrade pip itself before installing.
For those interested in the latest features, Optimum Intel can also be installed from source via GitHub.
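For example, a source install can be performed with pip directly from the GitHub repository:

pip install git+https://github.com/huggingface/optimum-intel.git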
Key Features and Usage
Neural Compressor
Users can leverage dynamic quantization through the command-line interface to optimize models for CPU usage. For example, a model can be quantized using a simple terminal command.
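Assuming the optimum-cli inc quantize subcommand (the model ID and output path below are illustrative), such a command might look like:

optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert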
OpenVINO
Optimum Intel allows users to convert models into the OpenVINO IR format for optimized inference. Weight-only quantization compresses model weights to lower precision while keeping activations in floating point. The framework also supports a hybrid quantization mode for pipelines such as Stable Diffusion.
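As a minimal sketch of weight-only quantization (the model ID, bit width, and output directory are illustrative; OVWeightQuantizationConfig is assumed to be available in the installed version):

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# 8-bit weight-only quantization: weights are stored in INT8,
# activations remain in floating point.
quantization_config = OVWeightQuantizationConfig(bits=8)
model = OVModelForCausalLM.from_pretrained("gpt2", export=True, quantization_config=quantization_config)
model.save_pretrained("ov_gpt2_int8")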
IPEX (Intel Extension for PyTorch)
Models can be optimized by replacing standard PyTorch classes with IPEX-specific classes, which perform operator-level and graph-level optimizations for improved performance.
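As a minimal sketch (the model ID is illustrative; the IPEXModel classes mirror their Transformers counterparts, though exact class availability varies between versions):

from transformers import AutoTokenizer, pipeline
from optimum.intel import IPEXModelForCausalLM

model_id = "gpt2"
# The IPEX class applies operator- and graph-level optimizations while loading.
model = IPEXModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Intel Extension for PyTorch", max_new_tokens=20))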
Practical Examples
The project repository hosts several examples and notebooks illustrating how to leverage Optimum Intel for model optimization and acceleration. Users are encouraged to follow these examples to gain hands-on experience.
Additional Support for Gaudi
For users with access to Intel Gaudi AI Accelerators, Optimum Habana offers tools for model training and inference in single- and multi-HPU settings, with APIs that mirror their Transformers counterparts.
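As a rough sketch of what a training setup might look like with Optimum Habana's GaudiTrainer (the model, dataset, and Gaudi configuration name are illustrative assumptions; running this requires Gaudi hardware and the Habana software stack):

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model_id = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
# A tiny illustrative training split, tokenized for the model.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128), batched=True)
# GaudiTrainingArguments extends TrainingArguments with HPU-specific options.
training_args = GaudiTrainingArguments(output_dir="./results", use_habana=True, use_lazy_mode=True, gaudi_config_name="Habana/bert-base-uncased")
trainer = GaudiTrainer(model=model, args=training_args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()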
In summary, Optimum Intel provides a comprehensive suite of tools for optimizing and accelerating machine learning models on Intel hardware.