Introduction to Hummingbird
Hummingbird is a library that compiles trained traditional machine learning models into tensor computations. The compiled models run on neural network frameworks such as PyTorch, which can considerably accelerate their inference. This approach offers several benefits: access to the current and future optimizations implemented in neural network frameworks, native hardware acceleration (for example, on GPUs), and a single platform for managing both traditional and neural network models, all without re-engineering the original models.
Hummingbird can convert traditional machine learning models to several backends: PyTorch, TorchScript, ONNX, and TVM. It supports a broad range of models and featurizers from popular libraries such as scikit-learn, LightGBM, and XGBoost, and more are planned. Its API follows the familiar scikit-learn interface (for example, predict and predict_proba), so converted models can be used in existing inference code without changes. Converted models can also be served with TorchServe.
How Hummingbird Works
Hummingbird reconfigures algorithmic operators so that they perform more regular computations, which are well suited to vectorized and GPU execution. Different operators call for different strategies. One strategy translates a decision tree into tensor operations built on general matrix multiplication (GEMM). Instead of following one branch at a time, it evaluates every node condition for a whole batch of inputs and then uses matrix products to determine which leaf each input reaches. Replacing data-dependent branching with dense tensor operations lets tree-based models use the same hardware acceleration as neural networks, which can yield substantial speedups.
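The GEMM strategy can be sketched in NumPy on a small hand-built tree. This is a simplified illustration, not Hummingbird's actual implementation: the tree, its thresholds, and the function name gemm_tree_predict are hypothetical. Matrix A maps features to the internal nodes that test them, B holds the node thresholds, C records for each leaf whether its root-to-leaf path passes through a node's left (+1) or right (-1) subtree, D counts the left turns on each leaf's path, and E maps leaves to classes. The sketch assumes the scikit-learn convention that a sample goes left when feature <= threshold.

```python
import numpy as np

# Hypothetical toy tree (illustration only, not taken from Hummingbird):
#   n0: x0 <= 0.5  ? go to n1 : leaf L2
#   n1: x1 <= 0.25 ? leaf L0  : leaf L1
# Leaf classes: L0 -> 0, L1 -> 1, L2 -> 1

# A[f, n] = 1 if internal node n tests feature f
A = np.array([[1, 0],
              [0, 1]], dtype=np.float32)
# B[n] = threshold of internal node n
B = np.array([0.5, 0.25], dtype=np.float32)
# C[n, l] = +1 if leaf l is in the left subtree of node n, -1 if in the right subtree, 0 otherwise
C = np.array([[1, 1, -1],
              [1, -1, 0]], dtype=np.float32)
# D[l] = number of left turns on the root-to-leaf path of leaf l
D = np.array([2, 1, 0], dtype=np.float32)
# E[l, k] = 1 if leaf l predicts class k
E = np.array([[1, 0],
              [0, 1],
              [0, 1]], dtype=np.float32)


def gemm_tree_predict(X):
    """Evaluate the toy tree for a batch X of shape (n_samples, 2) using only tensor ops."""
    T = (X @ A <= B).astype(np.float32)  # 1 where a node's condition holds (go left)
    T = (T @ C == D).astype(np.float32)  # one-hot row: the single leaf each sample reaches
    T = T @ E                            # class indicator taken from the reached leaf
    return T.argmax(axis=1)


X = np.array([[0.2, 0.1],
              [0.2, 0.9],
              [0.8, 0.1]], dtype=np.float32)
print(gemm_tree_predict(X))  # [0 1 1]
```

Every node condition is evaluated for every sample, including nodes off the path a sample actually takes. That extra work buys branch-free execution: the batch flows through three matrix products and two element-wise comparisons. A row of T @ C equals D only for the leaf whose left turns are all satisfied and whose right turns are all unsatisfied, so exactly one leaf is selected per sample.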
Installation
Hummingbird is compatible with Python 3.9, 3.10, and 3.11 on Linux, Windows, and macOS. It is recommended to install it in a Python virtual environment. Hummingbird requires PyTorch 1.6.0 or later, so install PyTorch first by following the instructions for your system on the PyTorch website. Then install Hummingbird with pip, optionally including extra dependencies such as LightGBM and XGBoost:
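# Core library
python -m pip install hummingbird-ml
# Core library plus optional dependencies such as LightGBM and XGBoost
# (the quotes keep shells such as zsh from interpreting the brackets)
python -m pip install "hummingbird-ml[extra]"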
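After installation, a quick script can confirm that the environment matches the requirements listed above. This is a minimal sketch and not part of Hummingbird: check_environment and version_at_least are hypothetical helper names, and the script assumes the PyPI distribution names torch, hummingbird-ml, lightgbm, and xgboost. It reads installed versions through the standard library's importlib.metadata, so it does not need to import PyTorch itself.

```python
import re
import sys
from importlib import metadata

# Hypothetical helper (not part of Hummingbird); requirements taken from the text above.
SUPPORTED_PYTHON = {(3, 9), (3, 10), (3, 11)}
MIN_TORCH = (1, 6, 0)


def version_at_least(version, minimum):
    """Compare the leading numeric release parts of a version string such as '2.1.0+cu118'."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts[:len(minimum)]) >= minimum


def check_environment():
    """Report Python support and the installed versions of the relevant packages."""
    report = {"python_supported": sys.version_info[:2] in SUPPORTED_PYTHON}
    for package in ("torch", "hummingbird-ml", "lightgbm", "xgboost"):
        try:
            report[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = None  # not installed (lightgbm and xgboost are optional)
    torch_version = report["torch"]
    report["torch_ok"] = torch_version is not None and version_at_least(torch_version, MIN_TORCH)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A value of None for lightgbm or xgboost simply means the optional extras were not installed; torch_ok being False means PyTorch is missing or older than 1.6.0.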
Practical Usage
Hummingbird's API is small and easy to use. Running a scikit-learn random forest on a neural network framework, on either CPU or GPU, takes only a few steps: import Hummingbird, convert the trained model with a single call, and run predictions, saving or loading the converted model as needed. The following example trains a model, converts it with Hummingbird, and runs predictions:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
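import torch
from hummingbird.ml import convert, load
# Create some random data for binary classification
num_classes = 2
X = np.random.rand(100000, 28)
y = np.random.randint(num_classes, size=100000)
# Create and train a model
skl_model = RandomForestClassifier(n_estimators=10, max_depth=10)
skl_model.fit(X, y)
# Convert the model to PyTorch
model = convert(skl_model, 'pytorch')
# Run predictions on CPU
model.predict(X)
# Run predictions on GPU, if one is available
if torch.cuda.is_available():
    model.to('cuda')
    model.predict(X)
# Save the converted model, load it back, and predict again
model.save('hb_model')
model = load('hb_model')
model.predict(X)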
Hummingbird provides comprehensive documentation and welcomes contributions from the community. Readers who want to learn more about its internals or contribute can find further resources and community support in the project's GitHub repository.