Introducing Kubernetes AI Toolchain Operator (Kaito)
Kaito is a tool designed to simplify running large AI and machine learning models in Kubernetes clusters. It automates the operational work around both inference and tuning of popular open-source models, such as Falcon and Phi-3. By integrating Kaito into a Kubernetes environment, users gain several distinctive benefits over traditional virtual machine-based model deployment.
Key Features
- Container Image Management: Kaito manages large model files as container images, keeping deployments tidy and reproducible. A built-in HTTP server serves inference calls using the model library.
- Preset Configurations: Kaito ships preset configurations tuned to GPU hardware, so users don't have to adjust deployment parameters by hand.
- Automatic GPU Node Provisioning: Based on the model's requirements, Kaito automatically provisions the necessary GPU nodes.
- Public Hosting: When the model license permits, Kaito hosts large model images in the public Microsoft Container Registry (MCR), making access and management more straightforward.
By leveraging Kaito, onboarding large AI inference models in Kubernetes becomes a more streamlined process.
Architectural Overview
Kaito adopts the classic Kubernetes pattern of Custom Resource Definitions (CRDs) and controllers. Users create a `workspace` custom resource that spells out the GPU requirements and the inference or tuning specification; the Kaito controllers then reconcile it and deploy the required resources. A minimal workspace manifest is sketched after the controller list below.
- Workspace Controller: Reconciles the `workspace` custom resource, triggers GPU node provisioning, and deploys the workload based on preset configurations.
- Node Provisioner Controller: Known as `gpu-provisioner`, this controller integrates with Azure APIs to add GPU nodes to an AKS (Azure Kubernetes Service) cluster. It coordinates with the workspace controller through the `machine` CRD.
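As a concrete illustration, here is a minimal `workspace` manifest modeled on the examples published in the Kaito repository; the instance type, labels, and preset name are illustrative and should be checked against the Kaito version in use.

```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  # GPU VM size to provision; pick a SKU available in your subscription
  instanceType: "Standard_NC12s_v3"
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  # The preset selects the model and its pre-tuned deployment parameters
  preset:
    name: "falcon-7b"
```

Applying this one manifest is enough for the controllers to provision a matching GPU node and roll out the preset falcon-7b inference service.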
Installation and Getting Started
Kaito is designed to be installed easily, with guidance available for both Azure CLI and Terraform deployments. Once installed, starting a service takes only a few commands; for example, a falcon-7b inference service is created by applying a YAML workspace manifest. Deployment status and the resulting inference endpoint can then be checked with standard kubectl commands, as sketched below.
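The commands below sketch that flow under a few assumptions: the manifest path and workspace name follow the Kaito repository's falcon-7b example, and the `/chat` route and JSON payload shape match the preset inference server documented there (newer releases may expose a different API, so check the docs for your version).

```bash
# Create the falcon-7b inference workspace (example path from the Kaito repo)
kubectl apply -f examples/inference/kaito_workspace_falcon_7b.yaml

# Watch the workspace status until its readiness conditions report True
kubectl get workspace workspace-falcon-7b -w

# Look up the ClusterIP of the service created for the workspace
kubectl get svc workspace-falcon-7b

# Send a test prompt from inside the cluster (replace <cluster-ip> with the value above)
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- \
  curl -X POST http://<cluster-ip>/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Kubernetes?"}'
```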
Advanced Usage
For those looking to dive deeper, Kaito supports custom models, model fine-tuning, and even loading fine-tuned adapters into inference services. Detailed guides are available for users who want to deploy their own containerized models or customize configurations beyond the Kaito presets.
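For fine-tuning, the workspace carries a `tuning` block in place of an `inference` block. The sketch below is modeled loosely on the public Kaito tuning examples; the preset name, tuning method, dataset URL, and output image are placeholders rather than verified values.

```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-tuning-phi-3
resource:
  instanceType: "Standard_NC6s_v3"     # illustrative GPU SKU
  labelSelector:
    matchLabels:
      app: tuning-phi-3
tuning:
  preset:
    name: phi-3-mini-4k-instruct       # illustrative preset name
  method: qlora                        # parameter-efficient fine-tuning method
  input:
    urls:
      - "https://example.com/dataset.parquet"  # placeholder training data location
  output:
    # Placeholder registry for the resulting fine-tuned adapter image
    image: "myregistry.azurecr.io/adapters/phi-3:latest"
    imagePushSecret: my-registry-secret
```

The adapter image produced by such a job can then be referenced from an inference workspace, as described in the Kaito adapter guide.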
Frequently Asked Questions
Kaito's documentation addresses common user questions, such as how to label nodes for workspace utilization, how to update to the latest model configuration, and how to override Kaito's preset configurations. It also explains the distinction between instruct and non-instruct models, the former being fine-tuned for interactive chat applications.
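On the node-labeling question, for example, an existing GPU node becomes eligible for a workspace once it carries the label that the workspace's `labelSelector` matches (names here follow the earlier falcon-7b sketch):

```bash
# Label an existing GPU node so the falcon-7b workspace's labelSelector selects it
kubectl label node <your-gpu-node-name> apps=falcon-7b
```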
Contribution and Community
Contributions to Kaito are encouraged, with a straightforward process for submitting patches. Microsoft’s Open Source Code of Conduct provides guidelines for community interaction, ensuring a welcoming environment for collaboration.
Contacts and Support
For further inquiries or support, the Kaito development team can be reached through the contact email listed in the project documentation.
Through its innovative approach, Kaito offers a comprehensive solution for managing large models in Kubernetes, greatly enhancing efficiency and productivity for AI and ML tasks.