Introduction to Truss
Overview
Truss is a tool designed to streamline the deployment of AI/ML models into production environments. Inspired by the "write once, run anywhere" philosophy, it tackles the complexity of serving models efficiently and effectively. By packaging model code, weights, and dependencies together, Truss ensures a seamless transition from development to production, avoiding the discrepancies that often appear at deployment time.
Key Features
- Write Once, Run Anywhere: Truss allows developers to consolidate everything their model needs into a single package. Whether in development or production, the model server behaves consistently, reducing the friction typically encountered when moving a model to production.
- Fast Developer Feedback Loop: By eliminating the need for Docker or Kubernetes configuration, Truss provides an all-inclusive serving environment. A live reload server lets developers see the impact of their changes almost instantly, speeding up development.
- Framework Agnostic: Truss supports a wide range of Python ML frameworks, including transformers, PyTorch, and TensorFlow. This makes it versatile enough to handle everything from basic models to complex architectures.
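As a rough sketch of this framework-agnostic contract, any model that can be hidden behind a load()/predict() pair fits the same server. In the snippet below, a trivial rule-based classifier stands in for a real transformers or PyTorch model so the sketch runs without any framework installed; the actual interface is the one used in the Quickstart later in this guide.

```python
class Model:
    """Minimal stand-in for the class Truss expects in model/model.py."""

    def __init__(self, **kwargs):
        self._model = None  # the framework object is loaded lazily in load()

    def load(self):
        # A real Truss would build e.g. pipeline("text-classification") here;
        # a rule-based stand-in keeps this sketch dependency-free.
        self._model = lambda text: [
            {"label": "POSITIVE" if "awesome" in text.lower() else "NEGATIVE"}
        ]

    def predict(self, model_input):
        return self._model(model_input)


model = Model()
model.load()  # Truss calls load() once, at server startup
print(model.predict("Truss is awesome!"))  # [{'label': 'POSITIVE'}]
```

Swapping the stand-in for a real framework object changes only the body of load(); the serving contract stays identical.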
Examples and Use Cases
Truss demonstrates its capabilities by supporting well-known models such as:
- The Llama 2 series, spanning parameter scales from 7B to 70B.
- Stable Diffusion XL for art generation.
- Whisper for speech transcription.
These examples, alongside dozens more, illustrate Truss’s flexibility and power in action.
Installation and Getting Started
To install Truss, a single pip command suffices:

```shell
pip install --upgrade truss
```
Quickstart Guide
Here's a brief guide to start using Truss with a text classification model:
- Create a Truss: Initialize a Truss for a text classification task:

  ```shell
  truss init text-classification
  ```

  After naming your Truss (e.g., "Text Classification"), navigate into the new directory:

  ```shell
  cd text-classification
  ```
- Implement the Model: In model/model.py, create a Model class with two methods: load() and predict(). The former prepares the model for operation; the latter handles inference:

  ```python
  from transformers import pipeline


  class Model:
      def __init__(self, **kwargs):
          self._model = None

      def load(self):
          self._model = pipeline("text-classification")

      def predict(self, model_input):
          return self._model(model_input)
  ```
- Add Dependencies: Define the model's dependencies in config.yaml. For a text classification pipeline, you might specify:

  ```yaml
  requirements:
    - torch==2.0.1
    - transformers==4.30.0
  ```
Deployment
Truss is supported by Baseten, which provides infrastructure for deploying models. To deploy your model using Baseten:
- Get an API Key: Sign up on Baseten if you don't have an account, then generate an API key from the account settings.
- Deployment Command: With the API key ready, deploy with:

  ```shell
  truss push
  ```
Monitor the deployment via the Baseten model dashboard.
Invoking the Model
Post-deployment, you can run predictions through the terminal:
- Invocation Command:

  ```shell
  truss predict -d '"Truss is awesome!"'
  ```
- Expected Response:

  ```json
  [
    {
      "label": "POSITIVE",
      "score": 0.999873161315918
    }
  ]
  ```
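Under the hood, truss predict issues an HTTP request to the deployed model. The sketch below assembles such a request with the standard library; the endpoint URL is a placeholder (copy the real one from your Baseten model dashboard), and the Api-Key authorization format is an assumption to verify against Baseten's documentation.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # generated in your Baseten account settings
# Placeholder endpoint; the real URL appears on the Baseten model dashboard.
ENDPOINT = "https://model-xxxxxxx.api.baseten.co/production/predict"


def build_request(model_input):
    """Assemble the POST request for one prediction (sketch)."""
    # The body is raw JSON, matching truss predict -d '"Truss is awesome!"'.
    body = json.dumps(model_input).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Api-Key {API_KEY}",  # header format is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("Truss is awesome!")
print(req.data)  # b'"Truss is awesome!"'
# To actually invoke the deployed model:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The response body would carry the same list of label/score objects shown above.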
Community and Contribution
Truss is supported by Baseten and has been developed with contributions from various ML engineers globally. Notable contributors include Stephan Auerhahn and Daniel Sarfati. Community contributions are welcome, following the project's contributing guide and code of conduct.
Truss stands out as a versatile and efficient solution that simplifies the complexities of deploying and managing AI/ML models in production, freeing developers to focus on innovation and refinement.