Introducing LitServe
LitServe is a serving engine designed specifically for AI models, combining speed with flexibility to make model deployment easier than ever. Built on FastAPI, LitServe adds powerful features that streamline AI serving, such as batching, streaming, and GPU autoscaling, eliminating the cumbersome task of rebuilding a FastAPI server for each different AI model.
Key Features of LitServe
LitServe delivers performance at least twice as fast as plain FastAPI, thanks to its AI-specific multi-worker handling. Here are some of its standout features:
- Lightning Fast Performance: Outperforms FastAPI with at least a 2x increase in speed.
- Flexibility and Ease of Use: Supports a wide array of AI models, including LLMs and non-LLMs.
- Comprehensive Model Support: Compatible with popular machine learning frameworks like PyTorch, JAX, and TensorFlow.
- GPU Autoscaling and Batch Processing: Automatically scales GPUs and handles requests in batches for optimal performance (see the batching sketch after this list).
- Streaming Capability: Provides streaming data support for real-time applications.
- Deployment Options: Users can choose between self-hosting or opting for managed services.
- Compatibility with Advanced AI Systems: Easily integrates with systems like vLLM for serving complex AI models.
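To make these features concrete, here is a minimal batching sketch: set max_batch_size above 1 and have predict operate on a list of decoded inputs. The batch size and timeout values below are illustrative assumptions, not tuned recommendations.

import litserve as ls

class BatchedLitAPI(ls.LitAPI):
    def setup(self, device):
        # A toy "model" applied element-wise to each batch
        self.model = lambda x: x * 2

    def decode_request(self, request):
        return request["input"]

    def predict(self, inputs):
        # With max_batch_size > 1, LitServe collects decoded inputs into a batch
        return [self.model(x) for x in inputs]

    def encode_response(self, output):
        return {"output": output}

if __name__ == "__main__":
    # Illustrative values: batch up to 8 requests, waiting at most 50 ms to fill a batch
    server = ls.LitServer(BatchedLitAPI(), accelerator="auto", max_batch_size=8, batch_timeout=0.05)
    server.run(port=8000)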
Getting Started with LitServe
Installation
To begin using LitServe, simply install it via pip:
pip install litserve
Server Setup
Setting up a server is straightforward. Here’s a simple example using two AI models:
import litserve as ls

# Define the AI compound system: two simple models combined in one API
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load (or define) the models once per worker
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Pull the input value out of the incoming JSON payload
        return request["input"]

    def predict(self, x):
        # Run both models and combine their outputs
        squared = self.model1(x)
        cubed = self.model2(x)
        return {"output": squared + cubed}

    def encode_response(self, output):
        # Return a JSON-serializable response
        return {"output": output["output"]}

# Start the server
if __name__ == "__main__":
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto", max_batch_size=1)
    server.run(port=8000)
Testing Your Setup
You can test your server with an auto-generated test client or via a simple curl command:
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
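Alternatively, here is a minimal Python client using the requests library; it assumes the server from the example above is running locally on port 8000.

import requests

# Send a prediction request to the running LitServe server
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"input": 4.0},
)
print(response.json())  # expected: {"output": 80.0}, since 4**2 + 4**3 = 80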
Advanced Model Serving
LitServe handles not just standard AI model serving but also specializes in serving LLMs (large language models). It integrates with engines like vLLM, and users can explore projects such as LitGPT for dedicated LLM deployments.
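As a rough illustration, here is a minimal sketch of how one might wrap a vLLM engine inside a LitAPI. The model name and generation settings are illustrative assumptions, not an official recipe.

import litserve as ls
from vllm import LLM, SamplingParams

class VLLMLitAPI(ls.LitAPI):
    def setup(self, device):
        # Illustrative model choice; swap in any model vLLM supports
        self.llm = LLM(model="facebook/opt-125m")
        self.sampling_params = SamplingParams(max_tokens=64)

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        # vLLM returns one RequestOutput per prompt
        outputs = self.llm.generate([prompt], self.sampling_params)
        return outputs[0].outputs[0].text

    def encode_response(self, text):
        return {"text": text}

if __name__ == "__main__":
    server = ls.LitServer(VLLMLitAPI(), accelerator="gpu")
    server.run(port=8000)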
Performance and Hosting
LitServe’s architecture is designed for scalability and efficiency, ensuring high-speed performance through advanced features like batching and GPU scaling. It offers flexible hosting options to cater to both individual developers and enterprises. Self-host for personal projects or opt for fully managed deployment for enterprise needs, leveraging features like autoscaling and multi-machine inference on Lightning Studios.
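For self-hosted deployments, scaling typically comes down to a few LitServer constructor arguments. The sketch below reuses SimpleLitAPI from the earlier example; the devices and workers_per_device values are illustrative assumptions meant to show the pattern, not tuned settings.

import litserve as ls

# Assumes SimpleLitAPI from the server setup example above is defined in this file
if __name__ == "__main__":
    server = ls.LitServer(
        SimpleLitAPI(),
        accelerator="gpu",     # run workers on GPUs
        devices=2,             # illustrative: spread workers across 2 GPUs
        workers_per_device=2,  # illustrative: 2 worker processes per GPU
        max_batch_size=8,      # batch up to 8 requests together
    )
    server.run(port=8000)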
Community and Contributions
LitServe thrives on community contributions and is constantly evolving to become a leading AI inference engine. It is an open-source project under the Apache 2.0 license, welcoming developers to join and improve its capabilities. For support and community interaction, users are encouraged to join the Discord channel.
Get started with LitServe today, and bring your AI models to life with unprecedented speed and efficiency!