Introduction to TensorFlow Serving
TensorFlow Serving is a flexible, high-performance system for serving machine learning models in production environments. It handles the inference phase of machine learning: it takes models after training, manages their lifetimes, and gives clients versioned, high-performance access to them via a lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be extended to serve other types of models and data.
Key Features
- Multi-Model Support: Capable of managing multiple models or multiple versions of a single model at the same time.
- Endpoint Compatibility: Exposes both gRPC and HTTP/REST endpoints for model inference (a gRPC client sketch follows this list).
- Seamless Updates: New model versions can be deployed without necessitating changes in the client code.
- Testing Capabilities: Supports canary deployments and A/B testing to trial new and experimental models.
- Low Latency: Adds minimal delay due to its efficient low-overhead design.
- Batch Scheduling: Groups inference requests into batches for execution on GPUs, with configurable latency settings.
- Diverse Model Support: Compatible with TensorFlow models, embeddings, vocabularies, feature transformations, and even non-TensorFlow models.
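For the gRPC endpoint mentioned above, requests are built from the PredictRequest protocol buffer and sent through a PredictionService stub. The sketch below is a minimal, illustrative client: it assumes the tensorflow-serving-api and grpcio packages, a server with its gRPC port published (8500 by default in the Docker image), and a hypothetical model named my_model whose serving signature takes a float tensor named "inputs"; adjust the names to match your model's signature.
```python
# Minimal gRPC client sketch for TensorFlow Serving's PredictionService.
# Assumes the `tensorflow-serving-api`, `grpcio`, and `tensorflow` packages,
# a server with its gRPC port (8500) published, and a hypothetical model
# named "my_model" with a float input tensor named "inputs".
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                   # hypothetical model name
request.model_spec.signature_name = "serving_default"  # default SavedModel signature
request.inputs["inputs"].CopyFrom(                     # tensor name from the model's signature
    tf.make_tensor_proto([1.0, 2.0, 5.0], dtype=tf.float32)
)

response = stub.Predict(request, 10.0)  # 10-second deadline
print(response.outputs)                 # map of output tensor name -> TensorProto
```
The HTTP/REST endpoint (port 8501) is shown in the quick start below.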
Quick Start: Serving a TensorFlow Model in 60 Seconds
To quickly deploy a TensorFlow model with TensorFlow Serving, follow a few simple steps:
- Download the TensorFlow Serving Docker image and repository:
```bash
docker pull tensorflow/serving

git clone https://github.com/tensorflow/serving

# Location of the demo models shipped with the repository (used in the next step)
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
```
- Start a TensorFlow Serving container and open the REST API port:
```bash
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &
```
- Use the predict API to query the model:
```bash
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict

# Returns => { "predictions": [2.5, 3.0, 4.5] }
```
This command returns the model's predictions for the supplied inputs; the demo model computes y = 0.5 * x + 2.
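The same REST call can be issued from Python. The snippet below is a minimal equivalent of the curl command above, using the third-party requests library (an assumption for illustration, not part of TensorFlow Serving itself).
```python
# Minimal sketch of the same REST predict call using the `requests` library.
import requests

resp = requests.post(
    "http://localhost:8501/v1/models/half_plus_two:predict",
    json={"instances": [1.0, 2.0, 5.0]},
)
resp.raise_for_status()
print(resp.json()["predictions"])  # => [2.5, 3.0, 4.5]
```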
Comprehensive Tutorials and Documentation
For a more detailed guide on training and serving a TensorFlow model, visit the official TensorFlow documentation.
Setting Up
The simplest method for using TensorFlow Serving is through Docker:
- Install TensorFlow Serving using Docker (Recommended)
- Install TensorFlow Serving without Docker (Not Recommended)
- Build TensorFlow Serving from Source
- Deploy on Kubernetes
Usage
To serve a TensorFlow model, first export it as a SavedModel: a language-neutral, recoverable serialization format that lets higher-level systems and tools produce, consume, and transform TensorFlow models. For more details on exporting models, refer to the TensorFlow guide.
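As a concrete illustration, the sketch below exports a trivial module, modeled on the half_plus_two demo, as a SavedModel under a numbered version directory, which is the layout TensorFlow Serving loads from. The class, output path, and tensor names are illustrative choices, not part of the official guide.
```python
# Minimal sketch: export a trivial model as a SavedModel for TensorFlow Serving.
# The module, export path, and tensor names here are illustrative choices.
import tensorflow as tf

class HalfPlusTwo(tf.Module):
    """Computes y = 0.5 * x + 2, mirroring the quick-start demo model."""

    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def __call__(self, x):
        return {"y": 0.5 * x + 2.0}

model = HalfPlusTwo()

# TensorFlow Serving loads models from <base_path>/<version>/, so export into
# a numeric version subdirectory (version 1 here).
tf.saved_model.save(
    model,
    "/tmp/half_plus_two/1",
    signatures={"serving_default": model.__call__.get_concrete_function()},
)
```
The exported base directory (/tmp/half_plus_two in this sketch) can then be mounted into the serving container, e.g. with -v /tmp/half_plus_two:/models/half_plus_two, just as in the quick start.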
Extending Functionality
TensorFlow Serving is highly modular, allowing users to extend its capabilities for new use cases:
- Acquaint yourself with building TensorFlow Serving.
- Understand its architecture.
- Explore the C++ API reference.
- Develop new types of servable models or custom sources for model versions.
Contribution and Further Information
Those interested in contributing to TensorFlow Serving should review the contribution guidelines.
For additional information, please visit the official TensorFlow website.