Introduction to the Multi Model Server (MMS)
What is Multi Model Server?
Multi Model Server (MMS) is a versatile tool designed to serve deep learning models created with any machine learning or deep learning framework. It offers an easy and efficient way to set up services that handle model inference requests via HTTP endpoints. This server allows developers to deploy their models on various platforms swiftly and with minimal hassle.
Getting Started with MMS
Prerequisites
Before jumping into the installation, ensure the following elements are ready on your machine:
- Operating System: Linux (Ubuntu, CentOS) or macOS. Windows support is currently experimental.
- Python: Essential for running the workers in MMS.
- pip: Python's package management system for installing necessary libraries.
- Java 8: Required for starting MMS.
Setting Up the Environment
Start by setting up a virtual environment to maintain isolation from the rest of your system, making dependency management easier. Using Virtualenv:
- Install Virtualenv using pip:
pip install virtualenv
- Create and activate a virtual environment:
virtualenv -p /usr/local/bin/python2.7 /tmp/pyenv2
source /tmp/pyenv2/bin/activate
Installing MMS
With the virtual environment in place, install the necessary components:
Step 1: Install MXNet. It is not installed with MMS by default:
pip install mxnet-mkl # For CPU
pip install mxnet-cu92mkl # For GPU
Step 2: Install or upgrade MMS:
pip install multi-model-server
Serving a Model with MMS
To serve a model using MMS, a few simple commands suffice. After installation, run the following to start the server:
multi-model-server --start --models squeezenet=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar
This starts MMS on your host, ready to handle inference requests. You can then test the setup by sending a sample image of a cat to the server's predict endpoint with curl; the server returns a JSON response containing the prediction results.
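As a concrete sketch of that test, assuming the server is running locally on the default inference port (8080) and the model was registered under the name squeezenet as in the start command above:

```shell
# Download a sample image of a kitten
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

# Send it to the predict endpoint; the response is a JSON list of
# class labels with probabilities
curl -X POST http://127.0.0.1:8080/predictions/squeezenet -T kitten.jpg
```

When you are done, the server can be shut down with multi-model-server --stop.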
Creating a Model Archive
MMS enables packaging of models into a single archive, making distribution and deployment more straightforward. Detailed instructions are available in the Model Archiver Documentation.
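As a minimal sketch of the packaging workflow (the model name, paths, and handler module here are placeholders; consult the Model Archiver Documentation for the exact options your model needs):

```shell
# The archiver ships as a separate package
pip install model-archiver

# Package model files plus a handler into a single .mar archive
model-archiver --model-name my_model \
               --model-path /path/to/model/files \
               --handler model_service:handle \
               --export-path /tmp/model-store
```

The resulting .mar file can then be passed to multi-model-server via the --models option, just like the squeezenet archive shown earlier.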
Best Practices for Production Deployment
When deploying MMS in a production environment, consider the following:
- Security: Use an authentication proxy in front of MMS, and deploy behind a firewall to protect against attacks.
- Network Configuration: By default, MMS only allows localhost access. Configure SSL for secure communications.
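To illustrate, a config.properties file along these lines binds the endpoints explicitly and enables SSL (the addresses, keystore filename, and password below are example values, not defaults you should copy verbatim):

```
# config.properties
inference_address=https://0.0.0.0:8443
management_address=https://0.0.0.0:8444
keystore=keystore.p12
keystore_pass=changeit
keystore_type=PKCS12
```

The server is then started with this file via multi-model-server --start --mms-config config.properties.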
- Docker: For enhanced security and ease of deployment, consider running MMS inside a Docker container.
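A minimal sketch of the container route, assuming the official image on Docker Hub and the default inference (8080) and management (8081) ports:

```shell
# Pull the published MMS image
docker pull awsdeeplearningteam/multi-model-server

# Run it detached, exposing the inference and management ports
docker run -itd -p 8080:8080 -p 8081:8081 \
    awsdeeplearningteam/multi-model-server
```

Running in a container keeps the server's dependencies isolated from the host and makes it straightforward to place MMS behind a reverse proxy or firewall as recommended above.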
Additional Features and Community Contributions
MMS comes with a comprehensive set of documentation, detailing examples, API customization, and other advanced features. Developers are encouraged to contribute to the project by filing issues or submitting pull requests on GitHub.
Dive into external demos like Product Review Classification, Visual Search, and Facial Emotion Recognition to see MMS in action! For ongoing support and collaboration, join the active community on the MMS Slack Channel.