Introduction to Modelz LLM
Modelz LLM is an inference server that lets individuals and organizations run open-source large language models (LLMs) with minimal effort. It can be deployed in both local and cloud environments, and it exposes an OpenAI-compatible API, making it convenient for developers who want to work with powerful LLMs without a complex configuration process.
Key Features
- OpenAI Compatible API: Because the API is compatible with OpenAI's, users can talk to the models through the OpenAI Python SDK or LangChain, making integration straightforward for anyone already familiar with OpenAI tooling.
- Self-hosted Deployment: Flexibility is a core advantage. Modelz LLM can be set up and run on local machines or on cloud infrastructure, so users choose where and how their models operate.
- Support for Open Source LLMs: Modelz LLM supports well-known open-source models such as FastChat T5, Vicuna, LLaMA, ChatGLM, and Bloomz, covering use cases from casual conversation to more demanding generation tasks.
- Cloud Native: To simplify cloud deployment, Modelz LLM provides Docker images, enabling quick scaling and management on platforms such as Kubernetes and managed services like Modelz.
Getting Started Quickly
Installation: Install Modelz LLM with pip:
pip install modelz-llm
# Alternatively, install from source with GPU support:
pip install git+https://github.com/tensorchord/modelz-llm.git[gpu]
Running the API Server: To start the server locally, run:
modelz-llm -m bigscience/bloomz-560m --device cpu
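Once the server is running, you can sanity-check it with a plain HTTP request before wiring up any SDK. Below is a minimal sketch using the requests library, assuming the server listens on the default port 8000 used in the examples later in this article:

import requests

# Minimal request against the OpenAI-compatible /chat/completions endpoint.
# Assumes the server started above is listening on localhost:8000.
resp = requests.post(
    "http://localhost:8000/chat/completions",
    json={"model": "any", "messages": [{"role": "user", "content": "Hello world"}]},
)
print(resp.json())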
Model Support
Here are some models supported by Modelz LLM along with their recommended deployment environments:
- FastChat T5: Deployed with a Docker image and ideally run on Nvidia L4 (24GB).
- Vicuna, LLaMA, ChatGLM Series: These are supported with varying resource recommendations, such as Nvidia A100 (40GB) for larger models.
- Bloomz Series: Ranges from Bloomz 560M suitable for CPU environments to Bloomz 7.1B that requires Nvidia A100 (40GB).
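Serving any of these models uses the same launch command as the quick start, just with a different Hugging Face model ID. As a sketch (the --device value for GPU hosts is an assumption here; consult modelz-llm --help for the exact accepted values):

modelz-llm -m lmsys/fastchat-t5-3b-v1.0 --device gpu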
Interacting with the Models
Using OpenAI Python SDK: Point the OpenAI Python SDK at the local server and use it as usual:
import openai

# Direct the pre-1.0 OpenAI SDK at the local Modelz LLM server.
openai.api_base = "http://localhost:8000"
openai.api_key = "any"  # any value works; the local server does not require a real key
chat_completion = openai.ChatCompletion.create(model="any", messages=[{"role": "user", "content": "Hello world"}])
print(chat_completion.choices[0].message.content)
Integration with LangChain: LangChain's OpenAI wrapper speaks the same API, so it only needs to be pointed at the local server:
from langchain.llms import OpenAI

# Direct LangChain's OpenAI wrapper at the local server instead of api.openai.com.
llm = OpenAI(openai_api_base="http://localhost:8000", openai_api_key="any")
llm.generate(prompts=["Could you please recommend some movies?"])
Deploy on Modelz
Deploying Modelz LLM on the Modelz platform itself is streamlined and supported by documented guides, which simplifies scaling and management in larger cloud deployments.
Supported APIs
Modelz LLM supports multiple APIs for interactions:
- /completions
- /chat/completions
- /embeddings
- /engines/<any>/embeddings
- And more.
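These endpoints accept the same JSON bodies as the corresponding OpenAI REST endpoints, so any HTTP client can use them directly. Here is a minimal sketch against /embeddings, assuming the local server from the quick start:

import requests

# Request an embedding from the OpenAI-compatible /embeddings endpoint.
resp = requests.post(
    "http://localhost:8000/embeddings",
    json={"model": "any", "input": "Hello world"},
)
embedding = resp.json()["data"][0]["embedding"]  # standard OpenAI response shape
print(len(embedding))  # dimensionality of the returned vector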
Acknowledgements
Modelz LLM builds on foundational open-source projects: FastChat provides the prompt generation logic, and Mosec powers the inference engine.
In summary, Modelz LLM is a robust, user-friendly platform that lets developers and researchers put open-source LLMs to work with ease, opening the door to new applications across industries.