Mistral Inference: A Comprehensive Introduction
The Mistral Inference project provides a straightforward and effective way to run Mistral models. This project repository is a valuable resource for users interested in deploying various Mistral models, including the highly anticipated Mistral Large 2 and other innovative variants. This guide takes you through the essentials of Mistral Inference, offering clarity on its installation, model usage, and application in different environments.
Installation
To get started with Mistral Inference, you need a system with a GPU: the project depends on xformers, which requires a GPU to install. Installation can be performed via PyPI:
pip install mistral-inference
Alternatively, for local installation, clone the repository and install it using poetry:
cd $HOME && git clone https://github.com/mistralai/mistral-inference
cd $HOME/mistral-inference && poetry install .
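After either installation method, a quick sanity check can confirm that the package imports and that a CUDA-capable GPU is visible. This is a minimal sketch; it assumes PyTorch was installed as a dependency:
import torch  # assumed to be installed alongside mistral-inference
import mistral_inference

# Confirm the package imports and report GPU availability.
print("mistral-inference version:", getattr(mistral_inference, "__version__", "unknown"))
print("CUDA GPU available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())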
Model Downloads
Mistral Inference supports a variety of model downloads, each tailored for specific purposes, such as instruction following and coding assistance. Models available for download include Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, and others. Each model link is accompanied by an MD5 checksum for verification:
- 7B Instruct
- Mixtral 8x7B Instruct
- Codestral 22B
- Mathstral 7B, among others
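Once an archive has been downloaded, its integrity can be checked against the published MD5 checksum. The sketch below is illustrative; the archive path and expected checksum are placeholders to be replaced with the values shown next to the model you downloaded:
import hashlib

ARCHIVE_PATH = "mistral-7B-Instruct.tar"        # placeholder: path to the downloaded archive
EXPECTED_MD5 = "<md5-from-the-download-list>"   # placeholder: checksum published with the link

# Hash the file in chunks so large archives do not need to fit in memory.
md5 = hashlib.md5()
with open(ARCHIVE_PATH, "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        md5.update(chunk)

print("OK" if md5.hexdigest() == EXPECTED_MD5 else "Checksum mismatch - re-download the archive")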
Note that these models, especially the larger ones, support advanced features such as function calling, which extends their conversational and instruction-following capabilities.
Running the Model
Command-line Interface (CLI)
Mistral Inference allows model interaction through a command-line interface. Users can test a model with the mistral-demo command:
mistral-demo $12B_DIR
For multi-GPU setups, which are particularly important for large models such as 8x7B and 8x22B, the torchrun command is used:
torchrun --nproc-per-node 2 --no-python mistral-demo $M8x7B_DIR
With the mistral-chat command, users can engage interactively, with options such as instruction formatting and token limits:
mistral-chat $12B_DIR --instruct --max_tokens 1024 --temperature 0.35
Python Interface
For Python-based interaction, Mistral Inference offers interfaces for tasks such as instruction following and function calling. For example, here is a simple instruction-following script in Python:
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Load the tokenizer and model weights from the downloaded folder.
tokenizer = MistralTokenizer.from_file("./mistral-nemo-instruct-v0.1/tekken.json")
model = Transformer.from_folder("./mistral-nemo-instruct-v0.1")

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

# Wrap the prompt in a chat completion request and tokenize it.
completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Generate a completion and decode it back to text.
out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
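The same interfaces can be used for function calling. The sketch below extends the script above (reusing tokenizer, model, UserMessage, ChatCompletionRequest, and generate) with a tool definition; the get_current_weather tool is a hypothetical example used purely for illustration:
from mistral_common.protocol.instruct.tool_calls import Function, Tool

# Describe a hypothetical tool the model may call, then ask a question that should trigger it.
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather in a given location",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "The city, e.g. Paris"},
                        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[UserMessage(content="What's the weather like today in Paris?")],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))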
Deployment and Platforms
The Mistral Inference project supports deployment across various platforms, including Docker-based setups, which make it straightforward to scale through cloud services. Together with the hosted API, this broadens access to the models across use cases and environments.
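As an illustration of how a deployed instance might be queried over HTTP, the sketch below posts a chat request to an OpenAI-compatible endpoint. The host, port, route, and model name are placeholders for whatever your serving setup exposes, not something defined by this repository:
import requests

# Placeholder endpoint and model identifier for a self-hosted, OpenAI-compatible server.
URL = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "mistral-nemo-instruct",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize what Mistral Inference does."}],
    "max_tokens": 256,
}

response = requests.post(URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])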
References and Further Reading
To delve deeper into the concepts and methodologies behind Mistral models, references such as the LoRA paper (Hu et al., 2021) provide foundational insight into the adaptation techniques used in large language models. Comprehensive documentation and user guides are available through the official Mistral AI platform and repositories.
This introduction aims to provide a practical overview of the Mistral Inference project, simplifying the process of engaging with advanced language models for innovative applications in technology and research.