Mistral Inference: A Comprehensive Introduction
The Mistral Inference project provides a straightforward and effective way to run Mistral models. This project repository is a valuable resource for users interested in deploying various Mistral models, including the highly anticipated Mistral Large 2 and other innovative variants. This guide takes you through the essentials of Mistral Inference, offering clarity on its installation, model usage, and application in different environments.
Installation
To get started with Mistral Inference, you need a system with a GPU: the project depends on xformers, which requires a GPU to install. Installation can be performed via PyPI:
pip install mistral-inference
Alternatively, for local installation, clone the repository and install it using poetry:
cd $HOME && git clone https://github.com/mistralai/mistral-inference
cd $HOME/mistral-inference && poetry install .
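After either installation method, a quick sanity check can confirm that the package imports and that a CUDA-capable GPU is visible. This is a minimal sketch; it assumes PyTorch was installed as a dependency:
import torch  # assumed to be installed alongside mistral-inference
import mistral_inference

# Confirm the package imports and report GPU availability.
print("mistral-inference version:", getattr(mistral_inference, "__version__", "unknown"))
print("CUDA GPU available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())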
Model Downloads
Mistral Inference supports a variety of model downloads, each tailored for specific purposes, such as instruction following and coding assistance. Models available for download include Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, and others. Each model link is accompanied by an MD5 checksum for verification:
- 7B Instruct
- Mixtral 8x7B Instruct
- Codestral 22B
- Mathstral 7B, among others
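Once an archive has been downloaded, its integrity can be checked against the published MD5 checksum. The sketch below is illustrative; the archive path and expected checksum are placeholders to be replaced with the values shown next to the model you downloaded:
import hashlib

ARCHIVE_PATH = "mistral-7B-Instruct.tar"        # placeholder: path to the downloaded archive
EXPECTED_MD5 = "<md5-from-the-download-list>"   # placeholder: checksum published with the link

# Hash the file in chunks so large archives do not need to fit in memory.
md5 = hashlib.md5()
with open(ARCHIVE_PATH, "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        md5.update(chunk)

print("OK" if md5.hexdigest() == EXPECTED_MD5 else "Checksum mismatch - re-download the archive")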
Note that these models, especially the larger ones, support advanced features such as function calling, which extends their conversational and instruction-following capabilities.
Running the Model
Command-line Interface (CLI)
Mistral Inference allows model interaction through a command-line interface. Users can test a model with the mistral-demo command:
mistral-demo $12B_DIR
For multi-GPU setups, which are particularly important for large models such as 8x7B and 8x22B, the torchrun command is used:
torchrun --nproc-per-node 2 --no-python mistral-demo $M8x7B_DIR
With the mistral-chat command, users can engage interactively, with options such as instruction formatting and token limits:
mistral-chat $12B_DIR --instruct --max_tokens 1024 --temperature 0.35
Python Interface
For Python-based interaction, Mistral Inference offers interfaces for tasks such as instruction following and function calling. For example, here is a simple instruction-following script in Python:
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Load the tokenizer and model weights from the downloaded folder.
tokenizer = MistralTokenizer.from_file("./mistral-nemo-instruct-v0.1/tekken.json")
model = Transformer.from_folder("./mistral-nemo-instruct-v0.1")

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

# Wrap the prompt in a chat completion request and tokenize it.
completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Generate a completion and decode it back to text.
out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
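The same interfaces can be used for function calling. The sketch below extends the script above (reusing tokenizer, model, UserMessage, ChatCompletionRequest, and generate) with a tool definition; the get_current_weather tool is a hypothetical example used purely for illustration:
from mistral_common.protocol.instruct.tool_calls import Function, Tool

# Describe a hypothetical tool the model may call, then ask a question that should trigger it.
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather in a given location",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "The city, e.g. Paris"},
                        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[UserMessage(content="What's the weather like today in Paris?")],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))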
Deployment and Platforms
The Mistral Inference project supports deployment across various platforms, including Docker-based setups, which make it straightforward to scale through cloud services. Together with the hosted API, this broadens access to the models across use cases and environments.
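As an illustration of how a deployed instance might be queried over HTTP, the sketch below posts a chat request to an OpenAI-compatible endpoint. The host, port, route, and model name are placeholders for whatever your serving setup exposes, not something defined by this repository:
import requests

# Placeholder endpoint and model identifier for a self-hosted, OpenAI-compatible server.
URL = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "mistral-nemo-instruct",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize what Mistral Inference does."}],
    "max_tokens": 256,
}

response = requests.post(URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])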
References and Further Reading
To delve deeper into the concepts and methodologies behind Mistral models, references such as the LoRA paper (Hu et al., 2021) provide foundational insight into the adaptation techniques used in large language models. Comprehensive documentation and user guides are available through the official Mistral AI platform and repositories.
This introduction aims to provide a practical overview of the Mistral Inference project, simplifying the process of engaging with advanced language models for innovative applications in technology and research.