Introducing the llama_ros Project
The llama_ros project offers a comprehensive set of ROS 2 packages that integrate llama.cpp into the ROS 2 ecosystem, letting developers harness llama.cpp's optimization features in their ROS 2 projects. With llama_ros, users can run GGUF-based large language models (LLMs) and vision-language models (VLMs), constrain generation with GBNF grammars, and load and rescale LoRA (Low-Rank Adaptation) adapters at runtime.
Related Projects
- chatbot_ros: A chatbot for ROS 2 that uses whisper_ros to understand spoken language and llama_ros to generate responses. The chatbot's behavior is driven by a state machine built with YASMIN.
- explainable_ros: A ROS 2 tool for explaining robot behavior. Through the LangChain integration, logs are stored in a vector database; users can then retrieve the logs relevant to a question, which is answered with llama_ros.
Installation Guidelines
To run llama_ros with CUDA support, first install the CUDA Toolkit. llama_ros can then be cloned and compiled with CUDA enabled, as shown below.
$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/llama_ros.git
$ pip3 install -r llama_ros/requirements.txt
$ cd ~/ros2_ws
$ rosdep install --from-paths src --ignore-src -r -y
$ colcon build --cmake-args -DGGML_CUDA=ON
Docker Support
llama_ros also provides a Docker setup for users who prefer containerized environments. The llama_ros image can be built with CUDA support, provided the NVIDIA Container Toolkit is installed on the host.
$ DOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .
$ docker run -it --rm --gpus all llama_ros
Usage Overview
llama_cli
llama_ros ships a set of llama_cli commands to simplify testing GGUF-based models within the ROS 2 environment. For example, the launch command deploys a language model from a YAML configuration file.
$ ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml
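Once the model is up, it can also be queried directly from the CLI with the prompt command (the exact flags may vary between versions):
$ ros2 llama prompt "Do you know the ROS 2 middleware?"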
Launch Files
Launch files specify the parameters and models needed to run llama_ros or llava_ros. They can be written in either Python or YAML.
Example (Python Launch):
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            model_repo="TheBloke/Marcoroni-7B-v3-GGUF",
            model_filename="marcoroni-7b-v3.Q4_K_M.gguf",
            system_prompt_type="Alpaca"
        )
    ])
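The same model can be described in YAML and launched with the llama_cli command shown earlier. A minimal sketch, assuming the YAML keys mirror the create_llama_launch arguments (real configuration files typically set additional fields):
Example (YAML Launch):
model_repo: "TheBloke/Marcoroni-7B-v3-GGUF"
model_filename: "marcoroni-7b-v3.Q4_K_M.gguf"
system_prompt_type: "Alpaca"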
Advanced Features
LoRA Adapters
llama_ros supports loading LoRA adapters, allowing users to modify the base model in real time and adjust each adapter's scale as needed for specific tasks.
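For example, adapter scales can be changed through a ROS 2 service exposed by the running model node. Below is a minimal rclpy sketch; the /llama/update_loras service name and the UpdateLoRAs/LoRA interface fields are assumptions based on the llama_msgs package and should be checked against the installed version:
import rclpy
from rclpy.node import Node

# Assumed interface names; check llama_msgs for the exact definitions.
from llama_msgs.msg import LoRA
from llama_msgs.srv import UpdateLoRAs


def main() -> None:
    rclpy.init()
    node = Node("lora_scaler")

    # Assumed service exposed by the llama_ros model node.
    client = node.create_client(UpdateLoRAs, "/llama/update_loras")
    client.wait_for_service()

    # Scale the first adapter down to half strength.
    lora = LoRA()
    lora.id = 0
    lora.scale = 0.5

    request = UpdateLoRAs.Request()
    request.loras = [lora]

    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    rclpy.shutdown()


if __name__ == "__main__":
    main()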
ROS 2 Clients
The project exposes its functionality through standard ROS 2 interfaces, so client nodes can tokenize text, generate embeddings, or request generated responses, extending the models' utility in robotics applications.
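As an illustration, the sketch below requests text generation through the GenerateResponse action from llama_msgs, served on /llama/generate_response. The goal and result field names (prompt, response.text) are taken from the llama_ros interfaces but should be treated as assumptions and verified against the installed version:
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node

from llama_msgs.action import GenerateResponse


class LlamaClientNode(Node):

    def __init__(self) -> None:
        super().__init__("llama_client_node")
        # Action exposed by the llama_ros node launched earlier.
        self._client = ActionClient(self, GenerateResponse, "/llama/generate_response")

    def generate(self, prompt: str) -> str:
        goal = GenerateResponse.Goal()
        goal.prompt = prompt

        self._client.wait_for_server()

        # Send the goal and wait for it to be accepted.
        goal_future = self._client.send_goal_async(goal)
        rclpy.spin_until_future_complete(self, goal_future)
        goal_handle = goal_future.result()

        # Wait for the final result and return the generated text.
        result_future = goal_handle.get_result_async()
        rclpy.spin_until_future_complete(self, result_future)
        return result_future.result().result.response.text


def main() -> None:
    rclpy.init()
    node = LlamaClientNode()
    print(node.generate("What is ROS 2?"))
    rclpy.shutdown()


if __name__ == "__main__":
    main()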
LangChain Integration
With llama_ros, users can also integrate LangChain, applying prompt templates and composing chains on top of the running model, with support for both single-shot invocation and streaming.
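A minimal chain sketch, assuming the LlamaROS wrapper exported by llama_ros.langchain (the class name comes from the project but should be treated as an assumption) and a model already launched as above:
import rclpy
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

from llama_ros.langchain import LlamaROS  # assumed LangChain wrapper


def main() -> None:
    rclpy.init()

    # LLM backed by the llama_ros node launched earlier.
    llm = LlamaROS()

    prompt = PromptTemplate.from_template("Summarize in one sentence: {text}")
    chain = prompt | llm | StrOutputParser()

    print(chain.invoke({"text": "ROS 2 is a middleware framework for robot software."}))
    rclpy.shutdown()


if __name__ == "__main__":
    main()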
Embeddings and RAG Support
llama_ros can also produce text embeddings, enabling retrieval-augmented generation (RAG): documents are embedded into a vector store, and the entries most relevant to a query are retrieved to ground the model's answer.
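A minimal retrieval sketch, assuming the LlamaROSEmbeddings wrapper from llama_ros.langchain (an assumption to verify) together with the Chroma vector store from langchain_community:
import rclpy
from langchain_community.vectorstores import Chroma

from llama_ros.langchain import LlamaROSEmbeddings  # assumed embeddings wrapper


def main() -> None:
    rclpy.init()

    # Embeddings computed by the running llama_ros node.
    embeddings = LlamaROSEmbeddings()

    # Index a few documents in an in-memory vector store.
    db = Chroma(embedding_function=embeddings)
    db.add_texts([
        "The robot docked at the charging station.",
        "Battery level dropped below 20 percent.",
    ])

    # Retrieve the document most relevant to the query.
    docs = db.similarity_search("Why did the robot return to base?", k=1)
    print(docs[0].page_content)

    rclpy.shutdown()


if __name__ == "__main__":
    main()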
By integrating state-of-the-art language models into ROS 2, the llama_ros project makes it straightforward to add language and vision-language capabilities to robotic applications.