Introducing the llama_ros Project
The llama_ros project offers a comprehensive set of ROS 2 packages that integrate llama.cpp into the ROS 2 ecosystem, letting developers harness llama.cpp's optimization features in their ROS 2 projects. With llama_ros, users can run GGUF-based large language models (LLMs) and vision-language models (VLMs), constrain generation with GBNF grammars, and load and rescale LoRA (Low-Rank Adaptation) adapters at runtime.
Related Projects
- chatbot_ros: A chatbot for ROS 2 that uses whisper_ros to understand spoken language and llama_ros to generate responses. The chatbot's behavior is driven by a state machine built with YASMIN.
- explainable_ros: A ROS 2 tool for explaining robot behavior. Through the LangChain integration, logs are stored in a vector database; users can then retrieve the logs relevant to a question, which is answered with llama_ros.
Installation Guidelines
To run llama_ros with CUDA support, first install the CUDA Toolkit. llama_ros can then be cloned and compiled with CUDA enabled, as shown below.
$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/llama_ros.git
$ pip3 install -r llama_ros/requirements.txt
$ cd ~/ros2_ws
$ rosdep install --from-paths src --ignore-src -r -y
$ colcon build --cmake-args -DGGML_CUDA=ON
Docker Support
llama_ros also provides a Docker setup for users who prefer containerized environments. The llama_ros image can be built with CUDA support, provided the NVIDIA Container Toolkit is installed on the host.
$ DOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .
$ docker run -it --rm --gpus all llama_ros
Usage Overview
llama_cli
llama_ros ships a set of llama_cli commands to simplify testing GGUF-based models within the ROS 2 environment. For example, the launch command deploys a language model from a YAML configuration file.
$ ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml
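Once the model is up, it can also be queried directly from the CLI with the prompt command (the exact flags may vary between versions):
$ ros2 llama prompt "Do you know the ROS 2 middleware?"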
Launch Files
Launch files specify the parameters and models needed to run llama_ros or llava_ros. They can be written in either Python or YAML.
Example (Python Launch):
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            model_repo="TheBloke/Marcoroni-7B-v3-GGUF",
            model_filename="marcoroni-7b-v3.Q4_K_M.gguf",
            system_prompt_type="Alpaca"
        )
    ])
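The same model can be described in YAML and launched with the llama_cli command shown earlier. A minimal sketch, assuming the YAML keys mirror the create_llama_launch arguments (real configuration files typically set additional fields):
Example (YAML Launch):
model_repo: "TheBloke/Marcoroni-7B-v3-GGUF"
model_filename: "marcoroni-7b-v3.Q4_K_M.gguf"
system_prompt_type: "Alpaca"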
Advanced Features
LoRA Adapters
llama_ros supports loading LoRA adapters, allowing users to modify the base model in real time and adjust each adapter's scale as needed for specific tasks.
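For example, adapter scales can be changed through a ROS 2 service exposed by the running model node. Below is a minimal rclpy sketch; the /llama/update_loras service name and the UpdateLoRAs/LoRA interface fields are assumptions based on the llama_msgs package and should be checked against the installed version:
import rclpy
from rclpy.node import Node

# Assumed interface names; check llama_msgs for the exact definitions.
from llama_msgs.msg import LoRA
from llama_msgs.srv import UpdateLoRAs


def main() -> None:
    rclpy.init()
    node = Node("lora_scaler")

    # Assumed service exposed by the llama_ros model node.
    client = node.create_client(UpdateLoRAs, "/llama/update_loras")
    client.wait_for_service()

    # Scale the first adapter down to half strength.
    lora = LoRA()
    lora.id = 0
    lora.scale = 0.5

    request = UpdateLoRAs.Request()
    request.loras = [lora]

    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    rclpy.shutdown()


if __name__ == "__main__":
    main()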
ROS 2 Clients
The project exposes its functionality through standard ROS 2 interfaces, so client nodes can tokenize text, generate embeddings, or request generated responses, extending the models' utility in robotics applications.
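As an illustration, the sketch below requests text generation through the GenerateResponse action from llama_msgs, served on /llama/generate_response. The goal and result field names (prompt, response.text) are taken from the llama_ros interfaces but should be treated as assumptions and verified against the installed version:
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node

from llama_msgs.action import GenerateResponse


class LlamaClientNode(Node):

    def __init__(self) -> None:
        super().__init__("llama_client_node")
        # Action exposed by the llama_ros node launched earlier.
        self._client = ActionClient(self, GenerateResponse, "/llama/generate_response")

    def generate(self, prompt: str) -> str:
        goal = GenerateResponse.Goal()
        goal.prompt = prompt

        self._client.wait_for_server()

        # Send the goal and wait for it to be accepted.
        goal_future = self._client.send_goal_async(goal)
        rclpy.spin_until_future_complete(self, goal_future)
        goal_handle = goal_future.result()

        # Wait for the final result and return the generated text.
        result_future = goal_handle.get_result_async()
        rclpy.spin_until_future_complete(self, result_future)
        return result_future.result().result.response.text


def main() -> None:
    rclpy.init()
    node = LlamaClientNode()
    print(node.generate("What is ROS 2?"))
    rclpy.shutdown()


if __name__ == "__main__":
    main()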
LangChain Integration
With llama_ros, users can also integrate LangChain, applying prompt templates and composing chains on top of the running model, with support for both single-shot invocation and streaming.
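A minimal chain sketch, assuming the LlamaROS wrapper exported by llama_ros.langchain (the class name comes from the project but should be treated as an assumption) and a model already launched as above:
import rclpy
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

from llama_ros.langchain import LlamaROS  # assumed LangChain wrapper


def main() -> None:
    rclpy.init()

    # LLM backed by the llama_ros node launched earlier.
    llm = LlamaROS()

    prompt = PromptTemplate.from_template("Summarize in one sentence: {text}")
    chain = prompt | llm | StrOutputParser()

    print(chain.invoke({"text": "ROS 2 is a middleware framework for robot software."}))
    rclpy.shutdown()


if __name__ == "__main__":
    main()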
Embeddings and RAG Support
llama_ros can also produce text embeddings, enabling retrieval-augmented generation (RAG): documents are embedded into a vector store, and the entries most relevant to a query are retrieved to ground the model's answer.
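A minimal retrieval sketch, assuming the LlamaROSEmbeddings wrapper from llama_ros.langchain (an assumption to verify) together with the Chroma vector store from langchain_community:
import rclpy
from langchain_community.vectorstores import Chroma

from llama_ros.langchain import LlamaROSEmbeddings  # assumed embeddings wrapper


def main() -> None:
    rclpy.init()

    # Embeddings computed by the running llama_ros node.
    embeddings = LlamaROSEmbeddings()

    # Index a few documents in an in-memory vector store.
    db = Chroma(embedding_function=embeddings)
    db.add_texts([
        "The robot docked at the charging station.",
        "Battery level dropped below 20 percent.",
    ])

    # Retrieve the document most relevant to the query.
    docs = db.similarity_search("Why did the robot return to base?", k=1)
    print(docs[0].page_content)

    rclpy.shutdown()


if __name__ == "__main__":
    main()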
By integrating state-of-the-art language models into ROS 2, the llama_ros project makes it straightforward to add language and vision-language capabilities to robotic applications.