agentlego - Discover Versatile Tool APIs for Enhancing Large Language Model Agents

Introduction

AgentLego is an open-source library that extends the capabilities of agents built on large language models (LLMs). It provides a collection of tool APIs that take these agents beyond plain text processing, with a focus on multimodal tasks such as image understanding, audio processing, and visual-language reasoning. It is aimed at developers who want to give their agents additional tools without building each one from scratch.

Key Features

AgentLego offers several standout features:

Rich Multimodal Toolset: The library contains tools for visual perception, image generation and editing, speech processing, and more, so agents can work with several types of data.
Flexible Tool Interface: Users can add their own custom tools, with arguments and outputs of any type. This lets developers tailor AgentLego to their specific needs.
Seamless Integration: The library integrates with other LLM-based frameworks such as LangChain, Transformers Agents, and Lagent.
Remote Tool Access: AgentLego supports calling tools that run remotely. This is particularly useful for resource-intensive models or models that need a specific environment, since they can be hosted once and shared.
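
To make the idea of a flexible tool interface concrete, here is a minimal, self-contained sketch of the general pattern: a tool is a name, a description an agent can read, and a callable with arbitrary inputs and outputs. The names SimpleTool, REGISTRY, and register are hypothetical and exist only for this illustration; this is not AgentLego's actual API, which should be taken from its own documentation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a description, and a callable."""
    name: str
    description: str
    func: Callable[..., Any]

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Forward any inputs to the wrapped function and return its output as-is.
        return self.func(*args, **kwargs)


# Hypothetical in-memory registry mapping tool names to tool objects.
REGISTRY: Dict[str, SimpleTool] = {}


def register(name: str, description: str) -> Callable[[Callable[..., Any]], SimpleTool]:
    """Decorator that wraps a plain function as a SimpleTool and stores it by name."""
    def wrap(func: Callable[..., Any]) -> SimpleTool:
        tool = SimpleTool(name, description, func)
        REGISTRY[name] = tool
        return tool
    return wrap


@register('WordCount', 'Counts the whitespace-separated words in a text.')
def word_count(text: str) -> int:
    return len(text.split())


tool = REGISTRY['WordCount']
print(tool.description)
print(tool('tools extend what an agent can do'))  # 7
```

The description string matters as much as the function: it is what an LLM-based agent would read when deciding which tool to call.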

Quick Start

Installation

Install AgentLego with pip:

pip install agentlego

Some tools require additional dependencies, so check the README of each tool you plan to use for its setup instructions. For instance, the ImageDescription tool may require extra packages such as OpenMIM.
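
Before loading a tool, it can help to confirm which packages are actually present. The following is a small sketch using only the Python standard library. The helper installed_version is hypothetical, and 'openmim' is an assumed pip distribution name for OpenMIM; verify it against the tool's README.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# 'agentlego' is the name used in the pip command above; 'openmim' is an
# assumption and should be checked against the specific tool's README.
for dist in ('agentlego', 'openmim'):
    found = installed_version(dist)
    print(f'{dist}: {found if found else "not installed"}')
```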

Using Tools Directly

AgentLego tools can be called directly from Python, without an agent. The following example loads the ImageDescription tool and captions a local image:

from agentlego import list_tools, load_tool

# List the names of the tools available in AgentLego
print(list_tools())

# Load the image captioning tool onto the GPU
image_caption_tool = load_tool('ImageDescription', device='cuda')
print(image_caption_tool.description)

# Caption a local image and show the result
image = './examples/demo.png'
caption = image_caption_tool(image)
print(caption)
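
The example above hard-codes device='cuda', which fails on machines without a GPU. One possible workaround is sketched here, assuming the tool runs on PyTorch and also accepts device='cpu'; neither assumption is confirmed by the text above. The helpers pick_device and caption_image are hypothetical names introduced for this illustration.

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no way to query the GPU, so fall back to CPU.
        return 'cpu'
    return 'cuda' if torch.cuda.is_available() else 'cpu'


def caption_image(image_path: str) -> object:
    """Load ImageDescription on the chosen device and caption one image."""
    # Imported lazily so pick_device() can be used without AgentLego installed.
    from agentlego import load_tool

    # Assumption: the tool accepts 'cpu' as a device, as it does 'cuda'.
    tool = load_tool('ImageDescription', device=pick_device())
    return tool(image_path)


print(pick_device())
```

With AgentLego and the tool's dependencies installed, caption_image('./examples/demo.png') would then stand in for the load and call steps shown earlier.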

Integration into Agent Frameworks

AgentLego tools can be plugged into several agent frameworks. Example configurations are provided for:

Lagent
Transformers Agents
VisualChatGPT

Supported Tools

AgentLego ships with tools in the following categories:

General Abilities: Calculators, search engines, and similar utilities.
Speech Processing: Text-to-speech and speech-to-text converters.
Image Processing: Tools for image description, OCR, object detection, and more.
AIGC Related Tasks: Text-to-image generation, image expansion, and stylization.

Each category contains specialized tools for its own processing needs, which broadens the range of applications an agent can cover.

License

AgentLego is released under the Apache 2.0 license. Users must also comply with the licenses of the models used by the tools in their projects.

Overall, AgentLego is a toolkit for building LLM-based agents with multimodal abilities. It simplifies integration with existing agent frameworks and provides a broad set of tools that widen what an agent can perceive and do.