FireAct: Toward Language Agent Fine-tuning
FireAct is a project on fine-tuning language agents, accompanying the paper *FireAct: Toward Language Agent Fine-tuning* by Baian Chen and collaborators. The repository provides the prompts, demo code, and fine-tuning data used in the paper for language model development and experimentation.
Overview
The FireAct project is structured to facilitate the development and fine-tuning of language agents. Key components include:
- Tools: Defined in the `tools/` directory, these implement the functionalities available to agents.
- Tasks: Outlined in `tasks/`, these are the objectives or questions to be addressed by the language model.
- Data Collection & Experimentation: Managed through `generation.py`, this step gathers data and runs experiments. The outcomes are stored in `trajs/`.
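The layout implied by these components can be sketched as follows (an illustrative sketch based on the directories named in this document; the actual repository may contain additional files):

```
FireAct/
├── tools/           # tool implementations used by agents
├── tasks/           # task definitions and target questions
├── prompts/         # prompts for data generation and experiments
├── data/            # datasets and generated fine-tuning data
├── trajs/           # trajectories produced by generation.py
├── finetune/        # fine-tuning code (e.g., llama_lora)
├── generation.py    # data collection and experiment driver
└── requirements.txt
```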
Data & Prompts
Data management and prompt setup are central to the FireAct project. The `data/` directory contains datasets used to generate training data, with examples formatted for both Alpaca and GPT models. The `prompts/` directory holds the prompts used to generate training data and to run experiments.
Setup
To get started with FireAct, obtain API keys for OpenAI and a SERP (search) service and export them as environment variables. Setting up the project involves:
- Creating a virtual environment using Conda.
- Cloning the repository from GitHub.
- Installing the dependencies listed in `requirements.txt`.
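The steps above can be sketched as a shell session. The environment name, Python version, repository URL placeholder, and environment-variable names below are assumptions based on common conventions; check the repository README for the exact values.

```shell
# Create and activate a Conda virtual environment
# (name and Python version are illustrative)
conda create -n fireact python=3.9 -y
conda activate fireact

# Clone the repository and install its dependencies
git clone https://github.com/<YOUR_GITHUB_ORG>/FireAct.git
cd FireAct
pip install -r requirements.txt

# Export API keys for OpenAI and the SERP service
# (variable names assumed; confirm against the repository README)
export OPENAI_API_KEY=<YOUR_OPENAI_KEY>
export SERPAPI_API_KEY=<YOUR_SERP_KEY>
```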
Running the Demo
Data Generation
Users can run data generation with:

```shell
python generation.py --task hotpotqa --backend gpt-4 --promptpath default --evaluate --random --task_split val --temperature 0 --task_end_index 5
```

To collect a substantial number of samples, set a higher `--task_end_index`. The resulting data must then be converted into a format suitable for training.
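As a rough sketch of that conversion step, the snippet below maps a hypothetical trajectory onto a record in the standard Alpaca instruction format. The trajectory fields and the helper function are illustrative assumptions; the repository's actual conversion scripts and schema may differ.

```python
import json

# A hypothetical trajectory, standing in for one entry saved under trajs/
# (these field names are illustrative, not the repo's exact schema).
trajectory = {
    "question": "What is the capital of France?",
    "trace": "Thought: I should search for France.\n"
             "Action: search[France]\n"
             "Observation: France's capital is Paris.\n"
             "Action: finish[Paris]",
}

def to_alpaca_record(traj):
    """Map one trajectory onto the standard Alpaca fields:
    instruction / input / output."""
    return {
        "instruction": traj["question"],
        "input": "",
        "output": traj["trace"],
    }

record = to_alpaca_record(trajectory)
print(json.dumps(record, indent=2))
```

A list of such records, dumped as JSON, matches the shape expected by Alpaca-style fine-tuning scripts.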
Supervised Fine-tuning
The fine-tuning process for language models is illustrated with:

```shell
cd finetune/llama_lora
python finetune.py --base_model meta-llama/Llama-2-13b-chat-hf --data_path ../../data/finetune/alpaca_format/hotpotqa.json --micro_batch_size 8 --num_epochs 30 --output_dir ../../models/lora/fireact-llama-2-13b --val_set_size 0.01 --cutoff_len 512
```
Inference
Inference can be run using the following examples:

- For FireAct Llama:

```shell
python generation.py --task hotpotqa --backend llama --evaluate --random --task_split dev --task_end_index 5 --modelpath meta-llama/Llama-2-7b-chat --add_lora --alpaca_format --peftpath forestai/fireact_llama_2_7b_lora
```

- For FireAct GPT:

```shell
python generation.py --task hotpotqa --backend ft:gpt-3.5-turbo-0613:<YOUR_MODEL> --evaluate --random --task_split dev --temperature 0 --chatgpt_format --task_end_index 5
```
Model Zoo
FireAct offers a variety of multitask models based on the Llama family, available on Hugging Face:
- Llama2-7B: Available as a LoRA fine-tuned model and full model.
- Llama2-13B: Also available as a LoRA fine-tuned model.
- CodeLlama series (7B, 13B, 34B): All fine-tuned using the LoRA method.
Together these span multiple model sizes and both LoRA and full fine-tuning, suitable for a range of language-agent tasks.
References
FireAct builds on prior codebases, including ReAct, Stanford Alpaca, and open-source LoRA implementations, which underpin its fine-tuning pipeline.