Stable-Baselines3 - Contrib (SB3-Contrib)
Stable-Baselines3 Contrib (SB3-Contrib) is a package that extends Stable-Baselines3 with experimental reinforcement learning (RL) code. It serves as a testing ground for new RL algorithms and tools, particularly those from recent academic publications. The aim is to preserve the simplicity, documentation quality, and coding style of Stable-Baselines3 while allowing for less mature, experimental implementations.
The Purpose of SB3-Contrib
Stable-Baselines3 has seen active contributions from the RL community, ranging from improved logging utilities and environment wrappers to support for additional action spaces and new learning algorithms. However, some contributions were too niche, or too difficult to integrate cleanly, to fit into the main Stable-Baselines3 repository. SB3-Contrib addresses this by offering a more flexible platform: code does not need to be as polished to be merged, and niche utilities can be accommodated. The result is experimental yet reliable implementations that adhere to consistent documentation and style standards without being constrained by the scope of the main repository.
Features of SB3-Contrib
The SB3-Contrib package includes a variety of algorithms and tools, designed to push the boundaries of what is possible within reinforcement learning. Here is an overview:
RL Algorithms (a minimal usage sketch follows the list):
- Augmented Random Search (ARS): A simple derivative-free method that optimizes (typically linear) policies by random search over parameter perturbations.
- Quantile Regression DQN (QR-DQN): An extension of DQN that estimates the full return distribution via quantile regression rather than a single expected value.
- PPO with Invalid Action Masking (MaskablePPO): A version of PPO that masks out invalid actions so the policy never selects them.
- PPO with Recurrent Policy (RecurrentPPO or PPO LSTM): Combines PPO with recurrent (LSTM) policies for sequential or partially observable tasks.
- Truncated Quantile Critics (TQC): Reduces overestimation bias by truncating the top quantiles of a distributional critic's value estimates.
- Trust Region Policy Optimization (TRPO): Constrains each policy update to a trust region for stable learning.
- Batch Normalization in Deep Reinforcement Learning (CrossQ): Applies batch normalization to the critic and removes target networks to improve sample efficiency.
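All algorithms follow the standard Stable-Baselines3 API (model creation, learn, save/load). As a minimal sketch, assuming gymnasium and sb3-contrib are installed and using CartPole-v1 purely for illustration, training QR-DQN looks like this:

from sb3_contrib import QRDQN

# Create and train QR-DQN; "MlpPolicy" selects a standard feed-forward network
model = QRDQN("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)

# Save and reload the trained model
model.save("qrdqn_cartpole")
model = QRDQN.load("qrdqn_cartpole")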
Gym Wrappers:
- Time Feature Wrapper: Appends the remaining time in the episode to the observation, mitigating the partial observability introduced by time limits (see the sketch below).
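As a minimal sketch of how the wrapper is applied (Pendulum-v1 is used here only as an example of an environment with a Box observation space and a time limit):

import gymnasium as gym
from sb3_contrib.common.wrappers import TimeFeatureWrapper

# Append the normalized remaining episode time to each observation
env = TimeFeatureWrapper(gym.make("Pendulum-v1"))
obs, _ = env.reset()
print(obs.shape)  # one extra dimension compared to the unwrapped env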
Accessing Documentation
Complete documentation for SB3-Contrib is available online to provide guidance on its usage and features: https://sb3-contrib.readthedocs.io/
Installation Guide
To install SB3-Contrib using pip, use the following command:
pip install sb3-contrib
For optimal use, it is recommended to install the master version of Stable Baselines3:
pip install git+https://github.com/DLR-RM/stable-baselines3
To get the master version of SB3-Contrib:
pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
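After installation, a quick import check (a minimal sketch; it assumes only that the package installed correctly) confirms everything is in place:

import sb3_contrib
print(sb3_contrib.__version__)  # prints the installed SB3-Contrib version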
How to Contribute
For those interested in contributing to the project, start by reading the CONTRIBUTING.md guide, which outlines the necessary steps and guidelines.
Citing SB3-Contrib
If you wish to cite the project in academic publications, here is a suggested citation format:
@article{stable-baselines3,
  author  = {Antonin Raffin and Ashley Hill and Adam Gleave and Anssi Kanervisto and Maximilian Ernestus and Noah Dormann},
  title   = {Stable-Baselines3: Reliable Reinforcement Learning Implementations},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {268},
  pages   = {1-8},
  url     = {http://jmlr.org/papers/v22/20-1364.html}
}
The SB3-Contrib project stands as a testament to the thriving and collaborative nature of the RL community, offering a platform for innovative experimentation and growth in the field.