Introduction to CleanRL
CleanRL stands for "Clean Implementation of RL Algorithms." It is a deep reinforcement learning (DRL) library designed to be accessible, clean, and easy to use, making it an attractive tool for researchers and developers working in machine learning and artificial intelligence. CleanRL provides high-quality single-file implementations of a range of reinforcement learning algorithms, emphasizing simplicity and thoroughness.
Key Features
Single-File Implementations
One of CleanRL's standout features is its single-file implementation model: each algorithm variant is encapsulated in its own standalone file. For instance, ppo_atari.py has only 340 lines of code, yet it contains all the implementation details of the Proximal Policy Optimization (PPO) algorithm applied to Atari games. This minimalist approach makes each file a self-contained reference for those who prefer not to navigate an extensive, modular library.
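To make the idea concrete, here is a minimal sketch of the single-file style. It is not CleanRL's actual code: it assumes gymnasium is installed and uses a random policy as a stand-in for PPO, but it shows how hyperparameters, seeding, environment setup, and the main loop can all live in one script:

```python
# A minimal sketch of the single-file style (not CleanRL's ppo_atari.py).
import argparse
import random

import gymnasium as gym
import numpy as np

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--env-id", type=str, default="CartPole-v1")
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--total-timesteps", type=int, default=1000)
    args = parser.parse_args()

    # Seeding and environment setup live in the same file as everything else.
    random.seed(args.seed)
    np.random.seed(args.seed)

    env = gym.make(args.env_id)
    obs, info = env.reset(seed=args.seed)
    for step in range(args.total_timesteps):
        action = env.action_space.sample()  # placeholder for a learned PPO policy
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    env.close()
```

Because nothing is imported from elsewhere in the project, the whole training procedure can be read top to bottom in a single file.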
Benchmarked Implementations
With 7+ algorithms tested and benchmarked across 34+ games, CleanRL's implementations are demonstrably reliable and effective, and the published results provide ample reference material for users at the CleanRL Benchmark.
Research-Friendly Features
CleanRL includes research-friendly tooling: TensorBoard for detailed metric logging, seeding for reproducibility, and video capture for analyzing agent gameplay. It also facilitates experiment management through integration with Weights & Biases.
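The pattern below is a hedged sketch of how a script might combine seeding with TensorBoard logging, assuming PyTorch and tensorboard are installed; the metric written here is a fake stand-in, and the names are illustrative rather than CleanRL's exact code:

```python
# A sketch of the research-friendly pattern: seed everything, then log metrics.
import random

import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

# Seed every source of randomness so a run can be reproduced exactly.
seed = 1
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True

# Log scalar metrics; each run gets its own subdirectory under runs/.
writer = SummaryWriter("runs/demo_experiment")
for global_step in range(10):
    fake_return = float(global_step)  # stand-in for a real episodic return
    writer.add_scalar("charts/episodic_return", fake_return, global_step)
writer.close()
```

The logged values can then be inspected with `tensorboard --logdir runs`.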
Cloud Integration
The library is designed to scale experiments using cloud platforms. With Docker and AWS integration, users can run large numbers of experiments in the cloud rather than being constrained by local hardware.
Getting Started
Requirements
- Python (version between 3.7.1 and 3.11)
- Poetry (version 1.2.1 or later)
To try out CleanRL locally, clone the repository (e.g. `git clone https://github.com/vwxyzjn/cleanrl.git`), install dependencies with `poetry install`, and run experiments using simple terminal commands such as `poetry run python cleanrl/ppo.py --env-id CartPole-v1`. Users who prefer not to use Poetry can instead manage dependencies through the provided requirements.txt files.
Running Experiments
CleanRL supports a variety of environments, from classic control tasks to more advanced suites such as Atari and procgen. Users only need to set up their environment and choose an algorithm to begin experimenting.
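As an illustration of how little the environment choice changes, the sketch below (assuming only gymnasium, with nothing CleanRL-specific) runs the same random-policy rollout across several classic-control tasks; the environment ids are standard gymnasium names used as stand-ins:

```python
# The same rollout code works across environments by changing the id.
import gymnasium as gym

for env_id in ["CartPole-v1", "Acrobot-v1", "MountainCar-v0"]:
    env = gym.make(env_id)
    obs, info = env.reset(seed=1)
    episode_return = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        episode_return += float(reward)
        done = terminated or truncated
    print(f"{env_id}: random-policy episode return = {episode_return}")
    env.close()
```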
Supported Algorithms
CleanRL implements several well-known reinforcement learning algorithms, each available as a concise, single file:
- Proximal Policy Optimization (PPO)
- Deep Q-Learning (DQN)
- Categorical DQN (C51)
- Soft Actor-Critic (SAC)
- Deep Deterministic Policy Gradient (DDPG)
- Twin Delayed DDPG (TD3)
- Phasic Policy Gradient (PPG)
- Random Network Distillation (RND)
- Qdagger
Many algorithms come in several variants tailored to different settings, such as Atari-specific versions or versions that add a Long Short-Term Memory (LSTM) network; PPO, for instance, ships as ppo.py, ppo_atari.py, and ppo_atari_lstm.py. A worked sketch of PPO's core loss follows below.
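For a flavor of what these files contain, here is a hedged sketch of PPO's clipped surrogate policy loss, the central computation in the PPO variants. It assumes PyTorch, and the function and tensor names are illustrative rather than CleanRL's exact code:

```python
import torch

def ppo_clip_loss(new_logprob: torch.Tensor,
                  old_logprob: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_coef: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = (new_logprob - old_logprob).exp()
    pg_loss1 = -advantages * ratio
    pg_loss2 = -advantages * torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef)
    # Taking the max of the two negated terms gives the pessimistic (clipped) bound.
    return torch.max(pg_loss1, pg_loss2).mean()

# Toy usage with random tensors standing in for a real batch.
advantages = torch.randn(8)
old_logprob = torch.randn(8)
new_logprob = old_logprob + 0.1 * torch.randn(8)
print(ppo_clip_loss(new_logprob, old_logprob, advantages))
```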
Community and Support
The CleanRL community is active and supportive, offering help primarily through its Discord server. Users looking for assistance or wishing to contribute can also open GitHub issues or join the community discussions. To aid users further, the team provides resources such as a YouTube channel with past video recordings.
Citing CleanRL
For those using CleanRL in academic work, the library's creators provide a citation format so the library can be credited in published research. CleanRL's technical paper, "CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms" (Huang et al., 2022), appeared in the Journal of Machine Learning Research.
Acknowledgements
CleanRL is a community-driven project, benefiting from contributed hardware for running experiments and from compute resources offered through programs such as Google's TPU Research Cloud and by companies such as Hugging Face.
In summary, CleanRL is a powerful yet straightforward tool for deep reinforcement learning, offering clean implementations, ease of use, and a supportive community, making it an excellent choice for both beginners and seasoned researchers in the field.