Deep Reinforcement Learning in Keras
Deep-RL-Keras is a project that provides a modular implementation of several popular deep reinforcement learning (DRL) algorithms using the Keras library. The project focuses on two main categories of algorithms: Actor-Critic Algorithms and Deep Q-Learning Algorithms. It aims to be a helpful resource for researchers and practitioners interested in deep reinforcement learning.
Overview of Algorithms
Actor-Critic Algorithms
- N-step Advantage Actor Critic (A2C):
  - A2C is a model-free, on-policy algorithm with two components: the actor, which outputs a policy over actions, and the critic, which estimates state values to evaluate those actions (a minimal sketch of the update follows this list).
  - It uses N-step returns to improve stability and adds entropy regularization to encourage exploration during training.
  - While A2C is efficient, it can become computationally expensive for complex environments such as Atari games.
- N-step Asynchronous Advantage Actor Critic (A3C):
  - A3C builds on A2C by introducing asynchronous weight updates: multiple worker agents interact with their own copies of the environment in parallel, which speeds up training.
  - A3C is noted for its performance in environments such as Atari Breakout.
- Deep Deterministic Policy Gradient (DDPG):
  - DDPG is designed for continuous action spaces and uses a deterministic policy with target networks for better learning stability.
  - It includes experience replay and parameter noise for exploration, and is tested on environments such as LunarLander.
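To make the actor-critic ideas above concrete, here is a minimal sketch of an N-step advantage actor-critic update written with tf.keras. The network sizes, hyperparameters, and function names are illustrative assumptions and do not mirror the project's actual code.

# Minimal N-step advantage actor-critic sketch (illustrative only; the actual
# Deep-RL-Keras implementation differs). Assumes a small discrete action space.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_states, n_actions = 4, 2          # CartPole-like dimensions (assumption)
gamma, entropy_beta = 0.99, 0.01

# Shared trunk with separate actor (policy) and critic (value) heads.
inputs = layers.Input(shape=(n_states,))
hidden = layers.Dense(64, activation="relu")(inputs)
policy = layers.Dense(n_actions, activation="softmax")(hidden)   # actor head
value = layers.Dense(1)(hidden)                                  # critic head
model = tf.keras.Model(inputs, [policy, value])
optimizer = tf.keras.optimizers.Adam(1e-3)

def n_step_returns(rewards, bootstrap_value):
    # Discounted N-step returns, bootstrapped from the critic's estimate of
    # the state reached after the last stored reward.
    returns, running = [], bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return np.array(returns[::-1], dtype=np.float32)

def train_step(states, actions, returns):
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    actions = tf.convert_to_tensor(actions, dtype=tf.int32)
    returns = tf.convert_to_tensor(returns, dtype=tf.float32)
    with tf.GradientTape() as tape:
        probs, values = model(states)
        values = tf.squeeze(values, axis=-1)
        advantages = returns - values                    # A = R_n - V(s)
        chosen = tf.reduce_sum(probs * tf.one_hot(actions, n_actions), axis=-1)
        # Actor: policy-gradient loss; advantages are constants for the actor.
        actor_loss = -tf.reduce_mean(
            tf.math.log(chosen + 1e-8) * tf.stop_gradient(advantages))
        # Critic: regress V(s) toward the N-step return.
        critic_loss = tf.reduce_mean(tf.square(advantages))
        # Entropy bonus discourages premature collapse to a deterministic policy.
        entropy = -tf.reduce_mean(
            tf.reduce_sum(probs * tf.math.log(probs + 1e-8), axis=-1))
        loss = actor_loss + 0.5 * critic_loss - entropy_beta * entropy
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))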
Deep Q-Learning Algorithms
- Double Deep Q-Network (DDQN):
  - An enhancement of the standard DQN algorithm, DDQN uses two separate neural networks: an online network for action selection and a target network that provides more stable target values.
  - It uses Experience Replay to randomly sample past experiences, which reduces data correlation and improves learning stability.
- Double Deep Q-Network with Prioritized Experience Replay (DDQN + PER):
  - This variant improves upon DDQN by incorporating PER, which prioritizes replaying transitions with large temporal-difference (TD) error so the model focuses on the most informative samples.
- Dueling Double Deep Q-Network (Dueling DDQN):
  - Dueling DDQN enhances the Q-learning architecture by separating the estimation of state value and advantage, leading to improved learning efficiency and performance (a combined sketch of these Q-learning variants follows this list).
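A compact sketch of the three ideas above using tf.keras: a dueling network head, the Double DQN target computation, and proportional prioritized sampling. Names, layer sizes, and hyperparameters are assumptions for illustration, not the project's actual code.

# Illustrative DQN-family sketch (not the project's code): dueling head,
# Double DQN targets, and proportional prioritized sampling.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_states, n_actions, gamma = 4, 2, 0.99

def build_dueling_q_network():
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    inputs = layers.Input(shape=(n_states,))
    hidden = layers.Dense(64, activation="relu")(inputs)
    state_value = layers.Dense(1)(hidden)            # V(s)
    advantages = layers.Dense(n_actions)(hidden)     # A(s, a)
    q_values = layers.Lambda(
        lambda va: va[0] + va[1] - tf.reduce_mean(va[1], axis=1, keepdims=True)
    )([state_value, advantages])
    return tf.keras.Model(inputs, q_values)

online_net = build_dueling_q_network()   # trained every step, selects actions
target_net = build_dueling_q_network()   # periodically synced copy for stable targets
target_net.set_weights(online_net.get_weights())

def double_dqn_targets(rewards, next_states, dones):
    # Double DQN: the online network picks the best next action, the target
    # network evaluates it, reducing the overestimation of vanilla DQN.
    next_q_online = online_net.predict(next_states, verbose=0)
    next_q_target = target_net.predict(next_states, verbose=0)
    best_actions = np.argmax(next_q_online, axis=1)
    next_values = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values

def per_sample(td_errors, batch_size, alpha=0.6):
    # Prioritized Experience Replay (proportional variant): transitions with
    # larger TD error are sampled more often.
    priorities = (np.abs(td_errors) + 1e-6) ** alpha
    probs = priorities / priorities.sum()
    return np.random.choice(len(td_errors), size=batch_size, p=probs)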
Getting Started
To get started with Deep-RL-Keras, install Keras 2.1.6 and OpenAI Gym. The implementations support a range of environments, including CartPole, Atari games, and LunarLander.
$ pip install gym keras==2.1.6
Running the Algorithms
Each algorithm can be run with a command specifying the type, environment, and additional parameters:
$ python3 main.py --type A2C --env CartPole-v1
$ python3 main.py --type A3C --env BreakoutNoFrameskip-v4 --is_atari --nb_episodes 10000 --n_threads 16
$ python3 main.py --type DDPG --env LunarLanderContinuous-v2
Visualization & Monitoring
The project includes tools for visualizing and monitoring models using Tensorboard and Plotly, so users can follow training progress in real time and analyze results afterwards.
- Model Visualization: Saved models can be loaded and visualized in their training environments.
- Tensorboard: Provides a graphical interface for monitoring training scores (an example command follows this list).
- Results Plotting: Training results can be plotted with Plotly for detailed examination.
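For example, once a run has written Tensorboard logs (the exact output directory depends on how the run is configured), they can be viewed with:
$ tensorboard --logdir=<path_to_tensorboard_logs>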
Acknowledgments
The project stands on several open-source works, including templates and environment wrappers from other contributors, showcasing a collaborative effort in the DRL community.
References
The project draws on various seminal papers, providing a strong theoretical backbone for the implemented algorithms. These papers are linked within the documentation for those interested in the underlying research.
Deep-RL-Keras makes advanced DRL techniques accessible to a broader audience by simplifying complex algorithms into usable code with detailed guidance and support.