Introduction to the Cartpole Project
The Cartpole project focuses on solving a popular reinforcement-learning problem using Deep Q-Learning (DQN). The challenge comes from OpenAI Gym's CartPole environment, a common benchmark for testing reinforcement-learning algorithms.
About Cartpole
At its core, the CartPole problem is about balancing a pole on a cart that moves along a frictionless track. The pole is attached to the cart by an un-actuated joint, meaning the joint itself applies no force. The task is to apply a force of +1 or -1 to the cart so that the pole stays upright. A reward of +1 is granted for every timestep the pole remains balanced. The episode ends when the pole tips more than 15 degrees from vertical, or when the cart moves more than 2.4 units from the center of the track.
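To make these mechanics concrete, here is a minimal sketch of one episode with random actions using OpenAI Gym. The environment id `CartPole-v0` is an assumption (it matches the 195.0 solved threshold cited later), and Gym exposes the +1/-1 forces as discrete actions 0 and 1:

```python
import gym

# Create the CartPole environment (id assumed; the project may use a different version).
env = gym.make("CartPole-v0")

state = env.reset()
total_reward = 0
done = False
while not done:
    action = env.action_space.sample()  # random action: 0 = push left, 1 = push right
    state, reward, done, info = env.step(action)  # reward is +1 per timestep survived
    total_reward += reward

print("Episode finished with total reward:", total_reward)
```

With random actions the pole typically falls within a couple dozen timesteps, which is exactly the gap a learned policy has to close.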
Deep Q-Learning (DQN)
For this project, a standard DQN algorithm with Experience Replay was used. Let's break down the key elements involved (a short sketch after the hyperparameter list shows how they fit together):
Hyperparameters:
- GAMMA: 0.95 - This discount factor determines the importance of future rewards.
- LEARNING_RATE: 0.001 - Controls how much the model is updated during learning.
- MEMORY_SIZE: 1,000,000 - The size of the replay memory.
- BATCH_SIZE: 20 - The number of samples used from memory to update the model.
- EXPLORATION_MAX: 1.0 - The initial probability of exploring random actions.
- EXPLORATION_MIN: 0.01 - The minimum probability of random exploration.
- EXPLORATION_DECAY: 0.995 - How quickly the exploration rate decreases over time.
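The following sketch (a simplified illustration, not necessarily the project's exact code) shows how these values cooperate: transitions are stored in a bounded replay memory, random minibatches of BATCH_SIZE are sampled, Q-targets are computed with the GAMMA-discounted Bellman update, and the exploration rate decays multiplicatively from EXPLORATION_MAX toward EXPLORATION_MIN. States are assumed to be NumPy arrays of shape (1, 4):

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.95
MEMORY_SIZE = 1000000
BATCH_SIZE = 20
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01
EXPLORATION_DECAY = 0.995

memory = deque(maxlen=MEMORY_SIZE)  # bounded replay memory: oldest transitions are evicted first
exploration_rate = EXPLORATION_MAX


def act(model, state, n_actions):
    """Epsilon-greedy action selection: explore with probability exploration_rate."""
    if np.random.rand() < exploration_rate:
        return random.randrange(n_actions)
    return int(np.argmax(model.predict(state)[0]))


def remember(state, action, reward, next_state, done):
    memory.append((state, action, reward, next_state, done))


def experience_replay(model):
    """One learning step on a random minibatch drawn from memory."""
    global exploration_rate
    if len(memory) < BATCH_SIZE:
        return
    for state, action, reward, next_state, done in random.sample(memory, BATCH_SIZE):
        target = reward
        if not done:
            # Bellman target: r + GAMMA * max_a' Q(s', a')
            target = reward + GAMMA * np.amax(model.predict(next_state)[0])
        q_values = model.predict(state)
        q_values[0][action] = target  # update only the Q-value of the action taken
        model.fit(state, q_values, verbose=0)
    # Decay exploration after each learning step, but never below the minimum.
    exploration_rate = max(EXPLORATION_MIN, exploration_rate * EXPLORATION_DECAY)
```

Sampling uniformly at random from a large memory breaks the correlation between consecutive transitions, which is what makes Experience Replay stabilize training.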
Model Structure:
The neural network model for DQN in this project consists of three fully connected layers:
- A first hidden layer of 24 neurons with ReLU activation, which takes the 4-dimensional state as input.
- A second hidden layer of 24 neurons, also with ReLU activation.
- An output layer of 2 neurons (one Q-value per action) with a linear activation function.
The model is optimized with the Adam optimizer and uses Mean Squared Error (MSE) as its loss function.
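In Keras, a model matching this description can be sketched as follows (layer sizes, activations, optimizer, and loss come from the text above; the variable names are illustrative):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

LEARNING_RATE = 0.001
OBSERVATION_SPACE = 4  # cart position, cart velocity, pole angle, pole angular velocity
ACTION_SPACE = 2       # push left, push right

model = Sequential()
model.add(Dense(24, input_shape=(OBSERVATION_SPACE,), activation="relu"))
model.add(Dense(24, activation="relu"))
model.add(Dense(ACTION_SPACE, activation="linear"))
model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  # newer Keras uses learning_rate=
```

The linear output layer is important: Q-values are unbounded regression targets, so no squashing activation is applied there.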
Performance of the Cartpole Model
Success in the CartPole environment is defined as achieving an average reward of 195.0 over 100 consecutive episodes, the official solved threshold for CartPole-v0. Visualizations such as example trial GIFs and performance charts illustrate the training process and demonstrate that the system learns to keep the pole upright.
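Checking this criterion amounts to maintaining a rolling average of the last 100 episode scores, as in this small sketch (the threshold and window come from the text; the helper itself is illustrative):

```python
from collections import deque

AVERAGE_SCORE_TO_SOLVE = 195.0
CONSECUTIVE_RUNS_TO_SOLVE = 100

recent_scores = deque(maxlen=CONSECUTIVE_RUNS_TO_SOLVE)  # keeps only the last 100 scores


def is_solved(episode_score):
    """Record one episode's score and report whether the solved criterion is met."""
    recent_scores.append(episode_score)
    return (len(recent_scores) == CONSECUTIVE_RUNS_TO_SOLVE
            and sum(recent_scores) / CONSECUTIVE_RUNS_TO_SOLVE >= AVERAGE_SCORE_TO_SOLVE)
```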
Author Information
This project was developed by Greg (Grzegorz) Surma, who has published multiple projects in machine learning and AI. For further exploration of his work, you can visit his portfolio, GitHub, and Medium blog.