[PYTORCH] Proximal Policy Optimization (PPO) for Playing Super Mario Bros
Introduction
This project is a Python implementation that trains an agent to play the classic video game Super Mario Bros using the Proximal Policy Optimization (PPO) algorithm, introduced in OpenAI's paper "Proximal Policy Optimization Algorithms".
In terms of performance, the PPO-trained agent completes 31 of the game's 32 levels, far exceeding initial expectations. For context, PPO is a reinforcement learning algorithm developed by OpenAI, best known for powering OpenAI Five, the system that competed against top-tier human players in the esports game Dota 2.
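For readers unfamiliar with the algorithm, the heart of PPO is the clipped surrogate objective from the paper: the policy update is limited by clipping the probability ratio between the new and old policy. The PyTorch snippet below is a minimal, self-contained sketch of that loss; the tensor names and the clip coefficient are illustrative and are not taken from this repository's code.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from 'Proximal Policy Optimization Algorithms'.

    All arguments are 1-D tensors of equal length (one entry per sampled action);
    clip_eps is the epsilon from the paper.
    """
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate objective, i.e. minimize its negation.
    return -torch.min(unclipped, clipped).mean()

# Tiny usage example with made-up numbers:
new_lp = torch.tensor([-0.9, -1.2, -0.3])
old_lp = torch.tensor([-1.0, -1.0, -0.5])
adv = torch.tensor([0.5, -0.2, 1.0])
print(ppo_clip_loss(new_lp, old_lp, adv))
```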
Motivation
The journey toward this project began with an earlier implementation that used the Asynchronous Advantage Actor-Critic (A3C) algorithm to train a Mario agent. Although the A3C-trained agent performed well and completed levels quickly, it conquered only 19 of the 32 levels, which prompted the search for a more effective method.
Before settling on PPO, various other algorithms, like A2C and Rainbow, were partially implemented. A2C did not significantly enhance performance, while Rainbow seemed more suited for unstructured environments, such as arcade-style games. Eventually, PPO was chosen for its potential to overcome these limitations.
How to Use the Code
The provided Python code offers a straightforward approach to training and testing custom models:
- Train a model: run python train.py with the desired world and stage, e.g., python train.py --world 5 --stage 2 --lr 1e-4
- Test a trained model: run python test.py, e.g., python test.py --world 5 --stage 2
Note: If a level proves difficult to train, adjusting the learning rate can make the difference. Most levels can be conquered by experimenting with rates such as 1e-3, 1e-4, or 1e-5; particularly challenging levels, such as 1-3, may need a more finely tuned rate (for example 7e-5) found after numerous attempts.
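One simple way to experiment with these rates is to launch train.py several times with different --lr values. The loop below is only a sketch of that idea: it assumes train.py accepts the --world, --stage, and --lr flags shown above and exits on its own when training finishes.

```python
import subprocess

# Hypothetical sweep over the learning rates mentioned above, for world 1, stage 3.
for lr in ["1e-3", "1e-4", "7e-5", "1e-5"]:
    print(f"Training world 1-3 with lr={lr}")
    subprocess.run(
        ["python", "train.py", "--world", "1", "--stage", "3", "--lr", lr],
        check=True,
    )
```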
Docker Compatibility
To streamline the process, a Dockerfile is included for running both training and testing phases. Assuming the repository is cloned and accessed:
- Build the Docker image: sudo docker build --network=host -t ppo .
- Run the Docker container: docker run --runtime=nvidia -it --rm --volume="$PWD"/..:/Super-mario-bros-PPO-pytorch --gpus device=0 ppo
Inside the container, run the train.py or test.py scripts as outlined previously.
Note: Due to a rendering bug with Docker, comment out the env.render() line in the src/process.py or test.py scripts to prevent visualization issues during training and testing. Although no window will open for viewing gameplay, training will still execute correctly, and the testing phase will still generate an mp4 visualization.
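As an alternative to commenting the call out by hand, one hypothetical approach is to guard rendering behind a command-line flag. The snippet below is only a sketch under that assumption; the --render flag and the maybe_render helper are illustrative and are not part of the repository's actual scripts.

```python
import argparse

def maybe_render(env, enabled):
    """Render the environment only when explicitly requested, e.g. outside Docker."""
    if enabled:
        env.render()

if __name__ == "__main__":
    parser = argparse.ArgumentParser("Render toggle sketch")
    # Hypothetical flag; the repository's scripts may expose different options.
    parser.add_argument("--render", action="store_true",
                        help="Show the game window (leave this off inside Docker)")
    args = parser.parse_args()
    print("Rendering enabled:", args.render)
```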
Challenges at Level 8-4
Despite the success across the other levels, level 8-4 remains unconquered. Levels such as 4-4, 7-4, and 8-4 contain puzzles that require selecting the correct path, which complicates progress. Solutions for 4-4 and 7-4 have been found under certain conditions, but 8-4 still poses a challenge.