[PYTORCH] Proximal Policy Optimization (PPO) for Contra NES
Introduction
The Contra-PPO-PyTorch project is an implementation of the Proximal Policy Optimization (PPO) algorithm for training an AI agent to play the classic NES game Contra. This project builds on the PPO algorithm developed by OpenAI, which has proven to be a formidable tool in game-playing AI, most notably powering OpenAI Five's victories over top professional Dota 2 teams.
Within this project, AI enthusiasts and developers can explore the application of PPO within the nostalgic gaming context of Contra, offering a fascinating demonstration of AI capabilities in mastering complex video games.
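For context, the sketch below illustrates PPO's clipped surrogate objective, the core update the algorithm optimizes. It is a minimal, generic PyTorch illustration, not the project's actual loss code; the tensor names (log_probs, old_log_probs, advantages) are assumed here for clarity.

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the policy that collected the data.
    ratio = torch.exp(log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two; the loss is its negation.
    return -torch.min(surr1, surr2).mean()

# Example usage with dummy tensors.
log_probs = torch.randn(8)
old_log_probs = torch.randn(8)
advantages = torch.randn(8)
print(ppo_clip_loss(log_probs, old_log_probs, advantages))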
Motivation
The project's originator previously shared implementations of another AI algorithm, A3C, for the game Super Mario Bros. After demonstrating superior performance with PPO in the same environment, the decision was made to extend this approach to Contra, another iconic NES game. The goal was to ascertain whether PPO could replicate, or even exceed, its success in a new gaming setting, thereby broadening its applicability and relevance.
How to Use the Code
The project offers practical capabilities for AI training and testing in Contra:
- Training: Users can train a model by executing python train.py. For instance, python train.py --level 1 --lr 1e-4 starts training on level 1 with a learning rate of 1e-4 (a sketch of how such a command-line interface might be defined follows this list).
- Testing: To evaluate a trained model, run python test.py, for example python test.py --level 1 to assess performance on level 1.
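The exact command-line interface is defined inside train.py; the following is only a sketch of how a parser for the two flags shown above (--level and --lr) might look. The defaults and help text here are assumptions, not the project's actual settings.

import argparse

def get_args():
    # Hypothetical parser mirroring the flags used in the examples above;
    # defaults are assumptions, not the project's actual values.
    parser = argparse.ArgumentParser(description="Train a PPO agent on Contra NES")
    parser.add_argument("--level", type=int, default=1, help="Game level to train on")
    parser.add_argument("--lr", type=float, default=1e-4, help="Learning rate for the optimizer")
    return parser.parse_args()

if __name__ == "__main__":
    args = get_args()
    print(f"Training on level {args.level} with learning rate {args.lr}")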
Docker Setup
To streamline the setup process, the project provides a Dockerfile, facilitating both training and testing within a Docker container.
Build the Docker Image: To build the Docker image (assuming it's named ppo), use the following command in your terminal:
sudo docker build --network=host -t ppo .
Run the Docker Container: Use the following command to start a container in which you can train new models or run trained ones:
docker run --runtime=nvidia -it --rm --volume="$PWD"/../Contra-PPO-pytorch:/Contra-PPO-pytorch --gpus device=0 ppo
Within the Docker container, you can run the train.py or test.py scripts as described earlier.
Note on Docker Visualization: A known issue when running inside Docker is that rendering may not work as expected. To work around this, comment out the env.render() line in src/process.py when training, or in test.py when testing. The visualization window will not appear during execution, but the processes complete as intended, and testing still produces an output video file for review.
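As an alternative to editing the scripts by hand, the rendering call can be guarded behind a flag. The sketch below shows this pattern in generic form; the flag name, function, and stubs are assumptions for illustration and do not reflect how src/process.py is actually structured.

# Hypothetical pattern: make rendering optional instead of deleting env.render().
import argparse

def run_episode(env, agent, render=False):
    # Play one episode, rendering only when explicitly requested.
    state = env.reset()
    done = False
    while not done:
        if render:  # skip rendering when running headless, e.g. inside Docker
            env.render()
        action = agent.act(state)
        state, reward, done, info = env.step(action)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--render", action="store_true", help="Show the game window (leave off inside Docker)")
    args = parser.parse_args()
    # run_episode(env, agent, render=args.render)  # env and agent come from the project's own code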
This project showcases the innovative use of PPO to tackle the challenges of classic video gaming, inviting developers to further explore and expand AI capabilities in entertainment.