Alpha Zero General: Simplified Self-Play Reinforcement Learning
Alpha Zero General is a streamlined, flexible implementation of self-play-based reinforcement learning inspired by the AlphaGo Zero paper by Silver et al. The project is designed to be adaptable to any two-player, turn-based adversarial game and to any deep learning framework of the user's choice. Sample implementations are provided for the game of Othello in both PyTorch and Keras. For those looking to learn more about how to use this project, a comprehensive tutorial is available here.
Core Components
The project is structured so that users can integrate their own games by subclassing the provided classes in Game.py and NeuralNet.py. Othello serves as an example, with its game logic implemented in othello/OthelloGame.py and the neural networks in othello/{pytorch,keras}/NNet.py.
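As an illustration of what such a subclass looks like, here is a minimal sketch following the interface in Game.py. The game itself (a one-pile Nim variant) is purely hypothetical and not part of the repository; it exists only to show the shape of the required methods:

```python
import numpy as np

from Game import Game  # abstract base class provided by the project


class NimGame(Game):
    """Illustrative Game subclass: one pile, take 1-3 stones per turn,
    taking the last stone wins. Not part of the repository."""

    def __init__(self, pile_size=10):
        self.pile_size = pile_size

    def getInitBoard(self):
        # The board is a 1-element array holding the remaining stone count.
        return np.array([self.pile_size])

    def getBoardSize(self):
        return (1,)

    def getActionSize(self):
        return 3  # actions 0, 1, 2 remove 1, 2, 3 stones

    def getNextState(self, board, player, action):
        next_board = np.array([board[0] - (action + 1)])
        return next_board, -player

    def getValidMoves(self, board, player):
        return np.array([1 if board[0] >= n else 0 for n in (1, 2, 3)])

    def getGameEnded(self, board, player):
        # An empty pile means the opponent just took the last stone,
        # so the player to move has lost.
        return -1 if board[0] == 0 else 0

    def getCanonicalForm(self, board, player):
        return board  # Nim looks identical from both players' perspectives

    def getSymmetries(self, board, pi):
        return [(board, pi)]  # no board symmetries in this toy game

    def stringRepresentation(self, board):
        return str(board[0])  # hashable key used by MCTS
```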
Training and Search
- Coach.py: This is where the core training loop is implemented.
- MCTS.py: Handles the Monte Carlo Tree Search, a fundamental component of the learning process.
- main.py: Users can specify parameters for self-play here, such as the game and framework.
For those interested in experimenting with the neural networks, additional parameters such as the CUDA flag, batch size, number of epochs, and learning rate can be adjusted in othello/{pytorch,keras}/NNet.py.
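For example, the PyTorch variant keeps these hyperparameters in a small args dictionary near the top of the file. A minimal sketch of the kind of block to look for (the values shown here are placeholders, not necessarily the repository's defaults):

```python
import torch
from utils import dotdict  # dict with attribute access, provided by the repo

args = dotdict({
    'lr': 0.001,                        # learning rate
    'epochs': 10,                       # training epochs per iteration
    'batch_size': 64,                   # minibatch size
    'cuda': torch.cuda.is_available(),  # CUDA flag
})
```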
To begin training a model for Othello, users can simply execute:
python main.py
They must ensure that the game and deep learning framework options are correctly set in main.py.
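The self-play parameters live in a similar args block in main.py. A rough sketch of what to adjust (the names follow the repository's conventions; the values are illustrative, not prescriptive):

```python
from utils import dotdict

args = dotdict({
    'numIters': 1000,        # training iterations
    'numEps': 100,           # self-play games per iteration
    'numMCTSSims': 25,       # MCTS simulations per move
    'arenaCompare': 40,      # games used to compare new vs. previous network
    'updateThreshold': 0.6,  # win rate the new network needs to be accepted
    'cpuct': 1,              # exploration constant in the PUCT formula
})
```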
Docker Setup
For an efficient setup of the runtime environment, nvidia-docker is recommended. Once installed, users can run:
./setup_env.sh
This command sets up a Jupyter Docker container using PyTorch by default. Afterward, users can start training with:
docker exec -ti pytorch_notebook python main.py
Experimental Results
As an experiment, a PyTorch model was trained on a 6x6 version of Othello for around 80 iterations, with 100 episodes of self-play per iteration and 25 MCTS simulations per turn. Training took approximately three days on an NVIDIA Tesla K80. The resulting pretrained PyTorch model is available in pretrained_models/othello/pytorch/, and users can play against it using the pit.py script. Performance comparisons against random and greedy baselines are documented, illustrating how the model improves with more iterations.
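In outline, pit.py builds two players, wraps the trained network in MCTS, and lets Arena.py play them against each other. A condensed sketch along those lines (module and class names follow the repository's layout; the checkpoint filename is a placeholder):

```python
import numpy as np

import Arena
from MCTS import MCTS
from othello.OthelloGame import OthelloGame
from othello.OthelloPlayers import RandomPlayer
from othello.pytorch.NNet import NNetWrapper
from utils import dotdict

game = OthelloGame(6)  # 6x6 board, matching the pretrained model

# Player 1: the pretrained network, with moves selected greedily by MCTS.
nnet = NNetWrapper(game)
nnet.load_checkpoint('pretrained_models/othello/pytorch/',
                     'best.pth.tar')  # placeholder checkpoint name
mcts = MCTS(game, nnet, dotdict({'numMCTSSims': 25, 'cpuct': 1.0}))
nnet_player = lambda board: np.argmax(mcts.getActionProb(board, temp=0))

# Player 2: a uniform-random baseline.
random_player = RandomPlayer(game).play

arena = Arena.Arena(nnet_player, random_player, game)
print(arena.playGames(40))  # (player1 wins, player2 wins, draws)
```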
Contribution Opportunities
While the project is functional, contributions are welcome in the following areas:
- Implementing game logic for additional games following the specifications in Game.py.
- Developing neural networks compatible with other frameworks.
- Providing pretrained models for various game configurations.
- Creating an asynchronous version of the code to separate self-play, neural net training, and model comparison.
- Implementing asynchronous MCTS as suggested in the original paper.
Some extensions and further developments can be explored here.
Acknowledgements and Contributors
The project's core design and implementation were contributed primarily by Shantanu Thakoor and Megha Jhunjhunwala. Several others have contributed across different areas:
- Shantanu Kumar implemented TensorFlow and Keras models for Othello.
- Evgeny Tyurin provided rules for TicTacToe.
- MBoss handled rules and a model for GoBang.
- Many other contributors have added further games and framework implementations.
Earlier versions of the project also supported frameworks such as Chainer and TensorFlow v1; those implementations remain accessible in the repository's history prior to the commit that removed them.
Overall, Alpha Zero General stands as a versatile and comprehensive project for implementing self-play reinforcement learning across various games and frameworks. By providing a flexible architecture and detailed examples, it serves as a valuable tool for both developers and researchers interested in deep reinforcement learning.