Introduction to AlphaZero_Gomoku
AlphaZero_Gomoku implements the AlphaZero algorithm for the classic board game Gomoku, also known as Gobang or Five in a Row. The goal is to train an AI that masters Gomoku purely through self-play, using a general reinforcement learning algorithm. Because Gomoku is far simpler than Go or chess, the AlphaZero training scheme is easier to follow, and a competent AI can be trained on a single personal computer in just a few hours.
Background and References
The project draws inspiration from significant works in AI and machine learning, specifically:
- AlphaZero: Achieving proficiency in Chess and Shogi through self-play using a general reinforcement learning algorithm.
- AlphaGo Zero: Conquering the game of Go without any human input.
Updates and Compatibility
AlphaZero_Gomoku is compatible with several popular machine learning frameworks. PyTorch support was added on January 17, 2018, followed by TensorFlow support on February 24, 2018. These additions give users broader access and flexibility in choosing a framework for training the AI model.
Example Games
The project showcases example games between trained models, each using 400 Monte Carlo Tree Search (MCTS) playouts per move. Viewing these games is a good way to see how the trained AI strategizes.
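To make "playouts per move" concrete, here is a deliberately simplified flat Monte Carlo move selector for tic-tac-toe (3 in a row on a 3x3 board). It is a toy stand-in, not the project's UCT-based, neural-network-guided MCTS: each legal move is simply scored by how often uniformly random playouts from it end in a win.

```python
import random

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    lines = [board[0:3], board[3:6], board[6:9],     # rows
             board[0::3], board[1::3], board[2::3],  # columns
             board[0::4], board[2:7:2]]              # diagonals
    for line in lines:
        if line[0] != '.' and line.count(line[0]) == 3:
            return line[0]
    return None

def rollout(board, player):
    """Play uniformly random moves until the game ends; return the winner."""
    board = list(board)
    while True:
        w = winner(board)
        if w is not None or '.' not in board:
            return w
        move = random.choice([i for i, c in enumerate(board) if c == '.'])
        board[move] = player
        player = 'O' if player == 'X' else 'X'

def flat_mc_move(board, player, n_playouts=400):
    """Score each legal move by random-playout win rate; return the best."""
    legal = [i for i, c in enumerate(board) if c == '.']
    per_move = max(1, n_playouts // len(legal))
    other = 'O' if player == 'X' else 'X'
    scores = {}
    for move in legal:
        nxt = list(board)
        nxt[move] = player
        scores[move] = sum(rollout(nxt, other) == player
                           for _ in range(per_move))
    return max(legal, key=scores.get)

# With two in a row already, 400 playouts easily find the winning move:
print(flat_mc_move(list('XX.OO....'), 'X'))  # -> 2 (completing the top row)
```

More playouts per move yield more reliable move statistics at the cost of compute; 400 is the budget the example games use.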
Requirements
To engage with the trained AI models, the only necessary requirements are:
- Python version 2.7 or higher
- Numpy version 1.11 or higher
For those looking to train the AI model from the ground up, additional requirements include either:
- Theano version 0.7 or higher and Lasagne 0.1 or higher, or
- PyTorch version 0.2.0 or higher, or
- TensorFlow
It is important to address compatibility issues if the Theano version is greater than 0.7 by following specific installation instructions to ensure Lasagne functions properly.
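The version floors above can be checked programmatically before training. The helper below is an illustrative sketch (it is not part of the project) that compares a dotted version string against a minimum version tuple:

```python
def meets_minimum(version, minimum):
    """True if a dotted version string is at least `minimum` (a tuple).
    Non-numeric suffixes like 'rc1' are reduced to their digits."""
    parts = []
    for piece in version.split('.'):
        digits = ''.join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= minimum

# The project's stated minimums:
assert meets_minimum('1.11.3', (1, 11))    # NumPy >= 1.11
assert meets_minimum('0.2.0', (0, 2, 0))   # PyTorch >= 0.2.0
assert not meets_minimum('0.6.0', (0, 7))  # a Theano below 0.7
```

In practice you would pass in the installed package's `__version__` string (e.g. `numpy.__version__`).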
Getting Started
Playing with the provided models involves running a straightforward script:
python human_play.py
This file can be modified to test different models or to play against the pure MCTS baseline. Training the AI model from scratch requires editing the train.py file to select the chosen deep learning framework (Theano/Lasagne, PyTorch, or TensorFlow), then executing the script:
python train.py
For those training on a GPU with a PyTorch version above 0.5, slight modifications to the script are necessary for correct operation.
Training results are saved periodically: the models best_policy.model and current_policy.model are written every 50 updates by default.
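The save cadence can be sketched as a small simulation. This is illustrative, not the project's training loop; the filenames match those above, but the criterion that best_policy.model is replaced whenever the evaluation win ratio improves is an assumption about how such checkpointing typically works.

```python
def checkpoint_schedule(win_ratios, check_freq=50):
    """Simulate the save cadence: every `check_freq` updates, save the
    current model; additionally save it as the best model whenever its
    evaluation win ratio improves on the best seen so far (hypothetical
    criterion)."""
    best = -1.0
    events = []
    for i, ratio in enumerate(win_ratios, start=1):
        if i % check_freq == 0:
            events.append(('current_policy.model', i))
            if ratio > best:
                best = ratio
                events.append(('best_policy.model', i))
    return events

# 150 updates where evaluation improves at update 50 and again at 150:
ratios = [0.0] * 150
ratios[49], ratios[99], ratios[149] = 0.6, 0.5, 0.7
print(checkpoint_schedule(ratios))
```

With the default check_freq of 50, the current model is saved at updates 50, 100, and 150, while the best model is refreshed only at 50 and 150, where the win ratio improves.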
Tips for Successful Training
- Starting with a smaller 6x6 board and aiming for 4 in a row can yield a functional model with approximately 500 to 1000 self-play games, typically completed in about 2 hours.
- For an 8x8 board with a target of 5 in a row, developing a good model might require 2000 to 3000 self-play games, likely taking up to 2 days on a single computer.
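The two suggested setups can be captured as configurations. The parameter names here (board_width, board_height, n_in_row, n_playout) are illustrative and should be checked against the actual train.py:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Hypothetical bundle of the knobs discussed above; the real train.py
    may name or organize them differently."""
    board_width: int
    board_height: int
    n_in_row: int
    n_playout: int = 400  # MCTS playouts per move, as in the example games

# The quick ~2-hour setup versus the multi-day 8x8 setup:
quick_start = TrainConfig(board_width=6, board_height=6, n_in_row=4)
larger = TrainConfig(board_width=8, board_height=8, n_in_row=5)
```

Starting from the small configuration is a cheap way to validate the whole pipeline before committing days of compute to the larger board.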
Further Learning
For those interested in a more in-depth understanding in Chinese, an article detailing the implementation is available here.
AlphaZero_Gomoku presents an accessible pathway to experiencing and understanding the sophisticated workings of AI within the realm of board games, aided by well-documented instructions catering to varying levels of technical proficiency.