LLM Verified with Monte Carlo Tree Search
This project introduces a novel approach to generating verified code by combining a Large Language Model (LLM) with Monte Carlo Tree Search (MCTS). The primary goal is to explore the space of possible programs while verifying each partial program along the way, so that only correct code is extended.
How It Works
The project employs MCTS, a search algorithm widely used in AI for sequential decision making. MCTS systematically explores candidate continuations of a program, and each expansion is checked by a verifier so that only valid partial programs are pursued, much as a programmer writes code incrementally and checks it as they go.
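As a rough sketch of the idea, the loop below runs plain MCTS over partial programs, where every LLM-proposed continuation must pass a verifier check before it becomes a child node. The `propose_continuations` and `verify_partial` functions are hypothetical stand-ins, not the project's actual API:

```python
# Minimal MCTS sketch over partial programs with verifier-gated expansion.
import math
import random

def propose_continuations(program, k=3):
    """Stand-in for an LLM call proposing k candidate continuations."""
    return [program + f"\n  // step {random.randint(0, 99)}" for _ in range(k)]

def verify_partial(program):
    """Stand-in for a verifier check of a partial program (e.g. Dafny)."""
    return True  # the real check rejects continuations that cannot verify

class Node:
    def __init__(self, program, parent=None):
        self.program, self.parent = program, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def search(root_program, iterations=50):
    root = Node(root_program)
    for _ in range(iterations):
        # Selection: descend to a leaf by UCB.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: only keep continuations the verifier accepts.
        for cand in propose_continuations(node.program):
            if verify_partial(cand):
                node.children.append(Node(cand, parent=node))
        # Evaluation: a random placeholder score; the project scores candidates
        # using the verifier rather than a random rollout.
        reward = random.random()
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).program

print(search("method Main() {"))
```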
To verify the correctness of the code, the project supports several target languages: Dafny, Coq, and Lean, which are formal verification systems, as well as Scala and Rust, where the compiler and type checker play the role of the verifier in checking the logic and well-formedness of programs.
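For example, a Dafny check can be driven from Python by shelling out to the `dafny` executable. The sketch below assumes `dafny` is on the PATH and uses the Dafny 4-style `verify` subcommand (the exact CLI varies across Dafny versions); it is not the project's actual verifier interface:

```python
# Sketch: ask Dafny whether a candidate program verifies.
import subprocess
import tempfile

def dafny_verifies(source: str, timeout: int = 60) -> bool:
    """Return True if Dafny accepts the given program text."""
    with tempfile.NamedTemporaryFile("w", suffix=".dfy", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run(["dafny", "verify", path],
                            capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0

print(dafny_verifies(
    "method Id(x: int) returns (y: int) ensures y == x { y := x; }"))
```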
Running the Project
The project requires GPU support and has been tested on systems with NVIDIA A100 GPUs to run the LLM efficiently. Users can clone the repository (including its submodules) and set up an environment to start testing and experimenting with the code.
Basic Setup
To set up the environment, clone the repository and install the necessary dependencies. This involves creating a Python environment, installing the required packages, and logging into Hugging Face to access model weights.
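As an illustrative sketch of the Hugging Face step (assuming the `huggingface_hub` and `transformers` packages are installed; the model id below is a placeholder, not necessarily the one the project uses):

```python
# Log in to Hugging Face and load a code model.
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

login()  # prompts for an access token; needed for gated model weights

model_id = "codellama/CodeLlama-7b-hf"  # placeholder; substitute the model you use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```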
Specific installation steps are described for each verification tool, such as Dafny and Coq. Setting these tools up correctly is essential, since the model relies on them to check the generated code.
Executing and Experimenting
Multiple execution options and configurations allow users to run the model in different modes:
- Baseline Execution: Run the default setup to generate solutions using the LLM.
- Interactive Mode: Involves the user during the search, which can lead to better results.
- Verifier Feedback: Enables feedback from the verifier to guide the search process (a minimal sketch of this idea appears at the end of this section).
- Training Modes: Training methods such as PPO and DPO are available to further refine the model's performance.
In addition, other execution scripts serve more specific purposes, such as promoting diversity in the generated output, targeting a particular verifier such as Coq, or running on predefined datasets like the Clover benchmark.
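As a rough illustration of the verifier-feedback mode, one can picture the verifier's outcome being mapped to a scalar reward that the tree search back-propagates. The sketch below does this for Dafny; the scoring scheme and the string matching on Dafny's output are illustrative, not the project's exact implementation:

```python
# Sketch: turn a verifier run into a reward in [0, 1] for the search.
import subprocess
import tempfile

def dafny_feedback(source: str) -> float:
    """Run Dafny on `source` and map the outcome to a scalar reward."""
    with tempfile.NamedTemporaryFile("w", suffix=".dfy", delete=False) as f:
        f.write(source)
        path = f.name
    proc = subprocess.run(["dafny", "verify", path],
                          capture_output=True, text=True, timeout=60)
    output = (proc.stdout + proc.stderr).lower()
    if proc.returncode == 0:
        return 1.0   # fully verified
    if "parse error" in output:
        return 0.0   # not even syntactically valid (message text varies by version)
    return 0.5       # well-formed but some proof obligations fail
```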
Adding Unit Tests
The project lets users add unit tests for the generated code. A candidate is only considered correct if it satisfies these tests in addition to passing verification, which further increases the reliability of the output.
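A minimal sketch of this mechanism: append a caller-supplied block of test code to the candidate program and accept the candidate only if the combined program still verifies. The test body and helper names below are illustrative, and `verifies` can be any program-text-to-bool checker such as the `dafny_verifies` helper sketched earlier:

```python
# Sketch: gate candidates on user-supplied unit tests.
UNIT_TESTS = """
method TestId() {
  var y := Id(42);
  assert y == 42;  // discharged statically from Id's postcondition
}
"""

def passes_unit_tests(candidate: str, verifies) -> bool:
    """Accept `candidate` only if candidate + tests verifies as a whole."""
    return verifies(candidate + "\n" + UNIT_TESTS)
```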
Key Takeaways
By combining feedback from a verifier with a scalable LLM and tree search, this project enables efficient synthesis of verified code. Making verification a core part of the generation process lets comparatively weak models compete with stronger ones: the verifier catches mistakes, so reliable code can be produced even when the model's grasp of the target programming language is limited.
Citation
For those interested in delving deeper or referencing this project, a detailed paper is available that outlines the methodology and its implications: "VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search".
This approach shows how modern AI techniques can be combined with formal verification methods to advance program synthesis and verification.