devlooper - Automated Program Synthesis Utilizing Iterative Testing and Debugging

Introduction to Devlooper

devlooper is a program synthesis agent that autonomously fixes its own output by running tests, so it can refine generated code without human intervention. For example, devlooper took 11 iterations to generate a Python library for creating Voronoi diagrams.

How Devlooper Works

devlooper builds on smol developer by giving it access to a sandbox in which to run tests. The agent keeps iterating, updating the codebase and adjusting the environment (for example, by installing packages), until all tests pass.

Environment Templates

To support this, devlooper uses environment templates that define the basic setup and test framework for a given language or framework. Three templates are currently available: React with Jest, Python, and Rust. Any language or framework that can be installed in a container could be supported in the same way. Contributions of new templates are welcome; see env_templates.py.
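
To make the idea concrete, here is a minimal sketch of what an environment template might hold. It is not the contents of env_templates.py: the class name `EnvTemplateSketch`, its fields, the base images, and the setup and test commands are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvTemplateSketch:
    """Hypothetical stand-in for an environment template (not devlooper's real class)."""

    name: str
    base_image: str                  # container image to start from (assumed)
    setup_commands: tuple[str, ...]  # commands that install the toolchain (assumed)
    test_command: str                # command whose exit code decides success (assumed)


# Illustrative entries only; the real templates may use different images and commands.
TEMPLATES = {
    "python": EnvTemplateSketch(
        name="python",
        base_image="python:3.11-slim",
        setup_commands=("pip install pytest",),
        test_command="pytest",
    ),
    "rust": EnvTemplateSketch(
        name="rust",
        base_image="rust:1-slim",
        setup_commands=(),
        test_command="cargo test",
    ),
}


def get_template(name: str) -> EnvTemplateSketch:
    """Look up a template by name, with a readable error for unknown names."""
    try:
        return TEMPLATES[name]
    except KeyError:
        available = ", ".join(sorted(TEMPLATES))
        raise ValueError(f"unknown template {name!r}; available: {available}") from None
```

Under this sketch, supporting another language would mean adding one more entry with its own image, setup commands, and test command.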

Sandbox Testing

Using Modal's Sandbox feature, devlooper runs tests in an isolated environment and fetches their output. The sandbox image can be constructed incrementally, much like building a Dockerfile with cached layers, so a change to the environment does not require rebuilding it from scratch.
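
The cached-layer analogy can be illustrated with a small, self-contained sketch. This is not Modal's implementation or API; `LayerCacheSketch` and `layer_key` are hypothetical names, and no commands are actually executed. The sketch only shows why appending a command to an existing image is cheap: each layer's key depends on its parent's key and its own command, so unchanged prefixes are reused.

```python
import hashlib


def layer_key(parent_key: str, command: str) -> str:
    """Derive a layer's cache key from its parent's key and its own command."""
    return hashlib.sha256(f"{parent_key}\n{command}".encode()).hexdigest()


class LayerCacheSketch:
    """Toy model of layer caching; it counts builds instead of running commands."""

    def __init__(self) -> None:
        self.built: set[str] = set()
        self.build_count = 0

    def build(self, base_image: str, commands: list[str]) -> str:
        """Return the key of the final layer, building only layers not seen before."""
        key = hashlib.sha256(base_image.encode()).hexdigest()
        for command in commands:
            key = layer_key(key, command)
            if key not in self.built:
                # A real builder would execute `command` on top of the parent layer here.
                self.built.add(key)
                self.build_count += 1
        return key


cache = LayerCacheSketch()
cache.build("python:3.11-slim", ["pip install pytest"])
cache.build("python:3.11-slim", ["pip install pytest", "pip install numpy"])
print(cache.build_count)  # 2: the second call reuses the first layer
```

Changing an early command changes every key after it, which is why those later layers would have to be rebuilt.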

Debug Loop

On each iteration, the agent runs the test command in the environment. If the command fails (a non-zero exit code), the agent passes the sandbox output to a large language model (LLM) to diagnose the errors. The diagnosis is then turned into a DebugPlan, which includes actions like:

Inspecting and fixing files.
Installing necessary packages.
Executing commands within the image.

Running the diagnosis as a separate stage, rather than predicting the DebugPlan directly from the test output, significantly improved the agent's accuracy.
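
The loop described above can be sketched as follows. This is an illustration under stated assumptions, not devlooper's code: tests run locally through `subprocess` instead of in a Modal Sandbox, the LLM calls are replaced by caller-supplied functions (`diagnose`, `plan_from_diagnosis`), and `DebugPlanSketch` with its field names is a hypothetical shape for the plan.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DebugPlanSketch:
    """Hypothetical shape for a debug plan; the field names are assumptions."""

    files_to_fix: list[str] = field(default_factory=list)
    packages_to_install: list[str] = field(default_factory=list)
    commands_to_run: list[str] = field(default_factory=list)


def run_tests(command: list[str]) -> tuple[int, str]:
    """Run the test command locally (a stand-in for the sandbox) and capture its output."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def debug_loop(
    test_command: list[str],
    diagnose: Callable[[str], str],
    plan_from_diagnosis: Callable[[str], DebugPlanSketch],
    apply_plan: Callable[[DebugPlanSketch], None],
    max_iterations: int = 10,
) -> Optional[int]:
    """Return the iteration on which the tests passed, or None if they never did."""
    for iteration in range(1, max_iterations + 1):
        exit_code, output = run_tests(test_command)
        if exit_code == 0:
            return iteration
        # Stage 1: explain the failure. Stage 2: turn the explanation into actions.
        diagnosis = diagnose(output)
        plan = plan_from_diagnosis(diagnosis)
        apply_plan(plan)
    return None
```

Keeping `diagnose` and `plan_from_diagnosis` as separate callables mirrors the two-stage design described above. The `max_iterations` cap is an addition of this sketch so that the loop always terminates.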

Usage

Setting Up

To get started with devlooper, follow these simple steps:

Create an account on Modal and install the modal package in your Python environment:
```
pip install modal
```
Generate a Modal token:
```
modal token new
```
Create an OpenAI account and obtain an API key. Then create a Modal secret named openai-secret.

Generating Code

With the setup complete, you can start generating code. From the root directory of the repository, execute:

modal run src.main --prompt="a simple 2D graphics library" --template="rust"

Other examples include:

modal run src.main --prompt="a todo-list app" --template="react"

modal run src.main --prompt="a webscraper that checks if there are new reservations for a given restaurant on Resy" --template="python"

Upon successful execution, the results are stored in the output/ directory by default, though this can be changed with the --output-path option.

Showcase

Stay tuned for upcoming showcases highlighting the capabilities and outputs of devlooper.

Future Directions

devlooper is still a proof of concept. Several enhancements are planned:

Incorporating user feedback or accepting existing projects for modifications.
Improving debugging prompts using relevant code snippets.
Adding functions to fetch necessary package documentation.
Utilizing previous edits to avoid unnecessary repetitive loops.
Automatically synthesizing new EnvTemplates.
Extending compatibility to additional LLMs, including open-source alternatives.

These changes aim to make devlooper more robust and more broadly applicable.