# AutoCoder

## News 🔥
A new model, AutoCoder_QW_7B, has been released. This release fixes an earlier issue in which the model waited for user input before starting code interpretation; AutoCoder_QW_7B now operates more autonomously, automatically verifying the code it produces. It is built on the robust CodeQwen1.5-7b base model.
## Introduction 📢
AutoCoder is a powerful code-generation model. Impressively, it surpasses GPT-4 Turbo (April 2024) in test accuracy on the HumanEval benchmark, scoring 90.9% versus GPT-4 Turbo's 90.2%.

A distinct advantage of AutoCoder over previous open-source models is its ability to automatically install the packages its code requires and to iteratively run and fix the code until all issues are resolved, whenever the user asks to execute code.
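The install-and-retry behavior described above can be sketched as a simple loop. This is an illustrative sketch only, not AutoCoder's actual interpreter code, and the helper name `run_with_auto_install` is hypothetical:

```python
import subprocess
import sys
import tempfile

def run_with_auto_install(code: str, max_attempts: int = 3) -> str:
    """Run a Python snippet; if it fails because a module is missing,
    pip-install that module and retry. A simplified sketch of the
    behavior described above, not AutoCoder's implementation."""
    stderr = ""
    for _ in range(max_attempts):
        # Write the snippet to a temporary file and execute it.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True
        )
        if result.returncode == 0:
            return result.stdout  # code ran cleanly
        stderr = result.stderr
        # Look for a missing-package error in the traceback.
        marker = "ModuleNotFoundError: No module named "
        line = next((l for l in stderr.splitlines() if marker in l), None)
        if line is None:
            break  # a non-import error: stop retrying
        package = line.split(marker, 1)[1].strip().strip("'\"")
        subprocess.run([sys.executable, "-m", "pip", "install", package])
    return stderr
```

The real system also feeds execution errors back to the model for iterative repair; the loop above only covers the package-installation half.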
### Code Interpreter Differences

- Unlike the code interpreters of GPT-4 Turbo and GPT-4o, which cannot access external libraries, AutoCoder handles package installation seamlessly, broadening its usability.
- Compared with OpenCodeInterpreter, AutoCoder invokes the code interpreter only when code verification is needed (as GPT-4 Turbo does), making execution more efficient.
## Model

For those interested in using AutoCoder, it is available on Hugging Face in two size variants. The larger AutoCoder model derives from the deepseek-coder base, while its smaller sibling, AutoCoder_QW_7B, is built on the CodeQwen1.5-7b base.
## Quick Start
To get started with AutoCoder:
- Create a Conda environment:

  ```bash
  conda create -n AutoCoder python=3.11
  conda activate AutoCoder
  pip install -r requirements.txt
  ```
- Test on benchmarks:

  - HumanEval (90.9% on base, 78.0% on base + extra):

    ```bash
    cd Evaluation
    python test_humaneval.py
    ```

  - MBPP (82.5% on base, 70.6% on base + extra):

    ```bash
    python test_mbpp.py
    python postprocess_mbpp.py
    ```

  - DS-1000:

    ```bash
    python test_ds1000.py
    ```
- Web demo:

  A web demonstration, including a code interpreter, can be set up by installing Gradio and running:

  ```bash
  pip install gradio==3.48.0
  cd /Web_demo
  python chatbot.py
  ```
## Important Notes ⚠️

- It is recommended to keep `do_sample=True` (the default) when using the code interpreter.
- Deploying the model on a Linux system is advisable for optimal performance.
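As a concrete illustration of the first note, the sampling settings might be gathered like this before calling `generate` on a Hugging Face transformers model. Only `do_sample=True` comes from the note above; every other value is an illustrative placeholder, not a default from the AutoCoder repository:

```python
# Illustrative generation settings; only do_sample=True is taken from the
# recommendation above. The other values are placeholders, not AutoCoder
# defaults.
generation_kwargs = {
    "do_sample": True,       # recommended (and default) with the code interpreter
    "temperature": 0.7,      # placeholder
    "top_p": 0.95,           # placeholder
    "max_new_tokens": 1024,  # placeholder
}
# With a Hugging Face causal LM, these would be passed as:
#   outputs = model.generate(**inputs, **generation_kwargs)
```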
## Contact 📧
For questions, issues, or other communications, please reach out via email at [email protected].
## Citation

If you reference this work, please cite:
```bibtex
@misc{lei2024autocoder,
      title={AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}},
      author={Bin Lei and Yuchen Li and Qiuwu Chen},
      year={2024},
      eprint={2405.14906},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```
## Acknowledgments

Special thanks to Tianyu Zheng, the first author of OpenCodeInterpreter, for his technical guidance and insights.