Cradle: Enhancing Human-like Computer Control Through AI
Cradle is a framework that connects foundation models to complex computer tasks. It mimics how a person uses a computer, taking screenshots as input and acting through the keyboard and mouse, so that agents can navigate and interact with a variety of digital environments.
Project Overview
Cradle equips foundation models to carry out tasks on a computer. The agent processes visual input such as screenshots and decides which inputs to issue, just as a human would with a mouse and keyboard.
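The loop described above can be sketched as follows. This is a minimal illustration, not Cradle's actual code: the `Action` schema, the `run_agent` function, and the three injected callables are hypothetical names chosen for the example, and the capture and decision steps are replaced with stand-ins so the sketch runs without a screen or a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One low-level input event (hypothetical schema, not Cradle's own)."""
    kind: str                 # "key" or "click"
    key: Optional[str] = None
    x: Optional[int] = None
    y: Optional[int] = None


def run_agent(
    capture: Callable[[], bytes],
    decide: Callable[[bytes, str], Optional[Action]],
    execute: Callable[[Action], None],
    task: str,
    max_steps: int = 10,
) -> int:
    """Screenshot in, one keyboard/mouse action out, repeated until done.

    Returns the number of actions executed.
    """
    steps = 0
    for _ in range(max_steps):
        screenshot = capture()             # observe the screen as an image
        action = decide(screenshot, task)  # a foundation model would be queried here
        if action is None:                 # the decider signals the task is finished
            break
        execute(action)                    # emit the keyboard or mouse event
        steps += 1
    return steps


# Stand-ins so the sketch runs without a screen or a model.
log: List[Action] = []
plan = iter([Action("click", x=100, y=200), Action("key", key="enter")])


def fake_capture() -> bytes:
    return b"fake-image-bytes"


def fake_decide(screenshot: bytes, task: str) -> Optional[Action]:
    return next(plan, None)


executed = run_agent(fake_capture, fake_decide, log.append, task="open the menu")
print(executed, [a.kind for a in log])  # prints: 2 ['click', 'key']
```

Passing the capture, decision, and execution steps in as callables keeps the loop independent of any particular screenshot library, model API, or input driver, so each piece can be swapped or tested on its own.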
Recent Updates
In June 2024, Cradle underwent significant enhancements, expanding its capabilities to function across multiple environments, including:
- Popular video games such as Red Dead Redemption 2, Stardew Valley, Cities: Skylines, and Dealer's Life 2.
- A range of software applications including Chrome, Outlook, CapCut, Meitu, and Feishu.
With these additions, Cradle handles a wider variety of tasks across both games and everyday software, which demonstrates the framework's versatility.
Installation and Setup
To start using Cradle, set up the environment in three steps:
- Prepare the Environment File: Create a .env file that stores the necessary API keys for OpenAI and Claude. Cradle needs these keys to call the underlying foundation models.
- Python Environment Setup: Clone the Cradle repository, create a Python environment, and install the required dependencies:
    git clone https://github.com/BAAI-Agents/Cradle.git
    cd Cradle
    conda create --name cradle-dev python=3.10
    conda activate cradle-dev
    pip install -r requirements.txt
- Install OCR Tools: Choose and install the language models that suit your needs; they improve interaction quality through better language understanding.
By following these steps, users prepare their systems to run Cradle and leverage its capabilities.
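Before the first run, it can help to confirm that the .env file actually contains the keys. The following is a minimal sketch using only the standard library; the variable names in `REQUIRED` are hypothetical placeholders, so check the Cradle repository for the names it really expects.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Hypothetical variable names, used here only for illustration.
REQUIRED = ["OPENAI_API_KEY", "CLAUDE_API_KEY"]

sample = 'OPENAI_API_KEY = "sk-placeholder"\n# Claude key not set yet\nCLAUDE_API_KEY=\n'
env = parse_env(sample)
print(missing_keys(env, REQUIRED))  # prints: ['CLAUDE_API_KEY']
```

To check a real file, read it with `pathlib.Path(".env").read_text(encoding="utf-8")` and pass the result to `parse_env`.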
Getting Started
Cradle provides configuration settings tailored to each supported game and application. Instructions and guides for setting up Cradle in each environment are available.
File Structure and Framework Architecture
Cradle's directory layout is modular and easy to navigate. Users can add a new game or application by following the existing structure and adjusting the configuration to their requirements, which keeps the framework flexible as new features and environments are integrated.
Migrating to New Games
Cradle's modularity makes it straightforward to adapt to new games. Developers can extend the framework by adding modules and configurations suited to the new environment.
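One common way to get this kind of modularity is a registry of per-environment configurations, sketched below. This illustrates the general pattern only and is not Cradle's actual mechanism: `EnvConfig`, its fields, `register`, and `load` are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class EnvConfig:
    """Per-environment settings (hypothetical fields, not Cradle's schema)."""
    name: str
    window_title: str
    skills: List[str] = field(default_factory=list)


REGISTRY: Dict[str, EnvConfig] = {}


def register(config: EnvConfig) -> None:
    """Add an environment; refuse to silently overwrite an existing one."""
    if config.name in REGISTRY:
        raise ValueError(f"environment already registered: {config.name}")
    REGISTRY[config.name] = config


def load(name: str) -> EnvConfig:
    """Look up an environment by name, with a readable error if unknown."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown environment: {name}") from None


# An existing environment and a newly added one share the same code path.
register(EnvConfig("stardew", "Stardew Valley", ["move", "use_tool"]))
register(EnvConfig("new_game", "My New Game", ["click", "press_key"]))
print(sorted(REGISTRY))  # prints: ['new_game', 'stardew']
```

With this layout, supporting a new game means adding one configuration entry and the modules it names, while the core loop stays unchanged.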
In conclusion, Cradle enables AI-driven agents to handle diverse computer tasks through a human-like interface of screenshots, keyboard, and mouse. Its adaptability, ease of setup, and broad range of supported environments make it a useful tool for AI and human-computer interaction.