AutoNode: Revolutionizing GUI Automation with Cognitive Intelligence
Introduction to AutoNode
AutoNode is a self-operating computer system that automates interactions on websites. It combines Optical Character Recognition (OCR) with YOLO (You Only Look Once) object-detection models to locate elements on a web page and interact with them. The system is designed for tasks that involve web interactions and data extraction, reducing the need for manual operation.
Setting Up AutoNode
To get started with AutoNode, ensure you have Python and Docker installed on your system. Follow these steps to install and launch AutoNode:
- Clone the Repository: Use the terminal to clone the AutoNode repository from GitHub.
  git clone https://github.com/TransformerOptimus/AutoNode.git
- Navigate to the Directory: Change your directory to the AutoNode folder.
  cd AutoNode
- Environment Configuration: Create a copy of the provided .env.example file and rename it to .env for each module (autonode, yolo, ocr).
- Docker Setup: With Docker installed and running, execute the following command to build and start the application.
  docker compose -f docker-compose.yaml up --build
- Verify Installation: Open a browser and go to http://localhost:8001/health to confirm that the server is operational.
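The verification step can also be scripted, which helps because the containers may need a little time to start. The following is a minimal sketch using only the Python standard library. It assumes the health endpoint answers with an HTTP 2xx status when the server is ready; the function names, retry count, and delay are illustrative choices, not part of AutoNode.

```python
import time
import urllib.error
import urllib.request

# URL taken from the setup steps; everything else here is an illustrative assumption.
HEALTH_URL = "http://localhost:8001/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response raised as HTTPError.
        return False


def wait_until_healthy(url: str = HEALTH_URL, attempts: int = 10,
                       delay: float = 2.0, check=is_healthy) -> bool:
    """Poll the health URL until it responds or the attempts run out."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the containers time to come up
    return False
```

Calling wait_until_healthy() after the Docker step returns True once the server responds and False if it never does within the allotted attempts.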
How AutoNode Works
AutoNode operates based on a site-graph, a structured layout that details the actions and navigation paths on a website. Here's a simplified process to use AutoNode:
- Define Your Objective: Clearly specify your task, such as data extraction or performing automated interactions on a website.
- Create a Site-Graph: Draft a JSON file that maps out the site's structure. This graph includes nodes (web elements) and edges (actions to be undertaken).
- Set Up an Initiator Planner Prompt: Use OpenAI's LLM to generate a custom prompt file that guides AutoNode's tasks.
- Execute AutoNode: Utilize the AutoNode API to initiate and control web automation tasks.
Using AutoNode's API
The AutoNode API offers a straightforward interface to programmatically automate web tasks. To send requests, use the following endpoint structure:
- API Documentation: Accessible via http://localhost:8001/docs, providing comprehensive detail on all endpoints.
- Request to Initiate Task: Send a JSON payload to the /api/autonode/initiate endpoint, including information on site URL, objectives, and site-graph paths.
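As a rough illustration of the initiate request, the sketch below builds a JSON request with the Python standard library. Only the host, port, and endpoint path come from this article. The payload field names (site_url, objective, graph_path), the use of POST, and the assumption that the server replies with JSON are all hypothetical; check the API documentation at /docs for the actual schema before relying on any of them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"        # from the setup section
INITIATE_PATH = "/api/autonode/initiate"  # from the API section


def build_initiate_request(site_url: str, objective: str, graph_path: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the initiate endpoint."""
    payload = {
        # Hypothetical field names: confirm the real schema at /docs.
        "site_url": site_url,
        "objective": objective,
        "graph_path": graph_path,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + INITIATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed; the article only says "send a JSON payload"
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it easy to inspect or log the payload before anything is sent to a running AutoNode instance.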
Integrating YOLO and OCR Models
AutoNode uses YOLO models to detect objects such as clickable buttons, and OCR to read text from images. Combining the two lets AutoNode both locate interface elements and read their text, which is what allows dynamic interaction with web interfaces. If needed, you can also train and integrate your own custom YOLO model.
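To make the pairing of detection and text recognition concrete, here is a small, self-contained sketch of one way the two outputs could be combined: each detected element is labelled with the OCR words whose centres fall inside its bounding box. This is not AutoNode's actual implementation. The Detection and OcrWord types, the (x1, y1, x2, y2) box format, and the matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed box format: (x1, y1, x2, y2) in screenshot pixel coordinates.
Box = tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    """A hypothetical object-detector result, e.g. a button or input field."""
    label: str
    box: Box


@dataclass(frozen=True)
class OcrWord:
    """A hypothetical OCR result: one word and where it was found."""
    text: str
    box: Box


def center(box: Box) -> tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(box: Box, point: tuple[float, float]) -> bool:
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def label_detections(detections: list[Detection],
                     words: list[OcrWord]) -> list[tuple[Detection, str]]:
    """Attach to each detection the OCR words whose centres lie inside it."""
    labelled = []
    for detection in detections:
        inside = [w for w in words if contains(detection.box, center(w.box))]
        # Simplification: order by top edge, then left edge (reading order
        # for words that share the same top coordinate).
        inside.sort(key=lambda w: (w.box[1], w.box[0]))
        labelled.append((detection, " ".join(w.text for w in inside)))
    return labelled
```

With this kind of pairing, a detected button whose box encloses the words "Log" and "In" would be labelled "Log In", which is the sort of information an automation step needs to choose what to click.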
Hosting and Storing Data
- Remote Hosting: If local resources are inadequate, the OCR and YOLO modules can be hosted remotely on cloud services.
- Storing Outputs: Screenshots and other outputs can be stored on AWS S3 or locally, based on your preferences.
Preparing a Site-Graph
Constructing a concise site-graph involves identifying key web elements and outlining navigation flows through nodes and edges. Here's a basic example:
{
  "1": {
    "node_type": "clickable_and_typeable",
    "node_name": "Login Button",
    "location": [100, 200],
    "adjacent_to": ["2"]
  },
  "2": {
    "node_type": "clickable_and_typeable",
    "node_name": "Username Field",
    "location": [150, 250],
    "type_description": "Enter username here"
  }
}
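Before handing a site-graph to AutoNode, it can be useful to check that every edge points at a node that exists and to see which nodes are reachable from a starting point. The sketch below does this for the example above using only the standard library. It treats a missing adjacent_to key as "no outgoing edges", which is an assumption, and the helper names are illustrative; how AutoNode itself walks the graph is not described here.

```python
import json
from collections import deque

# The example site-graph from this section.
SITE_GRAPH_JSON = """
{
  "1": {
    "node_type": "clickable_and_typeable",
    "node_name": "Login Button",
    "location": [100, 200],
    "adjacent_to": ["2"]
  },
  "2": {
    "node_type": "clickable_and_typeable",
    "node_name": "Username Field",
    "location": [150, 250],
    "type_description": "Enter username here"
  }
}
"""


def load_site_graph(text: str) -> dict:
    """Parse a site-graph and check that every edge targets a known node."""
    graph = json.loads(text)
    for node_id, node in graph.items():
        # Assumption: a node without "adjacent_to" has no outgoing edges.
        for neighbour in node.get("adjacent_to", []):
            if neighbour not in graph:
                raise ValueError(f"node {node_id} points to unknown node {neighbour}")
    return graph


def reachable_from(graph: dict, start: str) -> list[str]:
    """Return node ids reachable from start, in breadth-first order."""
    if start not in graph:
        raise KeyError(start)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for neighbour in graph[current].get("adjacent_to", []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


graph = load_site_graph(SITE_GRAPH_JSON)
print([graph[n]["node_name"] for n in reachable_from(graph, "1")])
```

A check like this catches dangling references early, for instance an adjacent_to entry that names a node id missing from the file.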
Conclusion
AutoNode is a tool for cognitive GUI automation that enables web interactions and data extraction with minimal manual intervention. Its Docker-based setup, combined with object detection and text recognition, makes it a practical option for developers and businesses automating web-based tasks.