Project Introduction: UFO
Welcome to UFO, a UI-focused multi-agent framework for fulfilling user requests on the Windows operating system. UFO can navigate and operate within a single application or across multiple applications to complete a request quickly and efficiently.
Framework Overview
UFO is organized as a multi-agent framework with the following components:
- HostAgent 🤖: Decides which application should handle the user request. When a task spans multiple applications, HostAgent switches between them so the task can be completed end to end.
- AppAgent 👾: Iteratively executes actions within the selected application until its part of the task is complete.
- Application Automator 🎮: Translates the actions chosen by HostAgent and AppAgent into concrete interactions with applications, using UI controls, native APIs, and AI tools.
Both agents (HostAgent and AppAgent) use the multimodal capabilities of GPT-Vision to interpret application UIs and fulfill user requests.
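The division of labor among the three components can be illustrated with a minimal sketch. Every class and method name below (`HostAgent.handle`, `AppAgent.run`, `Automator.execute`, `Action`) is hypothetical and is not UFO's actual API, and the decisions that UFO delegates to GPT-Vision are replaced by a hard-coded plan so the control flow runs without an LLM or a Windows desktop.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for UFO's three components. None of these classes
# or methods are UFO's real API, and the "decisions" UFO delegates to
# GPT-Vision are hard-coded plans here so the control flow runs anywhere.


@dataclass
class Action:
    app: str
    operation: str  # e.g. "click" or "type"
    target: str


@dataclass
class Automator:
    log: list[str] = field(default_factory=list)

    def execute(self, action: Action) -> None:
        # A real automator would drive UI controls or native APIs here;
        # this one only records what it would have done.
        self.log.append(f"{action.app}: {action.operation} {action.target}")


class AppAgent:
    def __init__(self, app: str, steps: list[tuple[str, str]]):
        self.app = app
        self.steps = steps  # (operation, target) pairs, in order

    def run(self, automator: Automator) -> None:
        # Issue one action at a time inside the selected application.
        for operation, target in self.steps:
            automator.execute(Action(self.app, operation, target))


class HostAgent:
    def __init__(self, plan: dict[str, list[tuple[str, str]]]):
        # Maps each application to its steps; stands in for LLM planning.
        self.plan = plan

    def handle(self, request: str, automator: Automator) -> None:
        # `request` is unused in this sketch; a real HostAgent would reason
        # over it (and screenshots) to choose the next application.
        for app, steps in self.plan.items():
            AppAgent(app, steps).run(automator)


automator = Automator()
host = HostAgent({
    "Word": [("click", "Insert tab"), ("type", "Quarterly summary")],
    "Outlook": [("click", "New Email")],
})
host.handle("Draft a summary in Word and start an email", automator)
print(automator.log)
# ['Word: click Insert tab', 'Word: type Quarterly summary', 'Outlook: click New Email']
```

The sketch keeps the host agent responsible only for choosing applications and the app agent responsible only for acting inside one, which mirrors the split described above; the automator is the single place where actions touch the system.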
Latest Updates and News
- September 8, 2024: Version 1.1.0 released. UFO can now click on any region of an application, and latency is reduced by about one-third.
- July 6, 2024: Version 1.0.0 released, introducing features requested by the community. Feedback and contributions are welcome.
- June 28, 2024: Our official introduction video, a walkthrough of UFO's capabilities, is available on YouTube.
Media Coverage
UFO has been covered by several media outlets, including:
- Coverage describing Microsoft's UFO as reinventing traditional user interfaces for a smarter Windows experience.
- Articles discussing UFO as a glimpse of the future of AI-driven PC interaction.
- Reports in multiple international publications on its contributions to intelligent automation.
Key Highlights
- Pioneering Windows Agent: UFO is an agent framework that translates natural-language user requests into actionable operations on Windows.
- RAG-Enhanced Expertise: UFO integrates retrieval-augmented generation (RAG) over diverse knowledge sources, which lets it act as an expert across a range of applications.
- Comprehensive Skill Set: UFO can act through mouse input, keyboard input, and native APIs.
- Interactive and Customizable: UFO supports multiple sub-requests within a single session, and users can customize their agents to suit their needs.
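Supporting several input channels (mouse, keyboard, native APIs) behind one automator is commonly done with a dispatch table. The sketch below assumes that design for illustration only; the registry, the action kinds `"mouse"`, `"keyboard"`, and `"api"`, and all handler names are hypothetical, and the handlers just return a description instead of touching the desktop.

```python
from typing import Callable

# Hypothetical action registry: each input channel registers a handler,
# and dispatch() routes an action to the matching one. Handlers only
# describe what they would do; real ones would call a UI-automation
# library or an application's own API.
Handler = Callable[[dict], str]
HANDLERS: dict[str, Handler] = {}


def register(kind: str) -> Callable[[Handler], Handler]:
    def decorator(fn: Handler) -> Handler:
        HANDLERS[kind] = fn
        return fn
    return decorator


@register("mouse")
def mouse_click(args: dict) -> str:
    return f"click at ({args['x']}, {args['y']})"


@register("keyboard")
def type_text(args: dict) -> str:
    return f"type {args['text']!r}"


@register("api")
def call_native_api(args: dict) -> str:
    # Stands in for invoking an application's automation interface directly.
    params = ", ".join(map(str, args.get("params", [])))
    return f"api {args['method']}({params})"


def dispatch(kind: str, args: dict) -> str:
    try:
        handler = HANDLERS[kind]
    except KeyError:
        raise ValueError(f"unsupported action kind: {kind}") from None
    return handler(args)


print(dispatch("mouse", {"x": 120, "y": 45}))
print(dispatch("keyboard", {"text": "hello"}))
print(dispatch("api", {"method": "SaveAs", "params": ["report.docx"]}))
```

Adding a new channel then means registering one more handler, without changing the code that decides which action to take.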
Getting Started
Installation
UFO requires Windows 10 or later and Python 3.10 or later. Use the following commands to set up UFO:
# Clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO
# Install the requirements
pip install -r requirements.txt
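Before cloning, it can help to confirm the stated prerequisites. The following is a minimal sketch of such a check; the helper names `meets_requirements` and `current_environment` are hypothetical and not part of UFO. It relies on `sys.getwindowsversion()`, which exists only on Windows and reports major version 10 for both Windows 10 and Windows 11.

```python
from __future__ import annotations  # lets this check itself run on Python < 3.10

import platform
import sys

# Hypothetical pre-install check for the prerequisites stated above:
# Windows 10 or later and Python 3.10 or later.
MIN_PYTHON = (3, 10)


def meets_requirements(system: str, windows_major: int | None,
                       python: tuple[int, int]) -> list[str]:
    """Return a list of problems; an empty list means the environment is OK."""
    problems = []
    if system != "Windows":
        problems.append(f"UFO targets Windows, found {system}")
    elif windows_major is None or windows_major < 10:
        problems.append("Windows 10 or later is required")
    if python < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python[0]}.{python[1]}"
        )
    return problems


def current_environment() -> tuple[str, int | None, tuple[int, int]]:
    system = platform.system()
    # sys.getwindowsversion() is only available on Windows.
    major = sys.getwindowsversion().major if system == "Windows" else None
    return system, major, (sys.version_info.major, sys.version_info.minor)


if __name__ == "__main__":
    issues = meets_requirements(*current_environment())
    print("OK" if not issues else "\n".join(issues))
```

Keeping the comparison logic separate from the environment probe makes the check easy to exercise on any platform.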
Configure Large Language Models (LLMs)
Set up the language model configuration in ufo/config/config.yaml, defining parameters for both HostAgent and AppAgent.
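Because each agent needs its own model settings, a quick sanity check on the parsed configuration can catch a missing entry before UFO starts. This is a minimal sketch that assumes the YAML has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`). The section names `HOST_AGENT` and `APP_AGENT` and the keys `API_TYPE`, `API_BASE`, `API_KEY`, and `API_MODEL` are assumptions; compare them against the configuration template shipped with your UFO version.

```python
# Assumed section and key names; verify against your UFO config template.
REQUIRED_SECTIONS = ("HOST_AGENT", "APP_AGENT")
REQUIRED_KEYS = ("API_TYPE", "API_BASE", "API_KEY", "API_MODEL")


def missing_llm_settings(config: dict) -> list[str]:
    """List missing or empty LLM settings as 'SECTION' or 'SECTION.KEY'."""
    missing = []
    for section in REQUIRED_SECTIONS:
        agent_cfg = config.get(section)
        if not isinstance(agent_cfg, dict):
            missing.append(section)
            continue
        for key in REQUIRED_KEYS:
            if not agent_cfg.get(key):
                missing.append(f"{section}.{key}")
    return missing


# Example of an already-parsed config with placeholder values.
example = {
    "HOST_AGENT": {
        "API_TYPE": "openai",
        "API_BASE": "<endpoint>",
        "API_KEY": "<key>",
        "API_MODEL": "<model>",
    },
    "APP_AGENT": {"API_TYPE": "openai", "API_MODEL": "<model>"},
}
print(missing_llm_settings(example))  # ['APP_AGENT.API_BASE', 'APP_AGENT.API_KEY']
```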
Optional RAG Setup
Optionally, enhance UFO's capabilities with retrieval-augmented generation (RAG) from external databases, configured in ufo/config/config.yaml.
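The idea behind RAG is to retrieve the knowledge most relevant to a request and add it to the model's prompt. The toy sketch below shows only that general technique, using bag-of-words cosine similarity; it does not reproduce UFO's actual retrievers or knowledge sources, and the functions `retrieve` and `build_prompt` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

# Toy retrieval-augmented generation: rank documents by bag-of-words
# cosine similarity to the request, then prepend the top matches to the
# prompt that would be sent to the LLM.


def tokenize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Relevant knowledge:\n{context}\n\nUser request: {query}"


docs = [
    "To insert a table in Word, open the Insert tab and choose Table.",
    "Outlook sends mail from the New Email window.",
    "Excel formulas start with an equals sign.",
]
print(build_prompt("insert a table in Word", docs, k=1))
```

A production retriever would typically use embeddings and a vector index rather than word counts, but the retrieve-then-augment flow is the same.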
Start UFO
To initiate UFO:
python -m ufo --task <your_task_name>
You interact with UFO through its command-line interface. Before starting, make sure the applications involved in your request are open and ready for interaction.
Evaluation
For a detailed analysis of UFO's capabilities, refer to the WindowsBench evaluation in the Appendix of our technical report.
Citation
If UFO contributes to your research, please refer to our technical report for proper citation.
Additional Notes
UFO offers a scalable, customizable framework for user interface automation on Windows, built on multimodal language models such as GPT-Vision.