SeeAct

Streamlined Web Task Performance Using Large-Scale Multimodal AI Models

Product Description

SeeAct is a system for automating web tasks with large multimodal models such as GPT-4V(ision). The framework provides a codebase for running autonomous web agents on live websites. Features include Playwright integration, a variety of grounding strategies, and compatibility with models such as OpenAI GPT-4 and Google Gemini. Regular updates extend its functionality; recent additions include Crawler mode and support for the Multimodal-Mind2Web dataset. SeeAct aims to make web interactions more efficient through AI-driven automation.
Project Details