# Foundation Models

## Cradle
Cradle enables foundation models to perform complex tasks the way a human would: it takes screenshots as input and acts through standard keyboard and mouse interfaces. It supports games such as RDR2, Stardew Valley, and Cities: Skylines, along with software such as Chrome and Outlook. Built on modern Python with OCR support, the framework is designed to work across environments and adapt to new applications, providing generalized computer control.
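To make the interaction pattern concrete, here is a minimal, hypothetical sketch of the screenshot-in, actions-out loop such a framework implies; `propose_action` stands in for the foundation-model call and is not part of Cradle's API.

```python
# Hypothetical sketch of a screenshot-driven control loop (not Cradle's actual API).
import time

import pyautogui  # screenshots plus keyboard/mouse control

def propose_action(screenshot):
    """Placeholder for the foundation-model call that maps a screenshot to a UI action."""
    return {"type": "click", "x": 640, "y": 360}  # dummy action for illustration

for _ in range(3):                          # a few observe-act steps
    frame = pyautogui.screenshot()          # observe the current screen
    action = propose_action(frame)          # ask the model what to do next
    if action["type"] == "click":
        pyautogui.click(action["x"], action["y"])
    elif action["type"] == "press":
        pyautogui.press(action["key"])
    time.sleep(1.0)                         # let the UI settle before re-observing
```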
## amazon-bedrock-workshop
Learn to use foundation models with Amazon Bedrock in a developer-focused workshop offering hands-on labs in text and image generation, model customization, and knowledge bases. The labs demonstrate practical applications such as text summarization and chatbot development, and show how to integrate open-source tools like LangChain and FAISS. The workshop is designed to run in environments such as SageMaker Studio.
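As a rough illustration of the kind of call the labs build on, here is a minimal sketch of invoking a Bedrock-hosted model with boto3; the model ID and request schema below are assumptions, and the workshop notebooks remain the reference.

```python
# Minimal Bedrock text-generation call via boto3 (model ID and body schema are illustrative).
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",  # request schema used by Anthropic models on Bedrock
    "max_tokens": 200,
    "messages": [{"role": "user", "content": "Summarize foundation models in two sentences."}],
}

response = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID; use one enabled in your account
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```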
## ml-cvnets
CVNets is a computer vision toolkit for training models such as EfficientNet, Swin Transformer, and CLIP on tasks including classification, detection, and segmentation. Recent releases add methods such as Bytes Are All You Need and RangeAugment to improve model efficiency. Aimed at researchers and engineers, it provides comprehensive documentation and examples, including model conversion to CoreML.
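As a loose illustration of that last step, the sketch below converts a traced PyTorch model to CoreML with coremltools; it is a generic conversion using a torchvision model, not CVNets' own conversion script.

```python
# Generic PyTorch -> CoreML conversion with coremltools (illustrative; CVNets ships its own converter).
import coremltools as ct
import torch
import torchvision

model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)  # coremltools converts a traced or scripted model

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    convert_to="mlprogram",
)
mlmodel.save("mobilenet_v3_small.mlpackage")
```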
## unilm
Delve into scalable, self-supervised AI training techniques that enhance modeling across diverse tasks, languages, and modalities. The project introduces advancements in foundational model architectures such as DeepNet's scalable transformers, Magneto for versatile modeling, and X-MoE for efficiency. Explore the evolution of Multimodal Large Language Models with innovations like Kosmos and MetaLM, and examine AI applications in vision and speech through models like BEiT and WavLM. The project also includes specialized toolkits such as s2s-ft, demonstrating applications in document AI, OCR, and NMT for future-ready AI training and adaptation.
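Several of the listed models are available through HuggingFace Transformers; as a small illustration, the sketch below classifies an image with BEiT, assuming the public microsoft/beit-base-patch16-224 checkpoint.

```python
# Image classification with a BEiT checkpoint from the HuggingFace hub (checkpoint name assumed).
import torch
from PIL import Image
from transformers import BeitForImageClassification, BeitImageProcessor

processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

image = Image.open("example.jpg").convert("RGB")   # any local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```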
## manifest
Manifest is a tool designed to simplify prompt programming with foundation models. It provides a lightweight package for interacting with models through a unified API, with caching for reproducibility and cost savings, and supports providers including OpenAI, AI21, Cohere, and HuggingFace. Features such as global caching, async queries, local HuggingFace models, streaming responses, and model pooling make it versatile for prompt design and execution.
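A minimal usage sketch, assuming the `Manifest` client and SQLite cache arguments described in the project's README:

```python
# Prompt a model through Manifest with a local SQLite cache (argument names assumed from the README).
from manifest import Manifest

manifest = Manifest(
    client_name="openai",               # other clients include "cohere", "ai21", "huggingface"
    cache_name="sqlite",                # cache responses for reproducibility and cost savings
    cache_connection="manifest.sqlite",
)

print(manifest.run("Explain foundation models in one sentence.", max_tokens=64))
```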
## offsite-tuning
Offsite-Tuning presents an innovative transfer learning framework designed to enhance privacy and computational efficiency. It enables the adaptation of large-scale foundation models to specific tasks without requiring full model access, effectively addressing traditional cost and privacy concerns. A lightweight adapter and a compressed emulator are provided for local fine-tuning, maintaining accuracy while significantly improving speed and reducing memory usage. This approach is validated on various large language and vision models, providing a practical solution for environments prioritizing privacy and resource constraints.
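The sketch below illustrates the idea in PyTorch: the data owner trains only small adapter blocks while a frozen, compressed emulator stands in for the middle of the full model; module names and sizes are illustrative, not the repository's code.

```python
# Conceptual offsite-tuning setup (illustrative sizes; not the repository's implementation).
import torch
import torch.nn as nn

def block(dim=256):
    return nn.Sequential(nn.Linear(dim, dim), nn.GELU())

bottom_adapter = nn.Sequential(*[block() for _ in range(2)])  # trainable, returned to the model owner
emulator = nn.Sequential(*[block() for _ in range(4)])        # compressed stand-in for the frozen middle layers
top_adapter = nn.Sequential(*[block() for _ in range(2)])     # trainable

for p in emulator.parameters():
    p.requires_grad_(False)                                   # the emulator is never updated locally

trainable = list(bottom_adapter.parameters()) + list(top_adapter.parameters())
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

x = torch.randn(8, 256)                                       # dummy batch
loss = top_adapter(emulator(bottom_adapter(x))).pow(2).mean() # placeholder objective
loss.backward()
optimizer.step()
```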
## OpenAdapt
OpenAdapt, an open-source tool, integrates large multimodal models with desktop and web GUIs to automate repetitive workflows. It records and analyzes user interactions and uses them to drive automation, significantly reducing manual effort. Its model-agnostic design ensures compatibility across a wide range of GUIs, promoting effective process automation.
## Awesome-Remote-Sensing-Foundation-Models
This repository delivers a detailed set of resources, including papers, datasets, benchmarks, code, and pre-trained weights, dedicated to Remote Sensing Foundation Models (RSFMs). It systematically categorizes models into types such as vision, vision-language, and generative, covering developments such as PANGAEA, TEOChat, and SAR-JEPA. The collection is organized to make it easy to navigate model types and associated projects, and it tracks research progress at venues such as ICCV and NeurIPS. It serves professionals seeking a deeper understanding of RSFMs, with emphasis on geographical knowledge, self-supervised learning, and multimodal fusion.
## Awesome-Reasoning-Foundation-Models
Explore a curated repository of foundation models designed to improve reasoning in language, vision, and multimodal contexts. The database classifies models and outlines their use in commonsense, mathematical, logical reasoning, and more. Additionally, it covers reasoning techniques like pre-training and fine-tuning. Contributions are welcome to broaden the resource collection for AI reasoning advances.
## Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
This literature review examines the development of recommender systems with a focus on foundational models that do not rely on explicit ID features. It discusses the potential for these systems to evolve independently, akin to foundational models in natural language processing and computer vision, and the ongoing debate regarding the necessity of ID embeddings. The review further explores how Large Language Models (LLMs) may transform recommender systems by shifting focus from matching to generative paradigms. Additionally, it highlights advancements in multimodal and transferable recommender systems, offering insights from empirical research into universal user representation. This review serves as a comprehensive guide to understanding current trends and future directions in the field of recommender systems.
## Segment-Any-Anomaly
Explore a zero-shot anomaly segmentation approach that requires no additional training, combining hybrid prompt regularization with existing foundation models. It improves anomaly detection using models such as Grounding DINO and Segment Anything. The repository features user-friendly demos on Colab and Huggingface, showcasing the efficacy of the SAA+ framework on datasets such as MVTec-AD, VisA, KSDD2, and MTD. SAA+ provides strong anomaly identification with minimal setup, catering to computer vision researchers and developers, and the repository documents the work presented at the VAND workshop.
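The sketch below outlines the two-stage pipeline the description implies: a text-prompted detector (Grounding DINO in SAA+) proposes candidate boxes, and Segment Anything turns each box into a mask. The detector stage is stubbed out with a dummy box, and the SAM checkpoint path is an assumption.

```python
# Box-prompted anomaly masking with Segment Anything (detector stage stubbed; checkpoint path assumed).
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

def detect_anomaly_boxes(image, prompt="defect."):
    """Placeholder for the text-prompted detector (Grounding DINO in SAA+)."""
    h, w = image.shape[:2]
    return [np.array([w // 4, h // 4, 3 * w // 4, 3 * h // 4])]  # dummy xyxy box

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("sample.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

for box in detect_anomaly_boxes(image):
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    print("anomaly mask pixels:", int(masks[0].sum()), "score:", float(scores[0]))
```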