#computer vision
CLIP
CLIP employs contrastive language-image pre-training to achieve zero-shot prediction, matching the performance of models trained on labeled data without task-specific supervision. By integrating with PyTorch and TorchVision, CLIP supports diverse tasks such as CIFAR-100 prediction and linear-probe evaluation through its image and text encoders.
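The zero-shot step reduces to cosine similarity between one image embedding and a bank of text-prompt embeddings, followed by a temperature-scaled softmax. A minimal numpy sketch, with mock embeddings standing in for CLIP's learned encoders:

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    """Score an image embedding against class-prompt embeddings,
    CLIP-style: cosine similarity scaled by a temperature, then softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * txt @ img          # one logit per class prompt
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# Mock embeddings standing in for CLIP's image and text encoders.
rng = np.random.default_rng(0)
classes = ["a photo of a cat", "a photo of a dog", "a photo of a truck"]
text_embs = rng.normal(size=(3, 512))
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)  # a "dog-like" image
probs = zero_shot_probs(image_emb, text_embs)
print(classes[int(probs.argmax())])  # the dog prompt should win
```

The real model produces these embeddings with its trained image and text towers; everything after that is exactly this similarity-plus-softmax step.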
gluon-cv
GluonCV offers a comprehensive set of state-of-the-art deep learning models for a variety of computer vision tasks such as image classification and object detection. It facilitates quick prototyping for engineers, researchers, and students by supporting both PyTorch and MXNet frameworks. Key features include scripts for reproducing research results, an extensive collection of pre-trained models, and user-friendly APIs to ease implementation. Ideal for both research and production, GluonCV is well-supported by the community and integrates seamlessly with AutoGluon for enhanced model deployment.
U-2-Net
U²-Net is a deep learning model for salient object detection that uses a distinctive nested U-structure for enhanced segmentation accuracy. Winner of the 2020 Pattern Recognition Best Paper Award, U²-Net is applied in mobile image processing and creative design tools. The model is available on platforms such as PlayTorch, enabling deployment on Android and iOS devices. It supports functionalities like background removal and portrait generation, making it a flexible tool for developers and artists who need precise object detection.
myvision
Explore MyVision, an online tool for image annotation that facilitates computer vision tasks by enhancing productivity through features like bounding box and polygon drawing, advanced polygon manipulation, and automatic annotation via pre-trained ML models. It ensures secure, local data processing and supports easy import or conversion of datasets. Available in English and Mandarin, MyVision offers a seamless labeling experience with no setup required, ideal for creating high-quality machine learning training data.
fvcore
fvcore is a compact library from FAIR's computer vision team, offering essential utilities for frameworks like Detectron2, PySlowFast, and ClassyVision. It includes PyTorch layers and tools for flop counting, parameter counting, BatchNorm statistics recalibration, and hyperparameter scheduling. Installation can be easily done through PyPI, Anaconda Cloud, GitHub, or a local clone. With support for Python 3.6 and above, it offers type-annotated, tested, and benchmarked components for reliable performance.
sports
Discover innovative tools for sports analytics in this repository, which covers object detection, image segmentation, and keypoint detection. Contributions are encouraged to tackle challenges like ball tracking and player re-identification. The repository supports Python environments and invites open-source contributions to advance player-dynamics analysis. Access sports datasets on Roboflow Universe to collaboratively improve sports analytics technology.
YOLOMagic
YOLO Magic extends the YOLOv5 framework with advanced network modules and a user-friendly web interface for enhanced visual task performance. It includes spatial pyramid modules, feature fusion structures, and new backbone networks to improve efficiency. Suitable for both beginners and experts, it streamlines image inference and model processes. The active community offers extensive resources for customization and learning. Explore YOLO Magic for top-tier object detection and analysis.
VMamba
VMamba presents a Visual State Space (VSS) model that adapts the Mamba state-space language model into a computationally efficient vision backbone. Its 2D Selective Scan (SS2D) module gathers global context by unfolding 2D feature maps into 1D sequences scanned along four traversal paths. It excels in visual perception tasks, offering favorable input-scaling efficiency on recognized benchmarks. Recent updates improve code readability and add mamba2 support, and the model was accepted at NeurIPS 2024.
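The four-direction scanning idea behind SS2D can be sketched by unfolding a 2D map into 1D sequences (a simplified numpy illustration of the cross-scan step, not VMamba's implementation):

```python
import numpy as np

def cross_scan(feature_map):
    """Unfold an (H, W) map into four 1D scan sequences, in the spirit of
    VMamba's SS2D: row-major, column-major, and the reverse of each."""
    rows = feature_map.reshape(-1)     # left-to-right, top-to-bottom
    cols = feature_map.T.reshape(-1)   # top-to-bottom, left-to-right
    return np.stack([rows, cols, rows[::-1], cols[::-1]])

fm = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
scans = cross_scan(fm)
print(scans)
```

Each of the four sequences is then processed by a 1D selective-scan state-space model, and the results are folded back onto the 2D grid and merged, which is how a 1D sequence model gains two-dimensional context.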
studio-lab-examples
Discover example Jupyter notebooks demonstrating how to set up AI/ML environments with SageMaker Studio Lab. This repository guides data scientists in areas such as computer vision and NLP, offering insights into project deployment with Amazon SageMaker. Explore community-driven content on geospatial data science and generative AI, and access diverse programming environments.
CVPR2024-Paper-Code-Interpretation
Explore a comprehensive collection of CVPR papers ranging from 2017 to 2024, featuring downloads, coding resources, interpretations, and live technical sessions. This page also provides summaries of top papers from 2000 to 2021 and detailed insights into specialized fields like object detection, image processing, and face recognition. Discover valuable links for further resources and stay updated with the latest CVPR 2023 and 2024 publications, offering a deep exploration of the forefront developments in computer vision.
skyvern
Skyvern is a tool that automates browsing tasks using LLMs and computer vision, efficiently managing workflows on various websites. It addresses the limitations of traditional scripting by dynamically adapting to changes without the need for customized code. Through its agents, Skyvern can navigate, extract data, manage credentials, and complete forms, facilitating complex tasks such as competitive analysis and product comparison. The cloud version includes anti-bot mechanisms and CAPTCHA solutions, providing streamlined automation tailored to numerous workflow requirements.
lightly
LightlySSL is a flexible framework focusing on self-supervised learning in computer vision, compatible with PyTorch and PyTorch Lightning for modular and distributed training setups. Its features include support for multiple backbones, facilitating various tasks like embedding and segmentation. Enhanced capabilities are available with the commercial version, including Docker support, ideal for professionals exploring models such as AIM, MoCo, and SimCLR. Join the active learning and data curation platform at lightly.ai for broader engagement.
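The contrastive objective behind methods like SimCLR can be illustrated with the NT-Xent loss, which pulls two augmented views of the same image together and pushes all other pairs apart. A simplified numpy stand-in, not LightlySSL's actual loss module:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch of paired embeddings, as used by
    SimCLR-style self-supervised methods (numpy sketch)."""
    z = np.concatenate([z1, z2])                       # 2N embeddings
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau                                # cosine sims / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n) + n, np.arange(n)])  # i's positive is i+n
    log_probs = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_probs.mean()

rng = np.random.default_rng(0)
views = rng.normal(size=(4, 16))
aligned = nt_xent(views, views + 0.01 * rng.normal(size=(4, 16)))
shuffled = nt_xent(views, rng.normal(size=(4, 16)))
print(aligned < shuffled)  # matched view pairs give a lower loss
```

In a real training loop the two inputs are embeddings of two random augmentations of the same batch, produced by the backbone plus a projection head.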
CCTag
This project provides a library for detecting and localizing CCTag markers formed by concentric circles, using both CPU and GPU technologies. Based on research from the CVPR 2016 conference, it is designed to operate under challenging conditions and requires CUDA compatibility. Offering continuous integration across Windows and Linux, it ensures updated builds and smooth integration. Resources such as printable markers and comprehensive documentation are available for enhanced deployment. Developed through the European Union’s Horizon 2020 program, CCTag is licensed under MPL v2.
mmcv
MMCV provides essential tools for computer vision research, featuring image and video processing, annotation, visualization, and support for various CNN architectures. It excels at high-performance CPU and CUDA operations and runs on Linux, Windows, and macOS. It is compatible with PyTorch 1.8 through 2.0 and requires Python 3.7+. Version 2.0.0 introduces data transformation modules and a new naming scheme for easier use. Choose 'mmcv' for the full suite or 'mmcv-lite' for a streamlined experience. The library is designed to meet the needs of researchers training deep learning models.
keras-cv
KerasCV provides a collection of modular computer vision components that integrate effortlessly with TensorFlow, JAX, and PyTorch, leveraging Keras 3. It facilitates tasks such as data augmentation, object detection, and segmentation, helping engineers swiftly construct advanced training and inference pipelines. This library assures framework compatibility, allowing for reuse without expensive migration processes. It also encourages community contributions to further enhance numerical performance and expand computer vision applications.
awesome-demos
Discover a wide range of Gradio-powered demos spanning natural language processing, computer vision, data manipulation, and scientific fields. Featuring real-world applications like text-to-image conversion, multilingual summarization, and sentiment analysis in Turkish, explore how Gradio facilitates the creation of interactive models with its robust functionalities. Gain insights into potential project enhancements and innovations.
scenic
Scenic offers a robust framework for creating attention-based computer vision models, supporting tasks like classification and segmentation across multiple modalities. Utilizing JAX and Flax, it simplifies large-scale training through efficient pipelines and established baselines, ideal for research. Explore projects with state-of-the-art models like ViViT. Scenic provides adaptable solutions for both newcomers and experts, facilitating easy integration into existing workflows.
trainbot
Trainbot uses Video4Linux USB cameras and Raspberry Pi camera modules for efficient train detection and image stitching. The project minimizes complexity with straightforward computer vision methods and runs well on a Raspberry Pi 4. Detailed deployment options keep requirements minimal, making it accessible to anyone familiar with basic system administration and Go programming.
Best_AI_paper_2020
Delve into the pivotal AI breakthroughs of 2020 through an extensive compilation of key research papers. Gain insight into advancements such as ethical AI considerations, image creation, code translation, and video repair. Each entry in this curated list is enriched with video summaries, code examples, and detailed article links, offering a vital resource for both enthusiasts and professionals eager to track the evolution of AI.
cvat
CVAT is an interactive annotation tool used globally for video and image labeling in computer vision projects. Built for data-centric AI workflows, it supports a wide range of annotation formats such as COCO and YOLO and offers automatic labeling that significantly boosts productivity. CVAT is available as a free online service or can be self-hosted, catering to diverse annotation needs. With integrations like Roboflow and HuggingFace, along with extensive user support and training, CVAT helps organizations and developers automate and streamline the annotation process for real-world challenges.
gocv
The GoCV package provides OpenCV 4 support for Go developers on Linux, macOS, and Windows, enabling efficient image and video processing with hardware acceleration via CUDA for Nvidia GPUs and Intel OpenVINO support. It includes examples and installation guides to streamline integration and take advantage of the latest OpenCV capabilities. The package tracks the newest Go releases, offering a reliable foundation for high-performance computer vision applications in Go.
raster-vision
Raster Vision is an open-source Python framework that facilitates creating computer vision models for satellite and aerial imagery. It supports functions like chip classification, object detection, and semantic segmentation and provides cloud execution using AWS Batch and Sagemaker. This tool offers a low-code solution for comprehensive geospatial deep learning workflows, including data processing, model training, and result generation. It fits both novices and experts, offering installation via pip or Docker for flexible application.
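Chip classification starts by tiling a large scene into fixed-size windows ("chips") that a model can consume. A minimal numpy sketch of that windowing step (an illustration of the idea, not Raster Vision's API):

```python
import numpy as np

def tile_chips(scene, chip_size, stride=None):
    """Slice an (H, W, C) scene into square chips with their (top, left)
    offsets -- the first step of chip classification on large imagery."""
    stride = stride or chip_size
    h, w = scene.shape[:2]
    chips, windows = [], []
    for top in range(0, h - chip_size + 1, stride):
        for left in range(0, w - chip_size + 1, stride):
            chips.append(scene[top:top + chip_size, left:left + chip_size])
            windows.append((top, left))
    return np.stack(chips), windows

scene = np.zeros((256, 256, 3), dtype=np.uint8)   # stand-in for a satellite scene
chips, windows = tile_chips(scene, chip_size=128)
print(chips.shape, windows)
```

Predictions made per chip are then mapped back to the scene using the recorded window offsets; an overlapping stride smaller than `chip_size` is a common choice near object boundaries.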
Anti-UAV
This project presents a solution for the detection and tracking of Unmanned Aerial Vehicles (UAVs) using both PyTorch and Jittor frameworks. It meets the need for reliable UAV monitoring due to their expanding applications. The project offers a unique dataset with high-quality video sequences in RGB and Thermal Infrared (IR) formats. Newly released Jittor versions enhance compatibility with domestic hardware, aiming to address needs in security and defense. Licensed under MIT, the Anti-UAV project provides detailed evaluation metrics and comprehensive training resources.
overeasy
Overeasy enables the creation of custom computer vision solutions with zero-shot models, supporting tasks like bounding box detection, classification, and segmentation without extensive datasets. The tool offers easy installation and features robust agents and execution graphs to facilitate the management and visualization of image processing workflows.
graphics
TensorFlow Graphics seamlessly integrates differentiable graphics layers into neural networks, emphasizing efficient training with geometric constraints. It supports self-supervised learning by combining computer vision and graphics techniques to utilize unlabelled data. The project provides graphics layers, 3D viewing capabilities, and full compatibility with TensorFlow. Users can access detailed tutorials on a range of 3D tasks, such as object pose estimation and spherical harmonics, offering a valuable tool for enhancing machine learning models' 3D understanding.
imageprocessing-labs
This resource details computer vision and machine learning technologies including Fast Fourier Transforms, stereo matching, and Poisson image editing. It covers methods such as fish-eye transforms, decision tree learning, and clustering techniques. Suitable for web and Node environments, the project offers hands-on experience with neural networks, gradient boosting, and 3D shape drawing using WebGL and ONNX Runtime.
jaxlie
jaxlie offers efficient Lie groups for JAX-based rigid body transformations in computer vision and robotics. Inspired by the Sophus C++ library, it includes high-level classes like SO2, SE2, SO3, and SE3. These classes support essential operations such as exp(), log(), and multiply(), designed for forward- and reverse-mode automatic differentiation. Users benefit from manifold optimization, compatibility with JAX transformations, and support for broadcasting. Additionally, it provides utilities for uniform random sampling and Euler angle conversions, making it versatile for pose graph optimization and radiance field construction in projects like jaxfg and tensorf-jax.
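The exp()/log() interface is easiest to see on the simplest group. The sketch below illustrates it for SO(2) in plain numpy; jaxlie's own SO2/SE2/SO3/SE3 classes provide the same operations with JAX-differentiable implementations:

```python
import numpy as np

def so2_exp(theta):
    """exp: tangent-space angle -> 2x2 rotation matrix (the SO(2) case
    of the exp()/log() interface a Lie-group library exposes)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def so2_log(R):
    """log: rotation matrix -> tangent-space angle."""
    return np.arctan2(R[1, 0], R[0, 0])

R = so2_exp(0.3) @ so2_exp(0.4)   # group multiply composes rotations
theta = so2_log(R)
print(round(theta, 6))  # → 0.7
```

Manifold optimization uses exactly this round-trip: take a gradient step in the tangent space, then map back onto the group with exp() so the result stays a valid transformation.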
VideoPipe
VideoPipe is a C++ framework for video analysis across platforms, offering easy plugin integration and support for protocols like RTSP and RTMP. Suited for face recognition and traffic analysis, it provides flexible configuration with minimal dependencies and supports inference backends like OpenCV and TensorRT.
yolov5
YOLOv5 by Ultralytics brings cutting-edge vision AI to object detection, image segmentation, and classification, building on extensive research and refined practices. Detailed resources and guides are available, and a vibrant user community helps optimize results across many fields. Report issues via GitHub or join community discussions on Discord.
ModelAssistant
Discover how to deploy state-of-the-art AI algorithms on economical hardware using the open-source platform from Seeed Studio. ModelAssistant helps developers train and visualize AI models efficiently on microcontrollers and SBCs while optimizing performance and energy consumption. Address practical applications like anomaly detection and computer vision with support for formats such as TensorFlow Lite and ONNX. Stay informed about updates including YOLO-World and MobileNetV4 for embedded devices. Easily integrate AI with SSCMA using pre-trained models and intuitive tools.
pylabel
PyLabel is a Python package that assists in preparing image datasets for computer vision models like PyTorch and YOLOv5. It offers efficient conversion of annotation formats such as COCO to YOLO with minimal code. Users can analyze image datasets and strategically split them into training, test, and validation groups. Furthermore, PyLabel includes a Jupyter notebook-based tool supporting both manual and AI-assisted image labeling. It also allows easy visualization to verify annotations, developed as a project at UC Berkeley.
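The COCO-to-YOLO conversion such tools perform boils down to a coordinate transform: COCO stores [x_min, y_min, width, height] in pixels, while YOLO stores a center-based box normalized by image size. A minimal sketch of that math (an illustration, not PyLabel's API):

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to a
    YOLO box [x_center, y_center, width, height] normalized to [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

# A 100x50 box with its top-left corner at (50, 100) in a 640x480 image.
yolo_box = coco_to_yolo([50, 100, 100, 50], 640, 480)
print(yolo_box)
```

Going the other direction simply inverts each step: denormalize by image size, then shift from the center back to the top-left corner.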
dinov2
DINOv2 is designed for robust visual feature extraction using unsupervised learning on a dataset of 142 million images. Its features work effortlessly with simple classifiers like linear layers, performing well in diverse computer vision tasks without the need for fine-tuning. With its integration of registers in Vision Transformers, DINOv2 offers improved performance, showcasing the latest advancements in the field. Available in multiple configurations via PyTorch Hub, it supports applications in image classification, depth estimation, and semantic segmentation. Discover how DINOv2's pretrained models enhance visual feature robustness and versatility.
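A linear-probe evaluation on frozen features can be sketched with a least-squares classifier over mock embeddings (an illustration of the probing idea only; a real workflow would extract features from pretrained DINOv2 models via PyTorch Hub):

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats):
    """Fit a least-squares linear classifier on frozen features and
    predict class indices -- the essence of a linear-probe evaluation."""
    n_classes = int(train_labels.max()) + 1
    Y = np.eye(n_classes)[train_labels]            # one-hot targets
    W, *_ = np.linalg.lstsq(train_feats, Y, rcond=None)
    return (test_feats @ W).argmax(axis=1)

# Mock "frozen features": noisy samples clustered around class centroids.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 64))
labels = np.repeat(np.arange(3), 30)
feats = centers[labels] + 0.1 * rng.normal(size=(90, 64))
preds = linear_probe(feats, labels, centers)       # probe the clean centroids
print(preds)
```

The point of the protocol is that the backbone is never updated: if a linear map on top of frozen features already separates the classes, the features themselves carry the task-relevant structure.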
inference
The Inference platform facilitates deploying computer vision models, providing tools for object detection, classification, and segmentation. It supports foundation models such as CLIP, Segment Anything, and YOLO-World. Available as a Python-native package, a self-hosted server, or through an API, it is compatible with Python 3.8 to 3.11 and supports CUDA for GPU acceleration. Minimal core dependencies with model-specific extras keep installations lean. The Inference SDK supports local model execution with minimal code and handles various image input formats. Explore advanced features, a Docker-based inference server, and comprehensive documentation for optimal utility.
vision
Torchvision offers a robust set of tools for computer vision, including datasets, model architectures, and image transformations. It operates on multiple image representations, such as torch tensors and PIL images, and provides video processing via the pyav and video_reader backends. It also includes C++-compatible models, with caveats about version stability. Torchvision focuses on efficiency and ease of use, integrating seamlessly with the PyTorch ecosystem.
assets
This repository provides a complete suite of visual assets, pre-trained models, and curated datasets that integrate smoothly with the Ultralytics YOLO ecosystem. It offers essential tools for object detection, image classification, among others, suitable for both personal and commercial applications. Users can easily download pre-trained models to perform inference with minimal effort. Additionally, it features a wide range of visual assets and datasets to facilitate diverse machine learning projects. With comprehensive documentation and varied licensing options, the repository is designed to support both hobbyists and professionals in advancing their computer vision capabilities.
InstructCV
InstructCV utilizes advancements in text-to-image diffusion to streamline computer vision tasks, such as segmentation and classification. It simplifies execution through a natural language interface, transforming tasks into text-to-image problems. Using diverse datasets, it employs instruction-tuning to enhance task performance, serving as an instruction-guided vision learner.
Feedback Email: [email protected]