# Computer Vision

## supervision
Explore a comprehensive Python package designed for efficient computer vision applications, offering tools for easy dataset management, drawing detections, and zone counting. This model-agnostic package is compatible with YOLO and other popular models, featuring connectors for widely-used libraries. Customizable annotators and utilities support various formats, such as YOLO, Pascal VOC, and COCO, making it ideal for developers seeking streamlined solutions for classification, detection, and segmentation. Benefit from thoroughly detailed tutorials and practical examples, and become part of the open-source community driving advancements in computer vision.
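Zone counting, one of the features mentioned above, boils down to testing which detection centers fall inside a polygonal region. A minimal plain-Python sketch of the idea (illustrative only — not the supervision library's actual API):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as (px, py) pairs?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray at height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_in_zone(boxes, zone):
    """Count (x_min, y_min, x_max, y_max) boxes whose center lies inside zone."""
    count = 0
    for x_min, y_min, x_max, y_max in boxes:
        cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
        if point_in_polygon(cx, cy, zone):
            count += 1
    return count
```

The library wraps this pattern (plus tracking and annotation) behind higher-level zone objects; the sketch shows only the underlying geometry.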
## Awesome-PyTorch-Chinese
Explore a detailed guide to PyTorch with tutorials, video lessons, and suggested readings. Discover practical applications in NLP and computer vision using a variety of PyTorch repositories. This resource caters to learners of all levels, providing comprehensive support from foundational neural network concepts to advanced model training techniques.
## learnopencv
This extensive collection is a valuable resource for those interested in computer vision and AI. It provides accompanying code and insightful articles on various topics such as NLP, brain tumor segmentation, and more. Ideal for research and development, it includes practical examples in areas like LiDAR SLAM and autonomous driving, offering a seamless blend of theory and practical application.
## opencv
OpenCV provides a wide range of open-source tools focused on computer vision and AI. It offers comprehensive documentation, active forums, and encourages community contributions under clear guidelines. The platform extends its capabilities through the opencv_contrib package and offers educational courses. Resources are tailored for practitioners ranging from novices to experts to enhance their computer vision skills, promoting community interaction and project showcasing.
## Awesome-pytorch-list
Discover a vast array of PyTorch libraries and tutorials focused on NLP, CV, and probabilistic models. This curation serves researchers and developers with tools for neural networks, paper implementations, and improving model interpretability, utilizing PyTorch's GPU support and extensive library resources.
## CVPR-2023-24-Papers
Access an extensive selection of research papers from CVPR 2023 and 2024, showcasing the forefront of computer vision and deep learning. The repository offers code implementations to explore advancements in visual intelligence. Stay informed on the latest developments in image synthesis, 3D modeling, and more from top researchers worldwide.
## notebooks
Access a wide array of tutorials on leading computer vision models and methodologies. The repository covers models from ResNet and YOLO to more sophisticated architectures such as DETR, Grounding DINO, SAM, and GPT-4 Vision. Learn model fine-tuning, instance segmentation, and object-detection transformers through thorough tutorials and practical examples that support both novices and experts across various vision tasks, leveraging platforms like Colab and Kaggle.
## ML-ProjectKart
Explore a diverse collection of open-source machine learning projects, crafted to enhance expertise in ML, deep learning, computer vision, and natural language processing. This repository includes projects suitable for beginners and advanced users, ideal for mastering algorithms and model construction. Engage with a vibrant community, follow detailed contribution guidelines, and collaborate on projects such as advertisement prediction, air quality indexing, and brain tumor detection. ML-ProjectKart serves as a valuable resource for advancing in the ML/AI field.
## ICCV2023-Paper-Code-Interpretation
This repository features a detailed collection of ICCV conference papers and code resources from 1987 to 2023. It provides organized summaries of significant ICCV papers, complete with download options for ease of access. Readers can explore current interpretations and presentations of ICCV2023 papers, with ongoing updates for fresh insights. The repository also includes categorized summaries and links for ICCV2021 and ICCV2019, ensuring comprehensive access to papers and code for researchers and enthusiasts. Regular updates make this resource essential for exploring the evolution of computer vision advancements at ICCV.
## Mamba-in-CV
This collection of Mamba-focused computer vision projects highlights recent developments including human activity recognition, anomaly detection, and autonomous driving. It explores the capabilities of state space models as alternatives to transformers, providing links to detailed papers and code. Ideal for researchers and practitioners interested in visual state space models.
## ComputerVisionPractice
Discover practical image processing techniques using OpenCV, from basics like arithmetic operations and thresholding to advanced applications including OCR recognition and geometric transformations. Gain insights into VisionPro and explore a range of examples, each detailed with blog references, for a thorough understanding of image processing in both theory and practice.
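Binary thresholding, one of the basics the collection covers, simply maps each pixel above a cutoff to the maximum value and everything else to zero. A plain-Python illustration on a nested-list "image" (OpenCV's `cv2.threshold` does the same thing, vectorized, on real image arrays):

```python
def threshold_binary(image, thresh, max_val=255):
    """Set pixels strictly above `thresh` to `max_val`, all others to 0."""
    return [[max_val if px > thresh else 0 for px in row] for row in image]
```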
## DLTA-AI
DLTA-AI facilitates advanced data annotation and object tracking with seamless integration of leading Computer Vision models, including Meta's Segment Anything. It supports comprehensive model selection and provides robust editing tools, allowing for precise video and image annotations. Export options include common formats like COCO and MOT, and custom formats for project-based flexibility. Ideal for applications seeking efficient AI-assisted data labeling and processing.
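The COCO export format mentioned above is a JSON document with three top-level lists (`images`, `annotations`, `categories`) and boxes stored as `[x, y, width, height]` from the top-left corner. A hedged sketch of the minimal structure (field values are illustrative, not DLTA-AI's exact output):

```python
import json

def to_coco(image_id, file_name, width, height, boxes, category_id=1):
    """Wrap (x_min, y_min, x_max, y_max) boxes in a minimal COCO document."""
    annotations = []
    for i, (x1, y1, x2, y2) in enumerate(boxes, start=1):
        w, h = x2 - x1, y2 - y1
        annotations.append({
            "id": i,
            "image_id": image_id,
            "category_id": category_id,
            "bbox": [x1, y1, w, h],   # COCO: top-left corner + size
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": [{"id": image_id, "file_name": file_name,
                    "width": width, "height": height}],
        "annotations": annotations,
        "categories": [{"id": category_id, "name": "object"}],
    }

doc = to_coco(1, "frame_0001.jpg", 640, 480, [(100, 50, 200, 150)])
coco_json = json.dumps(doc)
```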
## Parameter-Efficient-Transfer-Learning-Benchmark
Investigate a benchmark for parameter-efficient transfer learning in computer vision, assessing 25 leading algorithms on 30 varied datasets. The platform provides a modular codebase for comprehensive analysis in image recognition, video action recognition, and dense prediction. Pre-trained models like ViT and Swin are used to attain high performance with fewer parameters. The benchmark facilitates easy evaluation and continuous updates for new PETL methods and applications.
## pytorch-grad-cam
Explore state-of-the-art methods for AI explainability in computer vision, including advanced Pixel Attribution and benchmarking tools. Supports diverse CNNs and Vision Transformers across use cases like classification and segmentation, with methods like GradCAM for enhanced visualization and interpretability metrics.
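The core Grad-CAM computation is compact: each activation channel is weighted by the spatial mean of its gradients, and the class activation map is the ReLU of the weighted sum. A plain-Python sketch of that math on nested lists (the library wraps this, plus model hooks and smoothing, for many CNN and ViT architectures):

```python
def grad_cam(activations, gradients):
    """activations, gradients: [channels][h][w] nested lists of floats."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for act, grad in zip(activations, gradients):
        # channel weight: global average of that channel's gradients
        weight = sum(sum(row) for row in grad) / (h * w)
        for y in range(h):
            for x in range(w):
                cam[y][x] += weight * act[y][x]
    # ReLU: keep only regions that positively influence the target class
    return [[max(v, 0.0) for v in row] for row in cam]
```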
## awesome-data-labeling
Discover a well-curated selection of data labeling tools tailored for a variety of domains including images, text, audio, video, 3D, and more. These tools offer distinct functionalities that boost the efficiency of data annotation processes, essential for machine learning projects. Notable tools such as labelImg and CVAT make image annotation easier, while YEDDA and ML-Annotate are ideal for text labeling tasks. Tools like EchoML and UltimateLabeling enhance audio and video annotations. This collection also addresses needs for Lidar, time series, and multidomain data labeling, providing critical resources for building robust AI solutions.
## kornia
Discover differentiable image processing and AI model integration with Kornia, a library seamlessly built on PyTorch. Benefit from GPU acceleration for efficient batch transformations and sophisticated data augmentation. Kornia supports various vision tasks, offering tools for image processing, transformation, and augmentation along with pre-trained models for tasks like face detection, segmentation, and classification.
## computervision-recipes
This repository provides comprehensive examples and best practices for building computer vision systems, utilizing cutting-edge algorithms and neural network architectures. It includes tools to improve, assess, and scale models by incorporating advanced libraries, thus aiding data scientists and machine learning engineers in expediting real-world projects. The focus is on efficient time-to-market strategies, supporting various tasks such as image classification and action recognition with deployment capabilities on cloud services like Azure, using Jupyter notebooks and PyTorch for demonstration.
## CVPR2024-Papers-with-Code
Access a wide range of CVPR 2024's notable papers and open-source projects on OpenReview, covering various areas such as 3D modeling, AI advancements, and multimodal learning. Connect with a global network of experts to keep abreast of the latest in computer vision and related tech.
## Savant
Savant is an advanced open-source framework tailored for developing real-time, highly efficient multimedia AI applications across Nvidia platforms. It excels in building fault-tolerant inference pipelines, offering flexibility and scalability for both data centers and edge devices. Built on Nvidia's DeepStream, Savant provides an intuitive abstraction layer to simplify the creation of dynamic computer vision and video analytics pipelines without requiring low-level programming. Savant supports various Nvidia hardware, including Jetson and high-end GPUs, and features seamless integration with cloud environments. Additional capabilities include OpenTelemetry for pipeline monitoring, and support for high-performance data handling through Python-based SDKs and Prometheus instrumentation. Savant allows developers to reduce time to market and streamline the development process for diverse applications.
## computer-vision-in-action
Explore the comprehensive world of computer vision, bridging theory and practice with foundational knowledge and advanced neural network models. Engage in practical projects with detailed guidance and code implementation, and utilize the importable L0CV package for hands-on learning. Discover insights into trending models such as Transformers and Attention, enhancing both mathematical comprehension and practical engineering skills, tailored for those eager to advance in computer vision.
## Transformer-in-Vision
This project examines the growing significance of Transformer technology in a variety of computer vision applications. It serves as a comprehensive resource, aggregating recent studies from fields like robotics and image processing, and highlighting the essential role of Transformers in AI models. The project outlines innovations such as LLM-in-Vision, and delivers thorough surveys on complex topics, such as multi-modal pre-training and generative adversarial networks, providing readers with insights into this evolving field.
## annotated_research_papers
Delve into an extensive library of annotated research papers meant for easier comprehension, primarily aimed at machine learning professionals. This effort strives to demystify complex research through concise annotations and insightful analysis. Featuring a curated collection of significant papers across fields like Computer Vision, NLP, and Diffusion Models, this resource supports professionals in staying current with industry advancements and enriching their learning journey.
## awesome-nerf-editing
This collection offers insights into the advancements in Radiance Fields, particularly NeRF and 3D Gaussian Splatting, serving as a comprehensive guide for mastering 3D editing techniques. Access key papers, relevant surveys, and the latest research developments, providing a thorough understanding of radiance field-based 3D editing. Connect with the research community through continuous updates and collaborative openings, simplifying your journey in this evolving discipline.
## menpo
Menpo offers a streamlined approach for importing, manipulating, and visualizing annotated image and mesh data, crucial for Machine Learning and Computer Vision applications. The package's 'Landmarkable' core types facilitate efficient image tasks like masking and cropping. Compatible with various Python versions, Menpo is best installed via the conda ecosystem, ensuring seamless integration with SciPy and Numpy. Dive into Menpo's capabilities through comprehensive Jupyter Notebooks and explore its specialized libraries, such as menpofit for deformable modeling and menpo3d for 3D mesh processing.
## ffcv
Experience enhanced model training efficiency with a system designed to accelerate data handling for both neural networks and deep learning tasks. By substituting conventional data loaders, this system significantly reduces time and cost in training models on prominent datasets like ImageNet and CIFAR-10, while providing seamless integration into existing workflows. Suitable for a range of applications from small to large scale, it optimizes automated data processes and loading strategies across different storage solutions. Also, it efficiently supports multiple model training on individual GPUs, making it perfect for environments with limited resources.
## diffae
This project introduces diffusion autoencoders that focus on meaningful and decodable image representation. Featured in CVPR 2022, it offers practical tools like Colab walkthroughs and web demos for sample generation, manipulation, and interpolation. The comprehensive documentation and LMDB datasets support ease of use. It also provides training and evaluation scripts for datasets such as FFHQ and CelebAHQ, facilitating advancements in AI image processing, and supplying essential tools for researchers and developers.
## PyTorch-Tutorial-2nd
Discover extensive deep learning applications and inference deployment frameworks in this updated resource. This tutorial builds upon the first edition, offering foundational concepts and guiding from basic knowledge to industry applications in computer vision, NLP, and large language models. It details PyTorch fundamentals and projects covering image processing, text generation, and model deployment with ONNX and TensorRT, allowing learners to apply theory in practice. Designed for AI learners, students, and professionals aiming to extend their understanding and practical skills in PyTorch.
## 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
Explore a vast collection of over 500 projects in AI, machine learning, deep learning, computer vision, and natural language processing, all complete with source code. Continuously updated and tested, each project offers valuable insights into practical implementations and the latest advancements. Ideal for professionals aiming to enhance skills or engage in open-source contributions, this repository covers diverse domains like Python projects, healthcare, data analysis, and more. Discover projects ranging from basic to advanced levels, providing substantial learning opportunities and hands-on experience with real-world applicability. Contributions from numerous developers ensure a rich and expansive resource perfect for those interested in innovative AI development.
## daily-paper-computer-vision
The project provides daily updates on recent studies in computer vision, AI, and related disciplines, compiling a repository of high-caliber papers from renowned conferences such as CVPR, IJCAI, and ICLR. The CVer community offers opportunities to explore advancements in AI applications across areas like object detection, semantic segmentation, GANs, and NeRF, helping readers stay informed on cutting-edge computer vision and AI developments.
## ML-YouTube-Courses
This repository features a curated selection of machine learning courses from YouTube, spanning topics like basics, deep learning, NLP, computer vision, and reinforcement learning. Compiled by DAIR.AI, it includes courses from prestigious institutions such as Caltech, Stanford, and MIT, offering educational resources for professionals and enthusiasts. It provides access to advanced courses on modern techniques and practical applications, serving both beginners and experienced learners in AI and machine learning.
## awesome-self-supervised-learning
Explore a curated compilation of self-supervised learning resources, offering theoretical insights and practical applications in fields such as computer vision, robotics, and natural language processing. Drawing inspiration from influential machine learning projects, this collection highlights self-supervised learning as an emerging trend. It includes critical papers, benchmark codes, and detailed surveys, making it an indispensable resource for researchers and practitioners interested in self-supervised methods. Contributions are encouraged through pull requests to broaden the repository's content and maintain its relevance.
## voxelgpt
Explore a robust tool that utilizes advanced language models and multimodal integration for seamless natural language interaction with datasets. Ideal for professionals in computer vision, machine learning, and data science, this solution facilitates efficient data filtering, sorting, and querying without the need for intricate coding. Experience its functionalities live, including dataset and computation queries, and how it integrates with the FiftyOne library to enhance data management processes.
## semantic-segmentation
Examine state-of-the-art semantic segmentation models equipped with versatile datasets in PyTorch. The project provides practical tools, seamless integration with leading backbone architectures, and accommodates various parsing tasks such as scene, human, and medical image segmentation. Future updates aim to revamp the training pipeline, deliver baseline pre-trained models, implement distributed training, and offer tutorials for custom datasets. Compatibility with ONNX and TFLite ensures widespread adaptability, serving developers who demand precision and flexibility in segmentation applications. Anticipate significant enhancements in the scheduled May 2024 release.
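Semantic-segmentation models such as these are typically evaluated with mean IoU: per-class intersection over union, averaged over the classes present. A plain-Python sketch on flattened label lists (real implementations vectorize this over prediction tensors):

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:                      # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious)
```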
## shared_colab_notebooks
This repository contains a wide range of Google Colaboratory notebooks catering to tasks in NLP, NLG, and computer vision. It features models like T5 and DialoGPT for language processing, ViT and ConvNeXT for visual tasks, and unique applications like 3D photo inpainting. Users can also find tutorials and projects on UI/UX with GPT2, making it suitable for those researching diverse ML domains. Explore and tailor these ML projects effortlessly.
## vissl
This library supports advanced self-supervised learning in computer vision using PyTorch. It offers reproducible code, comprehensive benchmarks, and a modular design, providing scalable solutions for research. Featuring models like SwAV, SimCLR, and MoCo(v2), and supporting large-scale training, VISSL helps evaluate and innovate in learning representations effectively.
## ECCV2024-Papers-with-Code
Explore cutting-edge research in computer vision with our curated list of ECCV 2024 papers and open-source projects, covering areas like deep learning, 3D reconstruction, and autonomous driving. Gain access to in-depth studies and codes to support practical AI implementation and stay informed on the latest innovations in neural networks, large language models, and more.
## visionscript
VisionScript is an abstract Python-based language designed for easy computer vision tasks like object detection, classification, and segmentation. It features a concise syntax allowing rapid implementation in just a few lines of code. Suitable for newcomers, it supports REPL and interactive notebooks, integrating models such as CLIP and YOLOv8. With straightforward installation, VisionScript empowers developers to quickly engage in computer vision projects, featuring lexical inference for improved workflow efficiency.
## T-Rex
T-Rex2 utilizes both text and visual prompts to improve object detection, enabling zero-shot detection applicable to various industries without prior labeling. Features like expanded YOLO format export enhance user accessibility, aiding dataset creation. T-Rex Label and the Count Anything APP exemplify its adaptability in handling complex industrial tasks. The project provides open API access for educators and researchers to advance their work in sectors such as agriculture, biology, and OCR. Discover the demo and API documentation for detailed application.
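The YOLO annotation format mentioned above stores one line per box: a class index followed by the box center and size, all normalized by the image dimensions. A minimal converter from pixel corner coordinates (a generic sketch of the format, not T-Rex's exporter):

```python
def to_yolo_line(cls, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert pixel corner coords to a 'cls cx cy w h' YOLO label line."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```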
## PaddleHub
Access a wide range of AI models for computer vision, NLP, speech, and cross-modal tasks. Models are deployable with just three lines of code, compatible with Linux, Windows, and MacOS. Newest features include ERNIE-ViLG, Disco Diffusion, and Stable Diffusion. Utilize models as a service and explore resources on Hugging Face Space through interactive demos with available pre-trained open-source models.
## Realtime_Multi-Person_Pose_Estimation
This project features a bottom-up approach to real-time multi-person pose estimation, removing the need for person detectors. It achieved recognition in the 2016 MSCOCO Keypoints Challenge and won the ECCV Best Demo Award. The approach is implemented across various platforms including C++, TensorFlow, and PyTorch, providing flexible options for developers. The Python code aligns with the latest MSCOCO models and supports inputs ranging from still images to webcam streams, leveraging deep learning for enhanced human pose recognition.
## MLE-Flashcards
Access over 200 flashcards covering machine learning, computer vision, and deep learning essentials, designed to support interview preparation for leading tech firms. Created from academic exercises, these slides serve both seasoned professionals and newcomers. View the latest presentations for animated content. Perfect for reviewing foundational ML knowledge or gaining an overview with additional resources. Contribute feedback on GitHub to help improve this evolving tool.
## best_AI_papers_2021
Explore key AI developments of 2021, featuring breakthroughs with ethical focus, bias awareness, and innovative applications that enhance quality of life. This list provides insights through video summaries, detailed articles, and code repositories, giving a broad understanding of the year's AI achievements. Discover advances from OpenAI's DALL·E to innovations in computer vision and neuroprosthetics, all while considering the critical choices in AI technology implementation.
## graph-based-deep-learning-literature
This repository contains a comprehensive collection of links to conference publications, workshops, surveys, and software focusing on graph-based deep learning. It systematically organizes materials from top-tier conferences such as NeurIPS, ICML, and CVPR by year and topic, serving as an invaluable resource for researchers and practitioners. Keep up with advancements in artificial intelligence, machine learning, and computational linguistics through this exhaustive literature repository.
## Holocron
Holocron provides state-of-the-art deep learning techniques for computer vision, seamlessly integrating with the PyTorch ecosystem. It supports tasks like image classification, object detection, and semantic segmentation using models such as ResNet, YOLO, and U-Net. Key features include enhanced PyTorch layers, a variety of vision models, and a flexible architecture. Explore reference scripts, latency benchmarks, and API deployment templates for efficient model integration, suitable for various machine learning applications.
## Transformer-in-Computer-Vision
Access a current compilation of significant works in Transformer-based computer vision research, spanning fields like detection, segmentation, and generative models. This archive serves the research community with a comprehensive collection of papers and code, covering varied topics from adversarial attacks to anomaly detection and few-shot learning. Designed for individuals interested in cutting-edge visual processing techniques, this resource is regularly updated with fresh insights and breakthroughs. Examine how Transformers are advancing computer vision technologies and applications.
## sports
Discover methods for tracking and analyzing football players with AI. This guide explores YOLOv5 and ByteTrack for real-time player tracking, YOLOv7 for 3D pose estimation, and GPT-4V for identifying team uniforms by color. Gain insights into computer vision techniques for sports analytics.
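One simple way to assign players to teams by uniform color (the guide itself uses GPT-4V for this step, so this is only an illustrative alternative): nearest-centroid matching of each player's average jersey color against reference team colors. The team colors below are made-up examples:

```python
def squared_dist(c1, c2):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))

def assign_team(player_rgb, team_colors):
    """Return the team whose reference color is closest to the player's."""
    return min(team_colors, key=lambda name: squared_dist(player_rgb, team_colors[name]))
```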
## dust3r
Explore a detailed third-party overview of the DUSt3R project, which streamlines 3D vision implementation using geometric methodologies. With features such as scalable global alignment and local feature detection, the project ensures straightforward integration with various frameworks. Access to pre-trained models through platforms like HuggingFace allows for easy setup and adaptation across environments. The installation guides and interactive demos provide practical examples, enhancing 3D point mapping and alignment capabilities, suitable for a wide range of computer vision users.
## FILTER.js
Discover FILTER.js, a comprehensive JavaScript library for image and video processing, utilizing HTML5 technologies such as WebGL, WebAssembly, and Web Workers. Offering an array of filters and plugins for synchronous and parallel processing with CPU and GPU support, it is versatile for both browser and Node.js applications. This framework supports custom builds with a wide selection of filters for efficient media processing, making it ideal for integrating advanced image manipulation and real-time computer vision into projects.
## Awesome-Parameter-Efficient-Transfer-Learning
Examine the collection of papers on parameter-efficient transfer learning aimed at computer vision and multimodal fields. The collection focuses on methods for efficiently adapting large-scale pre-trained models, minimizing overfitting risks and storage requirements associated with comprehensive fine-tuning. Utilizing insights from NLP, the papers enhance applications in image classification, prompt learning, and multimodal tasks. This project provides a thorough overview of advancements and methodologies in optimizing transfer learning across various computational landscapes.
## zero123plus
Zero123++ v1.2 enhances multi-view image synthesis from a single input image, emphasizing 3D generation and improved handling of camera settings. With the addition of a ControlNet normal generator, it achieves better mask accuracy. The model is easy to use with tools such as torch and diffusers and runs with modest VRAM requirements. Available under the Apache 2.0 and CC-BY-NC 4.0 licenses, it can be accessed for non-commercial purposes on Hugging Face. Discover multi-view synthesis with the included scripts and demos.