ComfyUI-YoloWorld-EfficientSAM: Unveiling Object Detection and Segmentation Capabilities
The ComfyUI-YoloWorld-EfficientSAM project is an unofficial yet robust implementation of the YOLO-World and EfficientSAM models, combining object detection and segmentation in a single workflow and supporting efficient processing of both images and videos.
Overview of the Project
ComfyUI-YoloWorld-EfficientSAM builds on the existing YOLO-World framework and EfficientSAM model to empower users with advanced object detection and segmentation capabilities. With the release of version V2.0, the project has introduced functionalities like mask separation and extraction, allowing users to isolate and output specific masks. Both image and video formats are supported, marking an evolutionary step from the previous V1.0 version.
Key Features
Model Loading
- YOLO-World Model Loader: Loads the three official models (yolo_world/l, yolo_world/m, and yolo_world/s), downloading them automatically for user convenience.
- EfficientSAM Model Loader: Lets users run EfficientSAM on either CUDA or CPU, depending on available resources and desired performance (see the loading sketch after this list).
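To make the two loaders concrete, here is a minimal loading sketch outside of ComfyUI. It assumes the Roboflow inference package for YOLO-World (which the node pack builds on) and treats the EfficientSAM .jit files as TorchScript modules; exact import paths and constructor arguments may differ between versions, so treat this as an assumption-laden illustration rather than the node's actual code.

```python
# Loading sketch (assumptions: the Roboflow "inference" package exposes a
# YOLOWorld class taking a model_id; EfficientSAM ships as TorchScript .jit files).
import torch
from inference.models import YOLOWorld  # assumed import path; may vary by version

# YOLO-World: pick one of the three official model ids.
yolo_world = YOLOWorld(model_id="yolo_world/l")  # also: "yolo_world/m", "yolo_world/s"

# EfficientSAM: choose the CUDA or CPU TorchScript file to match the hardware.
device = "cuda" if torch.cuda.is_available() else "cpu"
sam_file = "efficient_sam_s_gpu.jit" if device == "cuda" else "efficient_sam_s_cpu.jit"
efficient_sam = torch.jit.load(sam_file, map_location=device)
efficient_sam.eval()
```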
Detection and Segmentation
- YOLO World ESAM: Accepts the loaded YOLO-World and EfficientSAM models as workflow inputs. Users can connect images, specify detection categories, and adjust confidence and IoU thresholds to tailor detection and segmentation. Options include detection box thickness, text display properties, whether to display object confidence scores, and whether to segment detections with EfficientSAM (the sketch after this list illustrates the threshold and mask options).
- Mask Options: Masks can be combined into a single output image, or selected mask indices can be extracted for individual output.
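To illustrate the confidence threshold and the mask options above, here is a minimal NumPy sketch. The score and mask arrays are random stand-ins for what the node produces internally from YOLO-World detections and EfficientSAM segmentation, and the variable names are illustrative rather than the node's actual parameter names.

```python
import numpy as np

# Stand-ins for the node's internal results: one confidence score and one
# boolean mask per detection (random data purely for illustration).
rng = np.random.default_rng(0)
scores = rng.random(5)                   # N detection confidences
masks = rng.random((5, 480, 640)) > 0.5  # N boolean masks, one per detection

# Keep only detections above the confidence threshold.
confidence_threshold = 0.3
masks = masks[scores >= confidence_threshold]

# Option 1: combine all remaining masks into a single output mask.
combined_mask = np.any(masks, axis=0)

# Option 2: extract a selected mask index for individual output.
mask_index = 0
selected_mask = masks[mask_index]

print(combined_mask.shape, selected_mask.shape)  # (480, 640) (480, 640)
```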
Additional Detection Features
Through collaboration with ltdrdata, the project integrates with the Impact-Pack, adding class-agnostic non-maximum suppression (NMS) to remove overlapping bounding boxes regardless of class; a sketch of the technique follows.
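Class-agnostic NMS suppresses overlapping boxes by IoU while ignoring class labels entirely. The sketch below demonstrates the idea with torchvision's nms operator, which is class-agnostic by construction; it illustrates the technique only and is not the Impact-Pack's actual implementation.

```python
import torch
from torchvision.ops import nms

# Boxes in (x1, y1, x2, y2) format with confidence scores. The first two boxes
# overlap heavily and would belong to different classes, but class-agnostic NMS
# ignores labels and keeps only the higher-scoring one.
boxes = torch.tensor([
    [ 10.0,  10.0, 110.0, 110.0],   # e.g. detected as "dog"
    [ 12.0,  14.0, 108.0, 112.0],   # e.g. detected as "person", overlaps the box above
    [200.0, 200.0, 300.0, 300.0],   # a separate, non-overlapping object
])
scores = torch.tensor([0.90, 0.75, 0.80])

keep = nms(boxes, scores, iou_threshold=0.5)  # indices of the boxes to keep
print(keep)  # tensor([0, 2]) -- the overlapping lower-score box is removed
```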
Installation Guide
To install the ComfyUI-YoloWorld-EfficientSAM project, users can follow these steps:
- Via ComfyUI Manager (recommended): This option is currently in development.
- Manual Installation:
  - Navigate to the custom_nodes directory.
  - Run git clone https://github.com/ZHO-ZHO-ZHO/ComfyUI-YoloWorld-EfficientSAM.
  - Change into the newly created directory and run pip install -r requirements.txt.
  - Restart ComfyUI for the changes to take effect.
- For model operation, download the efficient_sam_s_cpu.jit and efficient_sam_s_gpu.jit files from EfficientSAM and place them in the appropriate directory.
Workflow Implementations
- V2.0 Workflows:
- Image Detection + Segmentation: Enhanced for precision and control.
- Video Detection + Segmentation: Enables seamless analysis of video files.
- V1.0 Workflows: Not compatible with V2.0 but provided for historical reference, offering similar capabilities for images and videos.
Timeline of Updates
- On February 24, 2024, version V2.0 was released, introducing the mask separation and extraction functionalities and expanding both image and video processing capabilities.
- Previous updates include the addition of new detection nodes and project inception details.
Support and Contribution
For those interested in supporting or following the project, contact avenues include email and social media platforms such as Bilibili and Twitter. The implementation builds on the original YOLO-World and EfficientSAM works, with acknowledgments to their authors and to other contributors and sources of inspiration.
This project stands as a testament to collaborative development, harnessing the power of existing technologies to push the boundaries of object detection and segmentation solutions.