ComfyUI-YoloWorld-EfficientSAM: Unveiling Object Detection and Segmentation Capabilities
The ComfyUI-YoloWorld-EfficientSAM project is an unofficial yet robust implementation of the YOLO-World and EfficientSAM models, combining object detection and segmentation in a single workflow and supporting efficient processing of both images and videos.
Overview of the Project
ComfyUI-YoloWorld-EfficientSAM builds on the existing YOLO-World framework and EfficientSAM model to empower users with advanced object detection and segmentation capabilities. With the release of version V2.0, the project has introduced functionalities like mask separation and extraction, allowing users to isolate and output specific masks. Both image and video formats are supported, marking an evolutionary step from the previous V1.0 version.
Key Features
Model Loading
- YOLO-World Model Loader: Loads the three official models (yolo_world/l, yolo_world/m, and yolo_world/s), downloading them automatically for user convenience.
- EfficientSAM Model Loader: Lets users run EfficientSAM on either CUDA or CPU, depending on available resources and desired performance (see the loading sketch after this list).
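To make the two loaders concrete, here is a minimal loading sketch outside of ComfyUI. It assumes the Roboflow inference package for YOLO-World (which the node pack builds on) and treats the EfficientSAM .jit files as TorchScript modules; exact import paths and constructor arguments may differ between versions, so treat this as an assumption-laden illustration rather than the node's actual code.

```python
# Loading sketch (assumptions: the Roboflow "inference" package exposes a
# YOLOWorld class taking a model_id; EfficientSAM ships as TorchScript .jit files).
import torch
from inference.models import YOLOWorld  # assumed import path; may vary by version

# YOLO-World: pick one of the three official model ids.
yolo_world = YOLOWorld(model_id="yolo_world/l")  # also: "yolo_world/m", "yolo_world/s"

# EfficientSAM: choose the CUDA or CPU TorchScript file to match the hardware.
device = "cuda" if torch.cuda.is_available() else "cpu"
sam_file = "efficient_sam_s_gpu.jit" if device == "cuda" else "efficient_sam_s_cpu.jit"
efficient_sam = torch.jit.load(sam_file, map_location=device)
efficient_sam.eval()
```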
Detection and Segmentation
- YOLO World ESAM: Accepts the loaded YOLO-World and EfficientSAM models as workflow inputs. Users can connect images, specify detection categories, and adjust confidence and IoU thresholds to tailor detection and segmentation. Options include detection box thickness, text display properties, whether to display object confidence scores, and whether to segment detections with EfficientSAM (the sketch after this list illustrates the threshold and mask options).
- Mask Options: Masks can be combined into a single output image, or selected mask indices can be extracted for individual output.
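To illustrate the confidence threshold and the mask options above, here is a minimal NumPy sketch. The score and mask arrays are random stand-ins for what the node produces internally from YOLO-World detections and EfficientSAM segmentation, and the variable names are illustrative rather than the node's actual parameter names.

```python
import numpy as np

# Stand-ins for the node's internal results: one confidence score and one
# boolean mask per detection (random data purely for illustration).
rng = np.random.default_rng(0)
scores = rng.random(5)                   # N detection confidences
masks = rng.random((5, 480, 640)) > 0.5  # N boolean masks, one per detection

# Keep only detections above the confidence threshold.
confidence_threshold = 0.3
masks = masks[scores >= confidence_threshold]

# Option 1: combine all remaining masks into a single output mask.
combined_mask = np.any(masks, axis=0)

# Option 2: extract a selected mask index for individual output.
mask_index = 0
selected_mask = masks[mask_index]

print(combined_mask.shape, selected_mask.shape)  # (480, 640) (480, 640)
```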
Additional Detection Features
Through collaboration with ltdrdata, the project integrates with the Impact-Pack, adding class-agnostic non-maximum suppression (NMS) to remove overlapping bounding boxes regardless of class; a sketch of the technique follows.
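Class-agnostic NMS suppresses overlapping boxes by IoU while ignoring class labels entirely. The sketch below demonstrates the idea with torchvision's nms operator, which is class-agnostic by construction; it illustrates the technique only and is not the Impact-Pack's actual implementation.

```python
import torch
from torchvision.ops import nms

# Boxes in (x1, y1, x2, y2) format with confidence scores. The first two boxes
# overlap heavily and would belong to different classes, but class-agnostic NMS
# ignores labels and keeps only the higher-scoring one.
boxes = torch.tensor([
    [ 10.0,  10.0, 110.0, 110.0],   # e.g. detected as "dog"
    [ 12.0,  14.0, 108.0, 112.0],   # e.g. detected as "person", overlaps the box above
    [200.0, 200.0, 300.0, 300.0],   # a separate, non-overlapping object
])
scores = torch.tensor([0.90, 0.75, 0.80])

keep = nms(boxes, scores, iou_threshold=0.5)  # indices of the boxes to keep
print(keep)  # tensor([0, 2]) -- the overlapping lower-score box is removed
```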
Installation Guide
To install the ComfyUI-YoloWorld-EfficientSAM project, users can follow these steps:
- Via ComfyUI Manager (recommended): This option is currently in development.
- Manual Installation:
  - Navigate to the custom_nodes directory.
  - Run git clone https://github.com/ZHO-ZHO-ZHO/ComfyUI-YoloWorld-EfficientSAM.
  - Change into the newly created directory and run pip install -r requirements.txt.
  - Restart ComfyUI for the changes to take effect.
- For model operation, download the efficient_sam_s_cpu.jit and efficient_sam_s_gpu.jit files from EfficientSAM and place them in the appropriate directory.
Workflow Implementations
- V2.0 Workflows:
- Image Detection + Segmentation: Enhanced for precision and control.
- Video Detection + Segmentation: Enables seamless analysis of video files.
- V1.0 Workflows: Not compatible with V2.0 but provided for historical reference, offering similar capabilities for images and videos.
Timeline of Updates
- On February 24, 2024, version V2.0 was released, introducing the mask separation and extraction functionalities and expanding both image and video processing capabilities.
- Previous updates include the addition of new detection nodes and project inception details.
Support and Contribution
For those interested in supporting or following the project, contact avenues include email and social media platforms such as Bilibili and Twitter. The implementation builds on the original YOLO-World and EfficientSAM works, with acknowledgments to their authors and to other contributors and sources of inspiration.
This project stands as a testament to collaborative development, harnessing the power of existing technologies to push the boundaries of object detection and segmentation solutions.