EVF-SAM
EVF-SAM uses early vision-language fusion to improve prediction accuracy on text-prompted segmentation tasks. It performs well on Referring Expression Segmentation (RES) and is efficient enough to run quickly on a T4 GPU. With simple image-only training on RES datasets, it extends to zero-shot video prediction by building on SAM-2. It also handles multiple tasks, including semantic segmentation and body part segmentation, which makes it useful for both academic and practical work. Demos, code, and weights are available.