
SAN

Enhancing Semantic Segmentation via Side Adapter Network

Project Description

This project presents SAN, a framework that applies a pre-trained vision-language model to open-vocabulary semantic segmentation by reframing segmentation as a region recognition task. A lightweight side network is attached to a frozen CLIP model to predict mask proposals and attention biases, yielding efficient and accurate segmentation with minimal additional parameters. SAN is validated on standard open-vocabulary benchmarks, where it improves performance while using fewer parameters and running faster at inference. Because the side network reuses existing CLIP features, the whole system can be trained end to end, adapting to new semantic tasks without sacrificing precision.
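To illustrate the "segmentation as region recognition" idea at the heart of SAN, the toy sketch below pools frozen per-pixel features under a binary mask proposal into a single region embedding, then matches it against class text embeddings by cosine similarity. All names, feature values, and helper functions here are illustrative assumptions, not SAN's actual API or CLIP's real feature space.

```python
# Hypothetical sketch: open-vocabulary region recognition.
# A mask proposal pools frozen CLIP-like pixel features into a region
# embedding, which is scored against class text embeddings.
import math

def mask_pool(features, mask):
    """Average the pixel features selected by a binary mask proposal."""
    dim = len(features[0])
    total = [0.0] * dim
    count = 0
    for feat, selected in zip(features, mask):
        if selected:
            count += 1
            for i in range(dim):
                total[i] += feat[i]
    return [t / count for t in total]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_region(features, mask, text_embeddings):
    """Return the index of the text embedding closest to the pooled region."""
    region = mask_pool(features, mask)
    scores = [cosine(region, t) for t in text_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: four "pixels" with 2-D features; the mask selects the last two.
pixel_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mask_proposal = [0, 0, 1, 1]
text_embs = [[1.0, 0.0], [0.0, 1.0]]  # e.g. embeddings of two class prompts
print(classify_region(pixel_features, mask_proposal, text_embs))  # → 1
```

In the real system, the side network also predicts attention biases that steer CLIP's own attention toward each masked region, rather than pooling features after the fact as this sketch does.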
Project Details