Multimodal Learning for Accurate Referring and Grounding

Product Description

Ferret is a Multimodal Large Language Model (MLLM) designed for referring and grounding tasks. Its main innovations are the Hybrid Region Representation and the Spatial-aware Visual Sampler. It comes with the GRIT Dataset for robust instruction tuning and the Ferret-Bench evaluation benchmark. The project also includes Ferret-UI and model checkpoints, which are available to the research community subject to the applicable licensing agreements.
Project Details