Mantis

Advanced Multi-Image Processing with LLaMA-3 Technology

Product Description

Mantis is a LLaMA-3-based model for multi-image vision-language tasks that accepts text and multiple images in the same input. It reports top performance on five key benchmarks while using modest training resources and no extensive pre-training data. It also maintains single-image performance comparable to CogVLM and Emu2. Recent updates add training support for Idefics-3 and evaluation through VLMEvalKit. A variety of models and scripts are available on Hugging Face.

Project Details