# Vision Language Models

## mlx-vlm
MLX-VLM provides tools for inference and fine-tuning of vision-language models on macOS. It offers a command-line interface and a Gradio chat UI, and is compatible with models such as Idefics2 and Phi-3-Vision. Features include multi-image chat and parameter-efficient fine-tuning with LoRA and QLoRA. Installation is straightforward via pip.
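As a quick illustration, the Python API can be sketched as follows. This assumes the `load`/`generate` helpers described in the project README; exact argument names have changed across releases, and the checkpoint name shown is only an example.

```python
# Install on an Apple silicon Mac:  pip install mlx-vlm
from mlx_vlm import load, generate

# Load a quantized vision-language model from the Hugging Face Hub.
# The checkpoint below is illustrative; other mlx-community VLMs work too.
model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")

# Generate a caption for a local image. Argument names and order have
# varied between releases, so check the installed version's docs.
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.jpg",
)
print(output)
```

Per the README, the same workflow can also be driven from the command line (e.g. `python -m mlx_vlm.generate` with `--model`, `--image`, and `--prompt` flags), though flag names may likewise differ across versions.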
## MGM
MGM is a dual-encoder framework for large language models ranging from 2B to 34B parameters, specialized in image comprehension and generation. Built upon LLaVA, this open-source project provides detailed resources for training, setup, and evaluation, with demos hosted on Hugging Face Spaces and training data drawn from large datasets such as COCO and GQA. The repository also tracks recent model releases and performance evaluations.