MG-LLaVA
MG-LLaVA is a multi-granularity framework for visual instruction tuning. It combines low-resolution, high-resolution, and object-centric visual features to improve visual processing. The model is trained on publicly available data and shows strong perception capabilities. The project has released inference code and extended its set of evaluation benchmarks, and it includes setup and training instructions.