CogCoM
CogCoM is a vision-language model that tackles complex visual problems step by step through a chain of manipulations. It defines six distinct manipulations, and its data generation process produces training data for a range of tasks, including chat, captioning, grounding, and reasoning. The model can be run through either a web-based or a CLI interface and supports parallelized deployment for varied multimodal use cases.