VisualGLM-6B
VisualGLM-6B is an open-source, multi-modal dialogue language model that supports images, Chinese, and English. Its language backbone is ChatGLM-6B, and its visual component bridges the image encoder and the language model through BLIP2-Qformer, for a total of 7.8 billion parameters. The model aligns visual information with the textual semantic space of ChatGLM and, combined with model quantization, can be deployed locally on consumer-grade GPUs. It is pre-trained on 330 million image-text pairs (roughly 30 million Chinese and 300 million English), with the two languages weighted to keep them aligned, and the code and weights are released under open-source licenses. Known limitations include limited fidelity when describing fine image details and occasional model hallucinations; improvements are planned for future versions.
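As a rough illustration of local deployment, the sketch below loads the model through the Hugging Face transformers interface with trust_remote_code enabled. It assumes the public THUDM/visualglm-6b checkpoint and the custom chat() method shipped with it; the quantize() call shown in the comment is an assumption mirroring ChatGLM-6B, used to fit the model onto smaller consumer GPUs.

```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model from the public checkpoint (assumed name: THUDM/visualglm-6b).
tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()

# On GPUs with limited memory, the checkpoint's custom code is assumed to expose a
# quantize() method (as in ChatGLM-6B), e.g. INT4:
# model = AutoModel.from_pretrained("THUDM/visualglm-6b",
#                                   trust_remote_code=True).quantize(4).half().cuda()

image_path = "example.jpg"  # placeholder path to a local image

# chat() takes the tokenizer, an image path, and a prompt; it returns the reply
# and the dialogue history for multi-round conversation.
response, history = model.chat(tokenizer, image_path, "Describe this image.", history=[])
print(response)

# A follow-up turn reuses the returned history to keep the conversational context.
response, history = model.chat(tokenizer, image_path,
                               "Where might this photo have been taken?", history=history)
print(response)
```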