ChatGLM-6B: A Comprehensive Introduction
Overview
ChatGLM-6B is a bilingual open-source conversational language model that supports both Chinese and English. It is built on the General Language Model (GLM) architecture and has 6.2 billion parameters. Thanks to model quantization, users can deploy it locally on consumer-grade graphics cards with modest memory requirements—as little as 6GB of VRAM at the INT4 quantization level.
This model leverages technology similar to ChatGPT but is specifically optimized for Chinese dialogue and question-answering tasks. Trained on approximately 1 trillion tokens of Chinese and English text and further refined with supervised fine-tuning, feedback bootstrapping, and reinforcement learning with human feedback, ChatGLM-6B generates responses that align well with human preferences.
Key Features
- Accessible Deployment: Local deployment on low-memory devices is possible, with INT4 quantization requiring just 6GB of VRAM.
- Efficient Fine-Tuning: Utilizing the P-Tuning v2 method allows developers to efficiently fine-tune the model for specific application scenarios, needing only 7GB of VRAM at the INT4 quantization level.
- Open for Research and Commercial Use: The weights of ChatGLM-6B are fully open for academic research, and free commercial use is also permitted after completing a registration questionnaire.
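To give a rough sense of why P-Tuning v2 is so memory-friendly, the sketch below estimates the number of trainable prefix parameters relative to the full model. The layer count (28) and hidden size (4096) follow ChatGLM-6B's published configuration; the prefix length of 128 is a hypothetical choice for illustration.

```python
# Rough estimate of P-Tuning v2 trainable parameters vs. full fine-tuning.
# 28 layers and hidden size 4096 match ChatGLM-6B's configuration;
# prefix_len = 128 is an assumed (hypothetical) setting.
hidden_size = 4096
num_layers = 28
prefix_len = 128

# P-Tuning v2 prepends trainable key/value prefix vectors at every layer,
# so trainable parameters = layers * prefix length * 2 (K and V) * hidden size.
prefix_params = num_layers * prefix_len * 2 * hidden_size
total_params = 6_200_000_000

trainable_fraction = prefix_params / total_params
print(f"trainable prefix parameters: {prefix_params:,}")
print(f"fraction of full model:      {trainable_fraction:.2%}")
```

Because well under 1% of the parameters are updated, optimizer state and gradients stay small, which is what makes fine-tuning feasible in 7GB of VRAM.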
Latest Developments
GLM-4 Model and API: The latest GLM-4 model series offers significant improvements across many metrics. Developers can access the open-source versions or experience them via specific platforms and APIs to leverage the new capabilities like System Prompt, Function Call, Retrieval, and Web Search.
Upgraded Models:
- ChatGLM2-6B: An upgraded version of the original model, offering enhanced performance, a context length extended to 32K tokens, and improved inference efficiency.
- CodeGeeX2: A code generation model based on ChatGLM2-6B, further pre-trained on 600B tokens of code to significantly boost its coding capabilities across multiple programming languages.
- VisualGLM-6B: A multi-modal dialogue model that supports image understanding.
Community and Collaboration
ChatGLM-6B encourages collaboration with the open-source community to advance large model technology. Developers must, however, ensure compliance with the open-source license and avoid harmful applications. Note that the project team itself has not released any official applications (web, Android, iOS, or Windows apps) based on ChatGLM-6B.
Usage Guidelines
Hardware Requirements
- FP16 (half precision): Requires a minimum of 13GB of VRAM.
- INT8 Quantization: Requires a minimum of 8GB of VRAM.
- INT4 Quantization: Runs inference with as little as 6GB of VRAM.
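These figures line up with a back-of-the-envelope calculation from the parameter count: at 6.2 billion parameters, the weights alone occupy roughly 2, 1, and 0.5 bytes per parameter at FP16, INT8, and INT4 respectively, and the published minimums add headroom on top of that.

```python
# Back-of-the-envelope VRAM estimate for ChatGLM-6B weights at each precision.
params = 6.2e9
bytes_per_param = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

weights_gb = {name: params * b / 1e9 for name, b in bytes_per_param.items()}
for name, gb in weights_gb.items():
    print(f"{name}: ~{gb:.1f} GB for weights alone")

# The published minimums (13 / 8 / 6 GB) sit above these figures because
# activations, the KV cache, and framework overhead also consume VRAM.
```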
Installation Instructions
To set up the environment, install the dependencies via pip: `pip install -r requirements.txt`. For optimal performance, version 4.27.1 of the `transformers` library is recommended. The model can also run on CPUs if `gcc` and `openmp` are installed; specific versions have been tested for compatibility on different operating systems.
Code Execution
The ChatGLM-6B model can be instantiated and used to generate conversations using a simple Python code snippet:
```python
from transformers import AutoTokenizer, AutoModel

# trust_remote_code=True is required because the model ships its own modeling code
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# Load the weights in half precision (FP16) and move them to the GPU
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()  # switch to inference mode

# chat() returns the generated response plus the updated conversation history
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
The `chat` method returns both the generated response and an updated conversation history, which can be passed back in on the next call to carry context across a multi-turn dialogue.
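The multi-turn contract can be sketched without a GPU by mocking the model. The mock below is a stand-in for the real ChatGLM-6B (its echo response is purely illustrative); what it shows is how `history` accumulates as a list of (query, response) pairs that gets threaded through successive `chat` calls.

```python
# Minimal mock illustrating how ChatGLM-6B's chat() threads conversation
# state through `history`, a list of (query, response) pairs.
# MockChatModel and its echo response are hypothetical stand-ins.
class MockChatModel:
    def chat(self, tokenizer, query, history=None):
        history = list(history or [])
        response = f"echo: {query}"  # placeholder for real text generation
        history.append((query, response))
        return response, history

model = MockChatModel()
# First turn starts from an empty history...
response, history = model.chat(None, "你好", history=[])
# ...and each later turn passes the accumulated history back in.
response, history = model.chat(None, "How are you?", history=history)
print(len(history))  # two turns accumulated
```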
Conclusion
ChatGLM-6B is a significant stride in conversational AI, offering robust support for bilingual interactions with extensive options for customization and deployment. By engaging with the open-source community, the model continues to evolve, laying down the groundwork for future advancements in dialogue systems and applications.