WebGLM: Advancing Web-Enhanced Question Answering with Human Preferences
Overview
WebGLM is a project designed to create an efficient web-enhanced question-answering system built on the 10-billion-parameter General Language Model (GLM). It integrates web search and retrieval capabilities into a pre-trained language model, aiming for real-world deployment that delivers accurate, context-aware answers.
Features
- LLM-augmented Retriever: fetches web content relevant to the user's query, improving the accuracy of the answers provided.
- Bootstrapped Generator: built on GLM, generates human-like responses to inquiries, ensuring answers are both accurate and user-friendly.
- Human Preference-aware Scorer: evaluates the generated responses against human preferences, so that answers are not only accurate but also engaging and useful.
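The three components above form a pipeline: retrieve references for a query, generate candidate answers conditioned on them, and pick the candidate the scorer ranks highest. A minimal conceptual sketch, in which every function is an illustrative stand-in rather than the project's actual API:

```python
# Conceptual sketch of the WebGLM pipeline; all functions here are
# illustrative stubs, not the project's real interfaces.

def retrieve(query):
    # LLM-augmented retriever: fetch web passages relevant to the query.
    # Stubbed with a fixed list for illustration.
    return ["Passage about " + query]

def generate(query, references, n_candidates=3):
    # Bootstrapped generator: produce candidate answers conditioned on
    # the retrieved references. Stubbed here.
    return [f"Answer {i} to '{query}' citing {len(references)} refs"
            for i in range(n_candidates)]

def score(query, answer_text):
    # Human preference-aware scorer: higher means more preferred.
    # Stubbed with answer length as a placeholder signal.
    return len(answer_text)

def answer(query):
    refs = retrieve(query)
    candidates = generate(query, refs)
    # Return the candidate the scorer ranks highest.
    return max(candidates, key=lambda a: score(query, a))
```

The design point is that generation and scoring are decoupled: the generator can propose several candidates, and the preference-aware scorer selects among them.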
Recent Updates
- June 25, 2023: Release of ChatGLM2-6B, an upgraded version of ChatGLM-6B with significant improvements on benchmarks such as MMLU, C-Eval, GSM8K, and BBH.
- Longer Context: Using FlashAttention, the supported context length has been extended from 2K to 32K tokens, allowing for longer, more in-depth dialogues.
- More Efficient Inference: Multi-Query Attention delivers faster inference and lower GPU memory usage, bringing responses closer to real time.
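Multi-Query Attention saves memory by sharing a single key/value head across all query heads, shrinking the KV cache by a factor of the head count. A rough NumPy sketch of the idea (shapes and names are illustrative, not ChatGLM2's actual implementation):

```python
import numpy as np

def multi_query_attention(q, k, v):
    """q: (heads, seq, d); k, v: (seq, d) -- one shared K/V head.

    In standard multi-head attention, k and v would each be
    (heads, seq, d); sharing them across heads cuts the KV cache
    by a factor of `heads`.
    """
    heads, seq, d = q.shape
    # Every query head attends over the same shared keys.
    scores = q @ k.T / np.sqrt(d)             # (heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # ...and mixes the same shared values.
    return weights @ v                        # (heads, seq, d)

rng = np.random.default_rng(0)
out = multi_query_attention(rng.normal(size=(8, 4, 16)),
                            rng.normal(size=(4, 16)),
                            rng.normal(size=(4, 16)))
```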
Preparing the Environment
- Clone the repository and install the required Python and Node.js dependencies.
- Acquire a SerpAPI key for web search, or alternatively configure Bing search for a local browser environment.
- Download the required model checkpoints so the system can answer queries.
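The steps above might look like the following. The GitHub location is the project's actual home, but the requirements file name and environment variable are assumptions to verify against the repository's README:

```shell
# Clone the repository (THUDM/WebGLM is the project's GitHub home).
git clone https://github.com/THUDM/WebGLM.git
cd WebGLM

# Install Python dependencies (requirements file name is an assumption).
pip install -r requirements.txt

# Provide a SerpAPI key for web search (variable name is an assumption;
# alternatively, configure local Bing search per the project docs).
export SERPAPI_KEY="your-key-here"
```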
How to Use WebGLM
WebGLM can be run as a command-line interface (CLI) or as a web service, allowing flexible integration into applications and user interfaces. In CLI mode, either the WebGLM-2B or WebGLM-10B model can be used, with optional Bing search for enhanced web data retrieval.
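Launching the CLI with one of the models might look like this; the script name and flags are assumptions, so check the repository's README for the exact invocation:

```shell
# Hypothetical invocation; script name and flags may differ in the repo.
python cli_demo.py -w WebGLM-10B --searcher bing
```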
Training and Evaluation
For those interested in training their own models, WebGLM provides procedures for training both the generator and the retriever. Users can download the training data and tools to fine-tune the model's retrieval and response generation capabilities. Evaluation scripts are also provided to test system effectiveness on datasets such as TriviaQA.
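Evaluation on an open-domain QA set like TriviaQA typically reduces to exact-match accuracy over normalized answers. A self-contained sketch of that metric (the normalization rules and sample data here are illustrative, not the project's actual evaluation script):

```python
import re
import string

def normalize(text):
    # Common open-domain QA normalization: lowercase, strip
    # punctuation, drop articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match_accuracy(predictions, gold_answers):
    # gold_answers: list of alias lists (TriviaQA accepts
    # multiple surface forms per question).
    hits = sum(
        normalize(pred) in {normalize(g) for g in golds}
        for pred, golds in zip(predictions, gold_answers)
    )
    return hits / len(predictions)

preds = ["The Eiffel Tower", "paris", "1969"]
golds = [["Eiffel Tower"], ["Paris, France", "Paris"], ["1968"]]
acc = exact_match_accuracy(preds, golds)  # 2 of 3 match
```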
Real Application Scenarios
WebGLM is versatile and applicable in various real-world scenarios, from providing insights into ongoing global issues like COVID-19 to offering advice on balancing work and personal interests. Its ability to deliver informed answers makes it a valuable tool across numerous fields, including technology and lifestyle.
WebGLM represents a significant step forward in building sophisticated question-answering systems that are not only driven by advanced AI capabilities but also finely tuned to human preferences, ensuring relevant and satisfying answers.