DeepSeek-LLM: Unveiling the Power of Language Understanding
Introduction
DeepSeek-LLM marks a significant milestone in artificial intelligence: an advanced language model with 67 billion parameters, trained from scratch on a dataset of 2 trillion tokens spanning English and Chinese. To promote further research and innovation, the 7B/67B Base and Chat models are openly available to the research community.
Key Features and Capabilities
- Superior General Capabilities: DeepSeek-LLM 67B Base outperforms peers such as Llama2 70B Base in key areas including reasoning, coding, mathematics, and Chinese language comprehension.
- Proficiency in Coding and Mathematics: The DeepSeek-LLM 67B Chat model showcases outstanding performance in coding, achieving a HumanEval Pass@1 score of 73.78, and in mathematics, scoring 84.1 on GSM8K 0-shot and 32.6 on Math 0-shot. It demonstrated its extensive generalization abilities with an exceptional score of 65 on the Hungarian National High School Exam.
- Chinese Language Mastery: Evaluations reveal that DeepSeek-LLM 67B Chat surpasses GPT-3.5 in comprehending and processing the Chinese language.
Availability and Model Downloads
The 7B and 67B models, in both base and chat versions, are publicly available, supporting a broad range of research in academic and commercial settings. Intermediate checkpoints from the training process are also released, giving developers and researchers ample opportunity to explore the models' capabilities.
Evaluation Results
Base Model Evaluation
DeepSeek-LLM has undergone rigorous testing across numerous benchmarks. On benchmarks such as HellaSwag, TriviaQA, and MMLU, the 67B Base model stood out, showing marked improvements over comparable models.
Chat Model Evaluation
- Unique Exam Performance: DeepSeek-LLM’s 67B Chat model was tested using newly crafted problem sets and delivered exceptional results, particularly on the Hungarian National High School Exam.
- Instruction Following Evaluation: Google’s instruction-following evaluation dataset further validated the model’s capabilities, demonstrating its proficiency in adhering to provided instructions.
- LeetCode Contest Evaluation: The model demonstrated strong coding ability on problems sourced from the LeetCode Weekly Contest, with its performance benchmarked against a variety of test cases.
Pre-Training Details
DeepSeek-LLM models adopt an architecture similar to LLaMA's, using an autoregressive transformer decoder. They are trained on 2 trillion tokens, with methodologies designed to ensure data richness and variety. Pre-training draws on diverse sources, including Internet text, math problems, code, and curated datasets, with attention to privacy and copyright considerations.
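As a quick sanity check on the LLaMA-style decoder layout, one can inspect the published configuration with Huggingface Transformers. A minimal sketch, assuming the `deepseek-ai/deepseek-llm-7b-base` Hub ID (the exact ID should be confirmed against the official release):

```python
from transformers import AutoConfig

# Fetch only the configuration; no weights are downloaded.
config = AutoConfig.from_pretrained("deepseek-ai/deepseek-llm-7b-base")

# LLaMA-style decoder hyperparameters exposed by the config.
print(config.model_type)           # architecture family
print(config.num_hidden_layers)    # decoder depth
print(config.hidden_size)          # model width
print(config.num_attention_heads)  # attention heads per layer
```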
Quick Start Guide
Installation and Inference
With a Python 3.8+ environment, users can install the necessary dependencies and use Huggingface's Transformers for model inference. The guide provides detailed instructions for both text and chat completion with the DeepSeek models; a minimal example is sketched below.
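A minimal sketch of text completion with Transformers, assuming the `deepseek-ai/deepseek-llm-7b-base` Hub ID and a GPU with bfloat16 support (check the official repository for the exact model IDs and recommended settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-base"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Plain text completion with the base model.
inputs = tokenizer("An attention function can be described as", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the chat models, recent Transformers versions can build the expected prompt format via `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, assuming the chat checkpoint ships a chat template.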
Alternative Inference via vLLM
For those seeking high-throughput inference, vLLM offers an efficient alternative for running the DeepSeek models across diverse applications; a short sketch follows.
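A minimal vLLM sketch for batched generation, under the same assumed model ID:

```python
from vllm import LLM, SamplingParams

# One engine instance serves many prompts with continuous batching.
llm = LLM(model="deepseek-ai/deepseek-llm-7b-base")  # assumed Hub ID
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=100)

prompts = [
    "The capital of France is",
    "An attention function can be described as",
]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```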
Conclusion
DeepSeek-LLM is uniquely positioned to redefine language processing through its remarkable capabilities and its openness to research exploration. Its strong performance in both language understanding and general-purpose tasks makes it a valuable tool for advancing AI research and applications.