BigScience Project Overview
The BigScience project is a large-scale research initiative aimed at advancing the field of language models. It focuses on developing and exploring large language models through collaborative research and practical experimentation. Below is an overview of its key components and activities.
Project Repositories
BigScience operates with two primary code repositories that drive the project's activities:
- Megatron-DeepSpeed: This is the main repository and serves as the flagship code base used by the project.
- BigScience Main Repository: This repository encompasses everything else related to the project, including documentation, experiments, and other essential resources.
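For orientation, here is a minimal sketch of fetching both repositories. The GitHub URLs under the `bigscience-workshop` organization are assumed to be the public locations; adjust them if the repositories live elsewhere.

```python
import subprocess

# Assumed public GitHub locations for the two repositories.
REPOS = [
    "https://github.com/bigscience-workshop/Megatron-DeepSpeed",
    "https://github.com/bigscience-workshop/bigscience",
]

for url in REPOS:
    # Shallow clone to save time and bandwidth; drop --depth for full history.
    subprocess.run(["git", "clone", "--depth", "1", url], check=True)
```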
Active Segments of the Repository
The project's main repository is active in several key areas:
- JZ: This section documents the project's working environment on the Jean Zay supercomputer, helping contributors evaluate, plan, and execute tasks efficiently.
- Experiments: Multiple experiments are actively conducted here. The section includes documentation, result tables, scripts, and logs for tracking progress.
- Datasets Info: This part provides information on the datasets used in the project.
- Train: This contains details on ongoing trainings and serves as the hub for the project's formalized training efforts.
Additionally, README files are available for specific aspects of the project, such as integration with the Hugging Face Hub.
Training Initiatives
Training is a critical aspect of BigScience, with detailed chronicles maintained for each experiment and its findings. Key trainings include:
Train 1
- Model: 13B, unmodified Megatron GPT-2
- This serves as the baseline model.
- Available materials include the full spec, training scripts, and synced logs that can be monitored live.
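As a rough illustration of following synced logs live, the sketch below tails a local log file and extracts loss values. The file name and the `lm loss` line format are assumptions loosely modeled on Megatron-style output, not details confirmed by this document.

```python
import re
import time

LOG_PATH = "main_log.txt"  # hypothetical file name for a synced training log

# Assumed Megatron-style log line, e.g.:
#   iteration 100/ 311541 | ... | lm loss: 4.521E+00 | ...
LOSS_RE = re.compile(r"iteration\s+(\d+)/.*?lm loss:\s*([\d.E+-]+)")

def follow(path):
    """Yield lines appended to a file, similar to `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # jump to the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1.0)  # wait for new output
                continue
            yield line

for line in follow(LOG_PATH):
    m = LOSS_RE.search(line)
    if m:
        step, loss = int(m.group(1)), float(m.group(2))
        print(f"step {step}: lm loss {loss:.3f}")
```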
Train 3
This training focuses on architecture and scaling experiments using the GPT-2 architecture at various sizes and warmup configurations. It consists of baseline runs without special techniques, and TensorBoard links are available for tracking progress.
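To make "warmup configurations" concrete, here is a minimal sketch of a linear-warmup learning-rate schedule of the kind such runs typically vary. All constants are illustrative and not taken from the actual Train 3 configs.

```python
def lr_at_step(step, max_lr=6e-4, min_lr=6e-5,
               warmup_steps=2_000, total_steps=300_000):
    """Linear warmup to max_lr, then linear decay to min_lr.

    All values here are illustrative; real runs vary the warmup
    length and peak learning rate across configurations.
    """
    if step < warmup_steps:
        # Ramp up from 0 so early updates do not destabilize training.
        return max_lr * step / warmup_steps
    # Decay linearly toward min_lr over the remaining steps.
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr - (max_lr - min_lr) * min(frac, 1.0)

# Learning rate at a few points in the schedule:
for s in (0, 1_000, 2_000, 150_000, 300_000):
    print(s, f"{lr_at_step(s):.2e}")
```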
Train 8
- Model: 104B with an extra-wide hidden size
- This run aims to address training instabilities.
- Like the other trainings, it provides full documentation, scripts, and live log monitoring.
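To see how an extra-wide hidden size drives parameter count, the sketch below uses the common approximation that a GPT-style transformer has about `12 * n_layers * hidden**2` non-embedding parameters. The 32-layer, 16384-hidden configuration is an illustrative guess at how a wide model reaches roughly 104B parameters, not the confirmed Train 8 settings.

```python
def approx_params(n_layers, hidden, vocab=50_257):
    """Rough GPT-style parameter count.

    Each transformer layer holds ~4*h^2 attention weights and ~8*h^2
    MLP weights, giving the familiar 12 * L * h^2 rule of thumb;
    token embeddings add vocab * h on top.
    """
    return 12 * n_layers * hidden ** 2 + vocab * hidden

# Illustrative wide configuration (not confirmed Train 8 settings):
# fewer, wider layers push the total to roughly 104B.
print(f"{approx_params(n_layers=32, hidden=16384) / 1e9:.1f}B")  # ~103.9B
```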
Train 11
Representing the current main training initiative, this run trains 176B-ML, the most advanced model explored in the project. Comprehensive materials, including specifications, scripts, chronicles, and live access to ongoing logs, keep it at the forefront of the project's research activities.
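Applying the same rule of thumb at 176B scale: assuming a 70-layer, 14336-hidden, ~250k-token-vocabulary configuration, figures widely reported for the released multilingual model but not stated in this document, the estimate lands near 176B.

```python
def approx_params(n_layers, hidden, vocab):
    # 12 * L * h^2 transformer-block parameters plus token embeddings.
    return 12 * n_layers * hidden ** 2 + vocab * hidden

# Assumed 176B-ML-scale configuration (widely reported, not from this text).
print(f"{approx_params(70, 14336, 250_880) / 1e9:.1f}B")  # ~176.2B
```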
Conclusion
The BigScience project combines innovation, collaboration, and extensive experimentation to push the boundaries of language model capabilities. With comprehensive resources and active community engagement, it advances both academic understanding and practical progress in large-scale language modeling.