Project Introduction: Alpa
Alpa is a system for training and serving large-scale neural networks. In recent years, neural networks have grown to hundreds of billions of parameters, powering advances such as GPT-3, but training and serving models at this scale requires sophisticated distributed-systems techniques. Alpa automates large-scale distributed training and serving, reducing the required effort to a few lines of code.
Key Features
Alpa provides the following core features:
- Automatic Parallelization: Alpa automatically transforms single-device code into a distributed program for a cluster, combining data parallelism, operator parallelism, and pipeline parallelism (see the sketch after this list).
- Excellent Performance: Alpa achieves linear scaling when training models with billions of parameters on distributed clusters.
- Integration with the Machine Learning Ecosystem: Alpa is built on top of high-performance, open-source libraries such as Jax, XLA, and Ray, ensuring reliability and compatibility within the broader machine-learning ecosystem.
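To make the automatic-parallelization feature concrete, the sketch below shows how a strategy can also be requested explicitly. The method argument and the PipeshardParallel / num_micro_batches names follow Alpa's parallel-method interface but should be treated as assumptions to verify against the installed version; by default, @alpa.parallelize picks a strategy automatically.
import alpa
# Default: let Alpa search for a parallelization strategy automatically
@alpa.parallelize
def sharded_step(state, batch):
    ...
# Request pipeline + operator parallelism explicitly (option names assumed;
# num_micro_batches controls how each batch is split across pipeline stages)
@alpa.parallelize(method=alpa.PipeshardParallel(num_micro_batches=16))
def pipelined_step(state, batch):
    ...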
Serving Large Models
Alpa integrates with the Huggingface/Transformers interface to run large-model inference efficiently. The sample script below demonstrates this integration; see the project's serving documentation for details.
from transformers import AutoTokenizer
from llm_serving.model.wrapper import get_model
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-2.7b")
tokenizer.add_bos_token = False
# Load the model. Alpa automatically downloads the weights to the specified path
model = get_model(model_name="alpa/opt-2.7b", path="~/opt_weights/")
# Generate
prompt = "Paris is the capital city of"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids=input_ids, max_length=256, do_sample=True)
generated_string = tokenizer.batch_decode(output, skip_special_tokens=True)
print(generated_string)
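Since the script above already passes do_sample and max_length through model.generate, other decoding settings can plausibly be adjusted in the same call. The variant below is a sketch that assumes the wrapper forwards these standard transformers generation arguments unchanged.
# Greedy decoding variant (assumes the wrapper accepts the same generation
# keyword arguments as transformers' generate method)
output = model.generate(input_ids=input_ids, max_length=128, do_sample=False)
print(tokenizer.batch_decode(output, skip_special_tokens=True))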
Training with Alpa
With Alpa, scaling single-device training code to a distributed cluster is straightforward: decorating the training step with Alpa's @parallelize decorator is usually all that is needed. A practical example is shown below; installation instructions and further tutorials are available on the documentation site.
import alpa
import jax.numpy as jnp
from jax import grad
# Parallelize the training step in Jax by simply using a decorator
@alpa.parallelize
def train_step(model_state, batch):
    # Mean-squared-error loss over the batch
    def loss_func(params):
        out = model_state.forward(params, batch["x"])
        return jnp.mean((out - batch["y"]) ** 2)
    grads = grad(loss_func)(model_state.params)
    new_model_state = model_state.apply_gradient(grads)
    return new_model_state
# The training loop now automatically runs on your designated cluster
model_state = create_train_state()
for batch in data_loader:
    model_state = train_step(model_state, batch)
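Before the parallelized step can run on a multi-node cluster, Alpa is typically pointed at a Ray cluster. The snippet below is a minimal sketch: the cluster="ray" argument follows Alpa's quickstart, and it assumes a Ray cluster has already been started (for example with ray start --head).
import alpa
import ray
# Connect to the running Ray cluster and let Alpa manage its devices
ray.init(address="auto")
alpa.init(cluster="ray")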
Learning Resources
To learn more about Alpa, see the Google AI blog post, the presentation slides from OSDI 2022, and the GTC 2023 talk video.
Community and Contribution
The community around Alpa is welcoming. Interested developers can connect through the Alpa Slack channel and contribute by following the contributor guide.
Alpa is distributed under the Apache-2.0 license. Although the project is no longer actively maintained, its core algorithm has been integrated into the ongoing XLA project, and the codebase remains a valuable resource for research and development.