Introduction to DBRX
DBRX is a sophisticated language model developed by Databricks. It is released under an open license, allowing broad access and customization. This project repository includes the essential code and tutorials necessary to run DBRX, along with a variety of resources and links to facilitate its use.
Model Overview
DBRX uses a fine-grained Mixture-of-Experts (MoE) architecture with 132 billion total parameters, of which 36 billion are active on any given input. The model has 16 experts and routes each token through 4 of them during both training and inference. DBRX was pre-trained on a corpus of 12 trillion tokens and supports a context length of up to 32K tokens.
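To make the routing concrete, here is a minimal PyTorch sketch of top-4-of-16 expert routing, the pattern a fine-grained MoE layer follows. The dimensions, module names, and expert structure are toy values for illustration only, not the actual DBRX implementation.

import torch
import torch.nn.functional as F

# Toy top-4-of-16 MoE routing sketch (illustrative only, not DBRX's real code).
num_experts, top_k, d_model = 16, 4, 64
router = torch.nn.Linear(d_model, num_experts, bias=False)
experts = torch.nn.ModuleList([
    torch.nn.Sequential(
        torch.nn.Linear(d_model, 4 * d_model),
        torch.nn.SiLU(),
        torch.nn.Linear(4 * d_model, d_model),
    )
    for _ in range(num_experts)
])

def moe_layer(x):                                # x: (num_tokens, d_model)
    scores = router(x)                           # one routing score per expert per token
    weights, chosen = scores.topk(top_k, dim=-1) # keep the 4 highest-scoring experts
    weights = F.softmax(weights, dim=-1)         # renormalize over the chosen experts
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in range(num_experts):
            mask = chosen[:, slot] == e          # tokens whose slot `slot` picked expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_layer(torch.randn(8, d_model)).shape)  # torch.Size([8, 64])

Only the selected experts run for each token, which is how a model with 132 billion total parameters exposes roughly 36 billion active parameters per input.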
Two prominent versions of the model are openly available:
- DBRX Base: This is the foundational pre-trained model.
- DBRX Instruct: A refined version tailored for instruction-following capabilities.
Deployment and Usage
To get started with DBRX, download the model weights and tokenizer from the DBRX page on Hugging Face after accepting the license agreement. Running the full model requires substantial memory, at least 320GB. Installation and setup involve:
pip install -r requirements.txt # For base setup
pip install -r requirements-gpu.txt # For GPU support with flash attention
huggingface-cli login # Authenticate to access models with a Hugging Face token
python generate.py # Customize prompts and settings as needed
For those facing installation challenges, using a Docker image such as mosaicml/llm-foundry:2.2.1_cu121_flash2-latest is recommended.
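As an illustration of what a generation script like generate.py does, here is a minimal, hedged sketch using the Hugging Face transformers API. It assumes the databricks/dbrx-instruct repository, that your Hugging Face token has been granted access, and that trust_remote_code is required; it is not the repository's own script.

# Minimal DBRX Instruct generation sketch via transformers (illustrative).
# Assumes access to databricks/dbrx-instruct and enough memory (~320GB) for the full model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",             # shard layers across available GPUs
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What does a Mixture-of-Experts layer do?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))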
Advanced Inference Options
DBRX supports optimized inference through both TensorRT-LLM and vLLM. Running these requires advanced hardware, such as a multi-GPU setup. Additionally, those with Apple M-series chips can leverage the MLX platform for quantized model use.
Further notes on using these tools:
- TensorRT-LLM: DBRX support is pending; integration is still in progress.
- vLLM: The vLLM documentation covers setup in detail; a minimal usage sketch follows this list.
- MLX: Enables running a quantized DBRX on Apple machines with a sufficiently powerful M-series chip.
- llama.cpp: The llama.cpp framework can run DBRX on systems with ample RAM.
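As an illustration of the vLLM path, the sketch below uses vLLM's offline LLM API. The tensor_parallel_size and sampling settings are assumptions to adapt to your hardware; consult the vLLM documentation for authoritative setup instructions.

# Offline batched generation with vLLM (illustrative sketch).
from vllm import LLM, SamplingParams

llm = LLM(
    model="databricks/dbrx-instruct",
    tensor_parallel_size=8,        # assumes an 8-GPU node; adjust for your setup
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, max_tokens=200)

outputs = llm.generate(["Explain Mixture-of-Experts in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)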
Finetuning DBRX
Databricks supports finetuning DBRX through its open-source library, LLM Foundry. Two primary finetuning methods are available:
- Full Parameter Finetuning: Offers maximum flexibility and model customizability.
- LoRA (Low-Rank Adaptation) Finetuning: Enables parameter-efficient training, though the expert layers currently cannot be finetuned this way (see the illustrative sketch after this list).
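LLM Foundry drives finetuning through its own training scripts and YAML configs; as a separate, generic illustration of what LoRA-style parameter-efficient finetuning looks like, the sketch below uses the Hugging Face peft library instead. The target module names are assumptions about the attention projections and deliberately leave the expert (MoE) weights untouched, in line with the limitation noted above.

# Generic LoRA setup sketch with peft (not the LLM Foundry workflow).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Hypothetical attention projection names; check the model's actual submodules.
    target_modules=["Wqkv", "out_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the low-rank adapter weights are trainable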
Integration and Support
DBRX is integrated into the Databricks platform and is also available through third-party providers such as You.com and Perplexity Labs, enabling straightforward deployment and experimentation.
Technical discussions and troubleshooting can take place in the Hugging Face community forums or in the GitHub issues of the relevant training libraries. For professional engagements, such as pre-training or finetuning services and consultation, Databricks provides direct support through their contact portal.
Licensing
The DBRX model is licensed under the Databricks Open Model License, and the accompanying code is released under an open-source license. This makes DBRX accessible for both research and commercial applications, with the details outlined in the respective license agreements.
DBRX represents a significant leap in large language model development, offering flexibility, power, and extensive community and commercial support.