dynalang - Utilizing Language for Predictive Multimodal Modeling

Introduction to Dynalang: Modeling the World with Language

Dynalang is an agent that uses language to predict future outcomes and solve a variety of tasks. It treats language as one input to a multimodal world model, which lets the agent predict what will happen next. The approach is described in the paper Learning to Model the World with Language; more information is available on the project site and in the paper on arXiv.

Getting Started with Dynalang

To get started, install Dynalang and its dependencies in editable mode from the repository root:

pip install -e .

HomeGrid: A Simulation Environment

HomeGrid is a simulation environment for training the Dynalang agent on tasks that involve language. Install the environment, then launch training by passing the task name, an experiment name, the GPU IDs to use, and a random seed:

pip install homegrid
sh scripts/run_homegrid.sh homegrid_task EXP_NAME GPU_IDS SEED
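
When running several seeds, it can help to assemble the positional arguments programmatically. The following is a minimal Python sketch, assuming only the argument order shown above (task, experiment name, GPU IDs, seed). The helper name homegrid_command, the experiment-name suffix, and the GPU ID value "0" are illustrative assumptions and are not part of the repository.

```python
import shlex


def homegrid_command(task, exp_name, gpu_ids, seed):
    """Build the argument list for scripts/run_homegrid.sh.

    The argument order follows the usage line above. This helper is
    hypothetical and not part of the dynalang repository.
    """
    return ["sh", "scripts/run_homegrid.sh", task, exp_name, str(gpu_ids), str(seed)]


if __name__ == "__main__":
    # Print one command per seed; replace homegrid_task with a real task name.
    for seed in (1, 2, 3):
        cmd = homegrid_command("homegrid_task", f"myexp_seed{seed}", "0", seed)
        print(shlex.join(cmd))
```

The sketch only prints the commands, so they can be reviewed before being pasted into a shell or passed to subprocess.run.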

Messenger: Interactive Communication Environment

Messenger requires several system libraries. On Debian or Ubuntu, install them first:

sudo apt-get install \
  libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev \
  libsdl1.2-dev libsmpeg-dev subversion libportmidi-dev ffmpeg \
  libswscale-dev libavformat-dev libavcodec-dev libfreetype6-dev

After installing the system dependencies, clone and install the Messenger environment, download the language resources from Google Drive, and then run the training script:

git clone https://github.com/ahjwang/messenger-emma 
pip install -e messenger-emma
sh scripts/run_messenger_s1.sh EXP_NAME GPU_IDS SEED

VLN: Visual Language Navigation

The VLN (vision-and-language navigation) tasks need a separate environment because they depend on an older version of the Habitat simulator. Set it up as follows:

conda create -n dynalang-vln python=3.8
conda activate dynalang-vln
pip install "jax[cuda11_cudnn82]==0.4.8" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install torch torchvision
conda env update -f env_vln.yml
conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless
git clone https://github.com/jlin816/VLN-CE VLN_CE
git clone https://github.com/jlin816/habitat-lab habitat_lab

Completing the setup involves downloading specific datasets and running the main training script.

LangRoom: Language and Action Synthesis

LangRoom lets the agent move and speak at the same time. Its code lives on a separate langroom branch, so check out that branch, install the environment, and run the training script:

git checkout langroom
pip install langroom
sh run_langroom.sh EXP_NAME GPU_IDS SEED

Text Pretraining and Finetuning

Text-only pretraining uses a text dataset such as TinyStories. Install the datasets library, then pass the dataset name and a path to evaluation replay episodes to the pretraining script:

pip install datasets
sh scripts/pretrain_text.sh EXP_NAME GPU_IDS SEED roneneldan/TinyStories /PATH/TO/EVAL/REPLAY/EPISODES
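
Before starting a long pretraining run, it can be useful to look at a few examples from the dataset. The sketch below uses load_dataset from the datasets library installed above. The function name preview_stories and its loader parameter are hypothetical, and the sketch assumes that each record stores its story under a "text" key and that a "validation" split exists.

```python
def preview_stories(n=3, split="validation", loader=None):
    """Return the first n story texts from roneneldan/TinyStories.

    `loader` defaults to datasets.load_dataset. It is injectable so the
    function can be exercised without network access (hypothetical helper).
    """
    if loader is None:
        # Imported lazily so the module loads even without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    dataset = loader("roneneldan/TinyStories", split=split)
    # Assumption: each record keeps its story under the "text" key.
    return [dataset[i]["text"] for i in range(min(n, len(dataset)))]
```

Calling preview_stories() with no arguments downloads the dataset on first use, so expect a delay and some disk usage.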

To fine-tune from a pretrained checkpoint, pass the following flags to the training script:

python dynalang/train.py \
  --load_wm_ckpt True \
  --run.from_checkpoint /path/to/pretraining/checkpoint.pkl \
  ...

Training Configuration Tips

To reach a target batch size, you may need to train on multiple GPUs or adjust the batch size to fit your hardware. The run scripts take the GPU IDs as an argument; use them as a starting point when changing the number of devices.
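
The batch-size arithmetic can be sketched as follows. This is an illustration only: it assumes the global batch is split evenly across devices, the function name per_device_batch_size is hypothetical, and the numbers are examples rather than values taken from the paper or the scripts.

```python
def per_device_batch_size(global_batch_size, num_devices):
    """Split a global batch evenly across devices.

    Assumes an even split; raises if the batch cannot be divided exactly.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if global_batch_size % num_devices != 0:
        raise ValueError(
            f"batch size {global_batch_size} is not divisible by {num_devices} devices"
        )
    return global_batch_size // num_devices


if __name__ == "__main__":
    # Example values only: a global batch of 16 on 4 GPUs gives 4 per GPU.
    print(per_device_batch_size(16, 4))
```

If memory on each GPU is the limiting factor, either add devices to keep the global batch size fixed or lower the global batch size so that the per-device share fits.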

Acknowledgments and Citation

Dynalang builds on the DreamerV3 project. If you use Dynalang in academic or research work, please cite:

@article{lin2023learning,
         title={Learning to Model the World with Language},
         author={Jessy Lin and Yuqing Du and Olivia Watkins and Danijar Hafner and Pieter Abbeel and Dan Klein and Anca Dragan},
         year={2023},
         eprint={2308.01399},
         archivePrefix={arXiv},
}

Dynalang aims to show how language can help agents model complex environments. Each environment described above exercises a different part of that goal.