Life2Vec: Predicting Human Lives Through Event Sequencing
The Life2Vec project is an innovative approach to understanding and predicting human lives by analyzing sequences of life events. This project is based on a scientific paper titled "Using Sequences of Life-events to Predict Human Lives", which was published in Nature Computational Science. The project does not maintain any social media presence but shares all relevant information through its dedicated webpage, life2vec.dk.
Basic Implementation of Life2Vec
The core objective of Life2Vec is to create a comprehensive model that can predict life outcomes by examining various life events. To achieve this, the project publishes various components of the Life2Vec model in separate repositories:
- Basic Implementation: A simplified version of the model, called life2vec-light, allows users to experiment with pretraining using dummy data.
- Class Distance Weighted Cross-Entropy Loss: This novel loss function, useful for predicting personality traits, is available for exploration in the cdw-cross-entropy-loss repository.
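To make the idea behind a class distance weighted cross-entropy concrete, here is a minimal pure-Python sketch for a single sample. It assumes the commonly stated formulation, in which each wrong class i contributes -log(1 - p_i) weighted by |i - c|^alpha, where c is the true class on an ordinal scale. The function name, the default alpha, and the eps clamp are illustrative assumptions, and the code in the cdw-cross-entropy-loss repository may differ.

```python
import math


def cdw_cross_entropy(probs, target, alpha=2.0, eps=1e-12):
    """Class distance weighted cross-entropy for one sample (illustrative sketch).

    probs  : predicted class probabilities, ordered along the ordinal scale
    target : index of the true class
    alpha  : positive exponent; larger values punish distant mistakes harder
    eps    : assumed clamp that keeps log() away from zero
    """
    loss = 0.0
    for i, p in enumerate(probs):
        # The true class has distance 0, so it contributes nothing.
        weight = abs(i - target) ** alpha
        loss -= math.log(max(1.0 - p, eps)) * weight
    return loss


# True class is 0 on a three-point scale.
near_miss = cdw_cross_entropy([0.1, 0.8, 0.1], target=0)  # mass on the adjacent class
far_miss = cdw_cross_entropy([0.1, 0.1, 0.8], target=0)   # mass on the farthest class
```

With these inputs the far miss receives the larger loss, which is the property that makes this kind of loss attractive for ordinal targets such as personality trait scores.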
Source Code
The project provides a rich collection of scripts and notebooks designed for data processing, Life2Vec model training, statistical analysis, and visualizations. These resources are hosted in a GitHub repository, with sensitive paths removed for privacy and compliance with Statistics Denmark's Research Scheme.
Overall Structure
The Life2Vec project employs the Hydra framework to facilitate its experiments. The configuration files located in the /conf directory manage these experiments:
- Experiment Configuration: Includes YAML files for pretraining and finetuning processes.
- Task Specifications: Details data augmentation strategies for various tasks.
- Trainer Configuration: Options for logging and multithreaded training; these options are not used.
- Data Handling: Contains instructions for loading and processing data.
Further, the project structure includes directories for analysis, embedding evaluation, visualization, and optimization tasks, providing a comprehensive toolkit for understanding model outputs.
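The way a Hydra-style setup combines a chosen config group with command-line overrides can be mimicked in plain Python. The sketch below is not Hydra's API and does not reproduce the project's configuration: the group names, option names, and values are hypothetical stand-ins for YAML files under /conf, and only the `group=option` and `a.b=value` override shapes follow Hydra's conventions.

```python
import copy

# Hypothetical config groups standing in for YAML files under /conf.
CONFIG_GROUPS = {
    "experiment": {
        "pretrain": {"stage": "pretrain", "epochs": 10},
        "finetune": {"stage": "finetune", "epochs": 3},
    },
    "trainer": {
        "default": {"devices": 1, "precision": 32},
    },
}


def compose(overrides):
    """Build one config dict from `group=option` and `a.b=value` strings."""
    # Start from an assumed default trainer config.
    cfg = {"trainer": copy.deepcopy(CONFIG_GROUPS["trainer"]["default"])}
    for item in overrides:
        key, _, raw = item.partition("=")
        if "." not in key and key in CONFIG_GROUPS:
            # Group selection: swap in the whole named option.
            cfg[key] = copy.deepcopy(CONFIG_GROUPS[key][raw])
        else:
            # Dotted override: walk to the parent node, then set the leaf.
            *parents, leaf = key.split(".")
            node = cfg
            for name in parents:
                node = node.setdefault(name, {})
            node[leaf] = int(raw) if raw.lstrip("-").isdigit() else raw
    return cfg


cfg = compose(["experiment=finetune", "trainer.devices=2"])
```

Here `cfg` holds the finetuning experiment settings together with a trainer section whose `devices` value has been overridden, while the stored defaults stay untouched because each selected option is deep-copied.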
Source Code Details
In the source folder (/src), the project organizes the model code and data manipulation scripts:
- Data Processing: Scripts to preprocess and load data into machine learning frameworks like PyTorch.
- Model Implementations: Baseline models and specific implementations for tasks such as Mortality and Emigration Prediction.
- Life2Vec Model Details: The /src/transformer folder hosts custom modules, loss functions, metric computations, and detailed model implementations.
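As a rough illustration of what preprocessing event sequences for a sequence model can involve, the sketch below maps life events to integer ids and pads each sequence to a fixed length. It uses only the standard library. The event strings, the special tokens, and the function names are hypothetical and are not taken from the project's data processing code.

```python
PAD, UNK = "[PAD]", "[UNK]"  # assumed special tokens


def build_vocab(sequences):
    """Assign an integer id to every distinct event, in order of first appearance."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for event in seq:
            vocab.setdefault(event, len(vocab))
    return vocab


def encode(seq, vocab, max_len):
    """Return (ids, mask): ids truncated or padded to max_len, mask marking real events."""
    ids = [vocab.get(event, vocab[UNK]) for event in seq[:max_len]]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [vocab[PAD]] * padding, mask + [0] * padding


# Made-up event sequences, for illustration only.
train_sequences = [
    ["job:teacher", "income:q3", "move:city"],
    ["diagnosis:J45", "job:nurse"],
]
vocab = build_vocab(train_sequences)
ids, mask = encode(["job:nurse", "job:pilot"], vocab, max_len=4)
```

The unseen event falls back to the unknown-token id, and the mask records which positions hold real events. Lists of this shape can then be converted to tensors by a framework such as PyTorch.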
Important scripts for executing various project stages include train.py, test.py, tune.py, and val.py.
Running the Scripts
To use the Life2Vec code, users can execute a series of commands that facilitate pretraining, dataset assembly, and task-specific finetuning using Hydra-managed scripts. These commands cater to various machine learning tasks, from hyperparameter tuning to emigration prediction.
Contributing to the Project
In addition to the main contributors, collaborators such as Søren Mørk Hartmann have contributed to the project.
Citation
For those interested in citing Life2Vec, the paper is published in Nature Computational Science, and a preprint is available on arXiv. The project also provides BibTeX entries for scientific referencing.
Life2Vec applies computational methods to sequences of life events in order to model and predict human trajectories, an approach with implications for both research and practical applications.