life2vec - Leverage Life-event Sequences to Model Human Lives with Predictive Algorithms

Life2Vec: Predicting Human Lives Through Event Sequencing

The Life2Vec project models and predicts human life outcomes by analyzing sequences of life events. It is based on the scientific paper "Using Sequences of Life-events to Predict Human Lives", published in Nature Computational Science. The project does not maintain a social media presence; all relevant information is shared through its dedicated webpage, life2vec.dk.

Published Components of Life2Vec

The core objective of Life2Vec is to predict life outcomes from sequences of life events. The project publishes individual components of the model in separate repositories:

Basic Implementation: A simplified version of the model, called life2vec-light, allows users to experiment with pretraining using dummy data.
Class Distance Weighted Cross-Entropy Loss: This loss function, used for predicting personality traits, is available in the cdw-cross-entropy-loss repository.
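
To make the idea behind a class-distance-weighted loss concrete, here is a minimal pure-Python sketch. It assumes the targets are ordered classes and uses one common formulation, in which probability placed on a wrong class is penalized by -log(1 - p) scaled by that class's distance from the true class raised to a power alpha. The function name, the alpha default, and the example probabilities are illustrative assumptions; the code in the cdw-cross-entropy-loss repository may differ.

```python
import math

def cdw_cross_entropy(probs, true_class, alpha=2.0, eps=1e-12):
    """Class-distance-weighted cross-entropy for one ordinal prediction.

    probs: predicted class probabilities (expected to sum to 1).
    true_class: index of the correct ordinal class.
    alpha: exponent controlling how strongly distant mistakes are penalized
           (an assumed default, not a value taken from the project).
    """
    loss = 0.0
    for i, p in enumerate(probs):
        distance = abs(i - true_class)
        if distance == 0:
            continue  # the true class carries zero weight
        # -log(1 - p) grows as more probability is placed on wrong class i
        loss += -math.log(max(1.0 - p, eps)) * distance ** alpha
    return loss

# Same amount of misplaced probability, at different distances from class 0.
near_miss = cdw_cross_entropy([0.1, 0.8, 0.1, 0.0], true_class=0)
far_miss = cdw_cross_entropy([0.1, 0.1, 0.0, 0.8], true_class=0)
print(near_miss < far_miss)
```

Unlike plain cross-entropy, which only looks at the probability of the true class, this formulation distinguishes a prediction that is one level off from one that is several levels off.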

Source Code

The project provides a rich collection of scripts and notebooks designed for data processing, Life2Vec model training, statistical analysis, and visualizations. These resources are hosted in a GitHub repository, with sensitive paths removed for privacy and compliance with Statistics Denmark's Research Scheme.

Overall Structure

The Life2Vec project employs the Hydra framework to facilitate its experiments. The configuration files located in the /conf directory manage these experiments:

Experiment Configuration: Includes YAML files for pretraining and finetuning processes.
Task Specifications: Details data augmentation strategies for various tasks.
Trainer Configuration: Options for logging and multithreaded training; these are present in the repository but not used.
Data Handling: Contains instructions for loading and processing data.
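
Hydra composes a run's configuration from YAML files and lets individual values be overridden on the command line with dotted key=value arguments. The sketch below does not use Hydra itself; it is a standard-library illustration of that override idea only. The default keys and values (experiment.name, trainer.devices, and so on) are hypothetical and are not taken from the project's /conf directory.

```python
import copy

# Hypothetical defaults; keys and values are illustrative, not from /conf.
DEFAULTS = {
    "experiment": {"name": "pretrain", "max_epochs": 10},
    "trainer": {"devices": 1},
}

def parse_value(text):
    """Interpret an override value as int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text

def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})  # walk down, creating levels if needed
        node[leaf] = parse_value(raw_value)
    return result

cfg = apply_overrides(DEFAULTS, ["experiment.name=finetune", "trainer.devices=2"])
print(cfg)
```

Keeping the defaults untouched and returning a modified copy mirrors how a single set of configuration files can serve many experiments, with each run differing only in its overrides.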

Further, the project structure includes directories for analysis, embedding evaluation, visualization, and optimization tasks, providing a comprehensive toolkit for understanding model outputs.

Source Code Details

In the source folder (/src), the project organizes the model code and data manipulation scripts:

Data Processing: Scripts to preprocess and load data into machine learning frameworks like PyTorch.
Model Implementations: Baseline models and specific implementations for tasks such as Mortality and Emigration Prediction.
Life2Vec Model Details: The /src/transformer folder hosts custom modules, loss functions, metric computations, and detailed model implementations.
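
Before a sequence of life events can be loaded into a framework such as PyTorch, it has to be turned into fixed-length lists of integers. The following sketch shows one generic way to do that with a vocabulary, padding, and a padding mask. It is not the project's actual pipeline: the event names, the special tokens, and the helper functions are all made up for illustration.

```python
PAD, UNK = "[PAD]", "[UNK]"  # assumed special tokens for padding and unknowns

def build_vocab(sequences):
    """Assign an integer id to every distinct event token, in order of first use."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(seq, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and pad shorter sequences.

    Returns (ids, mask), where mask is 1 for real events and 0 for padding.
    """
    ids = [vocab.get(tok, vocab[UNK]) for tok in seq[:max_len]]
    mask = [1] * len(ids)
    n_pad = max_len - len(ids)
    return ids + [vocab[PAD]] * n_pad, mask + [0] * n_pad

# Made-up dummy sequences, one list of events per person.
people = [
    ["job_change", "moved_city", "hospital_visit"],
    ["moved_city", "salary_raise"],
]
vocab = build_vocab(people)
ids, mask = encode(people[1], vocab, max_len=4)
print(ids, mask)
```

The resulting lists have a uniform length, so a batch of them can be stacked into tensors, and the mask lets a model ignore the padded positions.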

Important scripts for executing various project stages include train.py, test.py, tune.py, and val.py.

Running the Scripts

To use the Life2Vec code, users can execute a series of commands that facilitate pretraining, dataset assembly, and task-specific finetuning using Hydra-managed scripts. These commands cater to various machine learning tasks, from hyperparameter tuning to emigration prediction.

Contributing to the Project

In addition to the main contributors, collaborators such as Søren Mørk Hartmann have contributed to the project.

Citation

For those who want to cite Life2Vec in their work, the accompanying paper is published in Nature Computational Science, and a preprint is available on arXiv. The project also provides BibTeX entries for scientific referencing.

Life2Vec applies computational methods to the study of human life trajectories, using sequences of life events as the basis for predictive modeling, with implications for both research and practical applications.