Project Introduction: Transformers in Single-Cell Omics
The "Transformers in Single-Cell Omics" project collects and evaluates transformer models applied to single-cell omics data. Omics technologies enable comprehensive analysis of the biological molecules that underlie the structure, function, and dynamics of an organism. Single-cell omics analyzes these molecules at the level of individual cells, which can reveal cell-to-cell differences that traditional bulk analysis averages out.
Purpose of the Repository
This repository is a companion to the review paper "Transformers in Single-Cell Omics: A Review and New Perspectives". It provides a curated list of transformer models that were designed for and evaluated on single-cell omics data, and it excludes models built predominantly for bulk data, images, or sequence data such as DNA and proteins. The aim is to highlight approaches that use transformer architectures to interpret complex biological data at the cellular level.
How to Contribute
The repository invites contributions from the community. Researchers and developers interested in adding or editing entries related to the models can open a pull request or issue on the platform. This collaborative approach ensures that the repository remains current and comprehensive, reflecting the latest advancements in the field.
Key Components and Models
The repository includes an extensive list of transformer models. Each entry references the model's research paper and source code and notes the omics modalities it targets, such as scRNA-seq, DNA methylation (DNAm), and proteomics. A few of the models highlighted in the repository are:
- Precious3GPT: Utilizes a decoder-only transformer model to emulate chemical responses and clinical conditions, with applications in age prediction and gene classification.
- LangCell: Incorporates dual encoders to handle scRNA-seq and natural language data, focusing on cell type annotation and novel cell type identification through contrastive learning.
- ScRAT: Employs an encoder model for phenotype prediction by aggregating cell embeddings, particularly useful in health condition labeling.
- Geneformer: Features an encoder model with tasks like gene function prediction and cell annotation, using a large cross-tissue dataset for pre-training.
- scGPT: Integrates multiple omics data forms (scRNA-seq, scATAC-seq, CITE-seq) with transformer architectures to predict genetic perturbations and annotate cell types.
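To make the idea of "aggregating cell embeddings" in the ScRAT entry more concrete, here is a minimal sketch of turning per-cell vectors into one vector per sample, which a downstream classifier could then use for phenotype prediction. It uses plain mean pooling as a stand-in and is not ScRAT's actual aggregation method. The function name `aggregate_by_sample`, the sample labels, and the toy embeddings are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_by_sample(cell_embeddings, sample_ids):
    """Mean-pool per-cell embedding vectors into one vector per sample.

    Assumption: simple averaging stands in for whatever learned
    aggregation a real model would use.
    """
    if len(cell_embeddings) != len(sample_ids):
        raise ValueError("need exactly one sample id per cell embedding")
    grouped = defaultdict(list)
    for vector, sample in zip(cell_embeddings, sample_ids):
        grouped[sample].append(vector)
    # zip(*vectors) transposes the cells-by-dimensions list so that each
    # embedding dimension is averaged across all cells of the sample.
    return {
        sample: [fmean(dim) for dim in zip(*vectors)]
        for sample, vectors in grouped.items()
    }


# Toy data: four cells with 2-dimensional embeddings from two hypothetical samples.
cells = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 4.0]]
samples = ["patient_a", "patient_a", "patient_b", "patient_b"]
pooled = aggregate_by_sample(cells, samples)
print(pooled)
```

Mean pooling treats every cell as equally informative. An attention-based aggregator would instead learn per-cell weights, which may matter when only a small subset of cells carries the phenotype signal.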
Applications and Future Directions
Applied to single-cell omics, these transformer models open new avenues for studying cellular behavior at single-cell resolution. Because many of them are pre-trained with self-supervised learning tasks on large unlabeled datasets, they could support work in fields such as drug discovery, disease research, and personalized medicine.
The repository serves both as a resource for those working at the intersection of machine learning and biology and as a platform for advancing transformer methods in new and existing single-cell omics applications. As the field evolves, the repository will likely expand to include more models and findings, supporting further progress in bioinformatics and computational biology.
Overall, the "Transformers in Single-Cell Omics" project is an example of community collaboration at the intersection of data science and single-cell biology, and it offers a shared starting point for future work in cellular biology.