Project Introduction: RFDiffusion All Atom (RFDiffusion AA)
RFDiffusion All Atom (RFDiffusion AA) is a machine learning tool for designing small molecule binders. It uses a diffusion model to generate protein structures that bind a specified ligand or other biomolecule.
Setup and Installation
To get started with RFDiffusion AA, one needs to perform a few setup steps:
- Clone the Repository: Clone the package from GitHub and change into the new directory:
    git clone https://github.com/baker-laboratory/rf_diffusion_all_atom.git
    cd rf_diffusion_all_atom
- Download the Required Container: Download the container file needed to run RFDiffusion AA:
    wget http://files.ipd.uw.edu/pub/RF-All-Atom/containers/rf_se3_diffusion.sif
- Obtain the Model Weights: Download the model weights, which are required to run the software:
    wget http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFDiffusionAA_paper_weights.pt
- Initialize Git Submodules: Initialize and update the git submodules so that all components of the package are present:
    git submodule init
    git submodule update
- Install Apptainer: Install Apptainer if it is not already present. Apptainer is a container runtime; running RFDiffusion AA inside the container avoids installing its Python dependencies directly. The installation guide can be found at Apptainer's official site.
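The setup steps above leave a few files that the inference commands depend on. A minimal preflight sketch follows, assuming the container and weights were downloaded into the repository directory as in the commands above. The function name `missing_setup_files` is hypothetical and is not part of the RFDiffusion AA repository.

```shell
# Hypothetical preflight helper (not part of the RFDiffusion AA repository).
# Prints the name of each required file that is missing from the given
# directory and returns non-zero if at least one is missing.
missing_setup_files() {
    local repo_dir="${1:-.}"
    local status=0
    local f
    for f in rf_se3_diffusion.sif RFDiffusionAA_paper_weights.pt run_inference.py; do
        if [ ! -f "$repo_dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}

# Example use from inside the cloned repository:
#   missing_setup_files . || echo "setup incomplete"
#   command -v apptainer >/dev/null || echo "apptainer not found on PATH"
```

The Apptainer check is kept as a separate `command -v` line because the binary may live in different locations on different systems.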
Inference
Designing Small Molecule Binders
RFDiffusion AA generates binders for small molecules by designing a protein structure around the ligand. For example, to create a binder for the ligand OQO from PDB file 7v11, run:
/usr/bin/apptainer run --nv rf_se3_diffusion.sif -u run_inference.py \
    inference.deterministic=True \
    diffuser.T=100 \
    inference.output_prefix=output/ligand_only/sample \
    inference.input_pdb=input/7v11.pdb \
    contigmap.contigs=[\'150-150\'] \
    inference.ligand=OQO \
    inference.num_designs=1 \
    inference.design_startnum=0
Key Parameters Explained:
- inference.deterministic=True: Enables reproducible results by seeding random number generators.
- inference.num_designs=1: Specifies that a single design will be created.
- contigmap.contigs=['150-150']: Determines the length of the protein to be generated.
- diffuser.T=100: Indicates the number of denoising steps.
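Because the contig value contains brackets and single quotes, it has to be protected from the shell so that the program receives it unchanged. The sketch below is one way to assemble the same command from variables and inspect it before running. The wrapper `run_design` and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the wrapper assumes a fixed protein length.

```shell
# Hypothetical wrapper, not part of the RFDiffusion AA repository.
# With DRY_RUN=1 (the default here) it prints one argument per line,
# exactly as the program would receive it; with DRY_RUN=0 it runs the command.
run_design() {
    local ligand="$1" input_pdb="$2" length="$3" prefix="$4"
    # Double quotes keep the brackets and single quotes of the contig literal.
    set -- /usr/bin/apptainer run --nv rf_se3_diffusion.sif -u run_inference.py \
        inference.deterministic=True diffuser.T=100 \
        "inference.output_prefix=${prefix}" \
        "inference.input_pdb=${input_pdb}" \
        "contigmap.contigs=['${length}-${length}']" \
        "inference.ligand=${ligand}" \
        inference.num_designs=1 inference.design_startnum=0
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}

# Dry run for the OQO example: prints the arguments without starting Apptainer.
run_design OQO input/7v11.pdb 150 output/ligand_only/sample
```

Setting `DRY_RUN=0` in the environment makes the same call launch the container, which requires the setup steps to be complete.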
Output Files:
- The main output is the design PDB file, output/ligand_only/sample_0.pdb.
- Additionally, intermediate denoised structures and predictions made by the network are saved.
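The design file name combines inference.output_prefix with a design number. The sketch below lists the design PDB paths a run would be expected to produce, assuming designs are numbered consecutively starting at inference.design_startnum. That numbering is inferred from the single example above (prefix `sample`, start number 0, output `sample_0.pdb`), and the helper name `expected_design_paths` is hypothetical.

```shell
# Hypothetical helper, not part of the RFDiffusion AA repository.
# Assumption: design files are named <prefix>_<n>.pdb, with n counting up
# from inference.design_startnum for inference.num_designs designs.
expected_design_paths() {
    local prefix="$1" startnum="$2" num_designs="$3"
    local i="$startnum"
    local end=$(($startnum + $num_designs))
    while [ "$i" -lt "$end" ]; do
        echo "${prefix}_${i}.pdb"
        i=$(($i + 1))
    done
}

# For the example run above this prints output/ligand_only/sample_0.pdb
expected_design_paths output/ligand_only/sample 0 1
```

A list like this can be compared against the output directory after a run to confirm that every requested design was written.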
Designing with Protein Motifs
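RFDiffusion AA can also build a binder around an existing protein motif. For instance, the following command designs a binder for the ligand CYC that retains residues 84-87 of chain A from the input file 1haz.pdb. The contig string places that motif between two generated segments of 10 to 120 residues each, and contigmap.length fixes the total length at 150 residues: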
/usr/bin/apptainer run --nv rf_se3_diffusion.sif -u run_inference.py \
    inference.deterministic=True \
    diffuser.T=200 \
    inference.output_prefix=output/ligand_protein_motif/sample \
    inference.input_pdb=input/1haz.pdb \
    contigmap.contigs=[\'10-120,A84-87,10-120\'] \
    contigmap.length="150-150" \
    inference.ligand=CYC \
    inference.num_designs=1 \
    inference.design_startnum=0
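The contig string in this command mixes two kinds of comma-separated segments: numeric ranges, which are newly generated, and chain-prefixed ranges, which are kept from the input PDB. The sketch below labels each segment to make that reading explicit. It is an illustration only and is not the parser RFDiffusion AA uses; the function name `describe_contig` is hypothetical.

```shell
# Illustrative only: this is not the parser used by RFDiffusion AA.
# Splits a contig string on commas and labels each segment.
describe_contig() {
    local contig="$1" segment
    local IFS=','
    for segment in $contig; do
        case "$segment" in
            # A leading chain letter marks residues taken from the input PDB.
            [A-Za-z]*) echo "motif: keep residues ${segment} from the input PDB" ;;
            # A bare numeric range is a generated segment of that length range.
            *)         echo "generate: new segment of length ${segment}" ;;
        esac
    done
}

describe_contig '10-120,A84-87,10-120'
```

Read this way, the example contig asks for a generated segment, then the four-residue motif from chain A, then a second generated segment.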
RFDiffusion AA can be combined with tools such as ProteinMPNN, AlphaFold2, LigandMPNN, and PyRosetta in a larger design pipeline that further refines its output for specific applications, such as heme-binding proteins.
Conclusion
RFDiffusion AA is a tool for scientists and researchers working in computational protein design. It generates protein structures around a specified ligand, optionally preserving a protein motif, which supports the design of specific protein-ligand complexes for biological and medical research.