Introduction to GFlowNet Project
GFlowNet, short for Generative Flow Network, is a generative modeling framework particularly suited to discrete, combinatorial objects. In this project it is implemented for graph generation, which makes it well suited to constructing objects such as molecular structures.
Concept Behind GFlowNet
The core concept of GFlowNet is to estimate flows in a directed acyclic network, in the graph-theoretic sense. This network represents all possible ways of constructing an object, so knowing the flow through it yields a policy that can be followed to build objects sequentially. When an object is constructed step by step, the sequence of partially constructed objects is called a trajectory.
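As a toy illustration of how edge flows give a construction policy, the sketch below uses a hand-written state space rather than anything from the library. The state names, the flow values, and the helpers `forward_policy` and `sample_trajectory` are all hypothetical. The policy at each state is taken to be proportional to the flow on its outgoing edges.

```python
import random

# Hypothetical toy state space: each edge (parent -> child) carries a flow.
# "s0" is the empty object; "x1" and "x2" are finished objects.
edge_flow = {
    ("s0", "a"): 3.0,
    ("s0", "b"): 1.0,
    ("a", "x1"): 2.0,
    ("a", "x2"): 1.0,
    ("b", "x2"): 1.0,
}

def forward_policy(state):
    """Return P(child | state), proportional to the outgoing edge flows."""
    out = {child: f for (parent, child), f in edge_flow.items() if parent == state}
    total = sum(out.values())
    return {child: f / total for child, f in out.items()}

def sample_trajectory(rng, start="s0"):
    """Follow the policy from the start state until a state with no children."""
    trajectory = [start]
    while True:
        policy = forward_policy(trajectory[-1])
        if not policy:  # no outgoing edges: the object is complete
            return trajectory
        children, probs = zip(*policy.items())
        trajectory.append(rng.choices(children, weights=probs)[0])

policy_s0 = forward_policy("s0")
trajectory = sample_trajectory(random.Random(0))
```

In this toy network the flow entering each intermediate state equals the flow leaving it, so following the policy reaches each finished object with probability proportional to the total flow arriving at it.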
Perhaps confusingly, the term "network" in GFlowNet refers to the state space of object construction, not to a neural network architecture, which is what the word usually means elsewhere in machine learning.
Focus of the GFlowNet Library
While GFlowNet can be applied to various scenarios, the library's primary focus is on constructing graphs incrementally, node by node. This is particularly relevant when building graphs that represent molecular compounds, where each node may represent an atom. The library employs a Graph Neural Network (GNN) that predicts policies by outputting per-node logits (for example, whether to add an atom or a bond at a given node) along with per-graph logits (for example, whether construction of the object is complete).
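To make the per-node and per-graph logits concrete, here is a minimal sketch of one plausible way to turn them into a single distribution over actions. It is not the library's API: the action names, the logit values, and the choice to normalize all logits with one softmax are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical GNN outputs for a partial graph with two nodes.
node_actions = ["add_atom", "add_bond"]
per_node_logits = [[1.0, 0.0], [0.5, -1.0]]  # one row per node, one column per node action
per_graph_logits = [0.2]                      # a single graph-level "stop" action

# Flatten every candidate action into one list so a single softmax covers them all.
actions = [(name, node) for node in range(len(per_node_logits)) for name in node_actions]
actions.append(("stop", None))
flat_logits = [v for row in per_node_logits for v in row] + per_graph_logits

probs = softmax(flat_logits)
action_probs = dict(zip(actions, probs))
best_action = max(action_probs, key=action_probs.get)
```

Sampling from `action_probs` instead of taking `best_action` gives a stochastic policy, which is what trajectory sampling needs.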
GFlowNet Algorithms
The library supports several GFlowNet algorithms and accommodates both offline training, which uses existing data, and online training, which generates data on the fly by querying the model for construction trajectories.
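The difference between the two data sources can be sketched as follows. This is an illustration only: `offline_dataset`, `sample_from_model`, `make_batch`, and the `offline_ratio` parameter are hypothetical names, and mixing both sources in one batch is an assumption rather than something the text above confirms.

```python
import random

# Hypothetical stand-in for a stored dataset of construction trajectories.
offline_dataset = [["s0", "a", "x1"], ["s0", "b", "x2"], ["s0", "a", "x2"]]

def sample_from_model(rng):
    """Placeholder for querying the current policy for a new trajectory."""
    middle = rng.choice(["a", "b"])
    end = "x2" if middle == "b" else rng.choice(["x1", "x2"])
    return ["s0", middle, end]

def make_batch(rng, batch_size, offline_ratio):
    """Mix stored trajectories (offline) with freshly sampled ones (online)."""
    num_offline = int(batch_size * offline_ratio)
    batch = [rng.choice(offline_dataset) for _ in range(num_offline)]
    batch += [sample_from_model(rng) for _ in range(batch_size - num_offline)]
    return batch

rng = random.Random(0)
offline_batch = make_batch(rng, batch_size=4, offline_ratio=1.0)  # existing data only
online_batch = make_batch(rng, batch_size=4, offline_ratio=0.0)   # model samples only
mixed_batch = make_batch(rng, batch_size=4, offline_ratio=0.5)    # two of each
```

In a real training loop the online samples would come from the model being trained, so their distribution shifts as the policy improves.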
Installation
GFlowNet can be installed with pip, the standard package installer for Python. Because it relies on prebuilt wheels for the torch-geometric package, the location of those wheels must be specified during installation. The link to use differs depending on whether the model will run on CPU or GPU. The project also provides instructions for installing specific versions or dependencies if required.
Getting Started
Beginners can explore the library with the sEH fragment-based Multi-Objective Optimization task, which comes ready to run but allows for configurable changes. For an introductory walkthrough, the "Getting Started" documentation is recommended, while those seeking comprehensive insights can delve into the "Implementation Notes."
Repository Structure
- Algorithms: Implementation of GFlowNet algorithms and baselines, focusing on sampling trajectories from models and computing losses.
- Data: Handles dataset definitions, loading, and sampling utilities.
- Environments: Contains classes for different graph construction environments, connecting graphs to objects like molecules.
- Examples: Offers simple implementations of GFlowNet for easy reference.
- Models and Tasks: Defines model structures and training tasks, including molecule samplers and molecule design targets.
- Utilities: Provides additional tools for multiprocessing, metrics, and conditioning.
- Training: Contains modules like trainer.py and online_trainer.py for managing training processes.
Development and Contribution
Community contributions to the GFlowNet project are welcome. Developers can install the necessary dependencies alongside developer-specific tools to lint code and run tests using tox. For detailed guidance on contributing, see the Contributing documentation.
Together, these components make GFlowNet a versatile tool for generative modeling, with molecular graph generation as its main application and room to extend to other discrete, combinatorial objects.