graph-learn - Enhancing Large-Scale Graph Neural Networks with Distributed Architecture

Introduction to Graph-Learn

Graph-Learn, formerly known as AliGraph, is a distributed framework for developing and applying large-scale graph neural networks (GNNs). It has been used successfully within Alibaba in areas such as search recommendation, network security, and knowledge graph construction. After Graph-Learn 1.0, the framework added online inference services, so it now covers both training and real-time inference for GNNs in production business scenarios.

Core Components

GraphLearn-Training

GraphLearn-Training is the core of the framework. It supports batch graph sampling and both offline and incremental training of GNN models. Graph sampling operations are exposed through Python and C++ interfaces, along with a Gremlin-like Graph Sampling Language (GSL). For model development, it offers several development paradigms and processes, is compatible with both TensorFlow and PyTorch, and provides data-layer and model-layer interfaces together with many model examples.

For more detailed information, visit the GraphLearn Documentation.
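
To make "batch graph sampling" concrete, here is a minimal pure-Python sketch of fixed-fanout neighbor sampling over batches of seed nodes. It does not use the Graph-Learn API: the names `sample_neighbors`, `iter_batches`, and `ADJACENCY`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import random

# Toy user -> item click graph (hypothetical data, not from Graph-Learn).
ADJACENCY = {
    "u1": ["i1", "i2", "i3", "i4"],
    "u2": ["i2"],
    "u3": [],
}


def iter_batches(nodes, batch_size):
    """Yield consecutive slices of `nodes` with at most `batch_size` items."""
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def sample_neighbors(adjacency, seed_nodes, fanout, rng):
    """Sample up to `fanout` neighbors, without replacement, per seed node."""
    batch = {}
    for node in seed_nodes:
        neighbors = adjacency.get(node, [])
        if len(neighbors) <= fanout:
            # Fewer neighbors than the fanout: keep all of them.
            batch[node] = list(neighbors)
        else:
            batch[node] = rng.sample(neighbors, fanout)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for seeds in iter_batches(sorted(ADJACENCY), batch_size=2):
        print(seeds, sample_neighbors(ADJACENCY, seeds, fanout=2, rng=rng))
```

In a real deployment the adjacency data would be partitioned across servers and the sampled batches would feed a TensorFlow or PyTorch model; the sketch only shows the shape of the sampling step.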

Dynamic-Graph-Service

The Dynamic-Graph-Service component is built for online inference. It performs real-time sampling on dynamic graphs that are continuously updated through streams, and keeps the 99th-percentile (P99) sampling latency within 20 milliseconds on large-scale dynamic graphs. On the client side, the service provides Java GSL interfaces and TensorFlow model prediction.

More details are available in the Dynamic-Graph-Service documentation.
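
For readers unfamiliar with the P99 metric, the following sketch shows one common way to compute it (the nearest-rank method) and to check it against a 20 ms budget. This is an assumption about how such a latency target could be verified, not a description of how Dynamic-Graph-Service measures it; the function names and sample values are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the `pct`-th percentile of `samples` using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_p99_budget(latencies_ms, budget_ms=20.0):
    """True if the P99 of the measured latencies is within the budget."""
    return percentile_nearest_rank(latencies_ms, 99) <= budget_ms


if __name__ == "__main__":
    # Hypothetical measurements: 99 fast requests and one slow outlier.
    latencies = [5.0] * 99 + [50.0]
    print(percentile_nearest_rank(latencies, 99), meets_p99_budget(latencies))
```

A P99 target tolerates up to 1% of requests exceeding the budget, which is why the single 50 ms outlier among 100 requests does not violate it.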

How It Works

GraphLearn-Training and Dynamic-Graph-Service can be used together to train GNN models and serve them online:

1. A user request is initiated on the web. The client samples the dynamic graph in real time (step 1), uses the samples as model input, and requests prediction results from the Model service (step 2).
2. The prediction results, user feedback, and relevant web context are sent to the Data Hub, such as a log service (steps 0 and 3).
3. Data updates are streamed into the Dynamic Graph Service as new graph updates (step 4).
4. GraphLearn-Training periodically loads windows of graph data, trains models incrementally, and updates the models on the TensorFlow Model service.
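
The loop above can be sketched with a toy in-memory graph: streamed updates are appended with a timestamp, the online path samples the most recent neighbors, and the training path loads a time window of edges. This is only an illustration under stated assumptions. `ToyDynamicGraph` and its methods are hypothetical, and "most recent k neighbors" is one possible sampling strategy, not necessarily the one Dynamic-Graph-Service uses.

```python
from collections import defaultdict


class ToyDynamicGraph:
    """In-memory stand-in for a streamed dynamic graph (illustrative only)."""

    def __init__(self):
        # Maps a source node to a list of (timestamp, destination) pairs.
        self._edges = defaultdict(list)

    def apply_update(self, src, dst, timestamp):
        """Ingest one streamed edge update."""
        self._edges[src].append((timestamp, dst))

    def latest_neighbors(self, src, k):
        """Online path: return the k most recently added neighbors of `src`."""
        recent = sorted(self._edges.get(src, []), reverse=True)[:k]
        return [dst for _, dst in recent]

    def window(self, start, end):
        """Training path: return edges with start <= timestamp < end, oldest first."""
        selected = [
            (ts, src, dst)
            for src, edges in self._edges.items()
            for ts, dst in edges
            if start <= ts < end
        ]
        return [(src, dst, ts) for ts, src, dst in sorted(selected)]


if __name__ == "__main__":
    graph = ToyDynamicGraph()
    updates = [("u1", "i1", 1), ("u1", "i2", 2), ("u1", "i3", 3), ("u2", "i1", 4)]
    for src, dst, ts in updates:
        graph.apply_update(src, dst, ts)

    # Real-time sampling for an incoming request from user u1.
    print(graph.latest_neighbors("u1", k=2))
    # Periodic window load for incremental training.
    print(graph.window(start=2, end=4))
```

In the real system the samples would be passed to the model service for prediction, and each loaded window would drive an incremental training run; both are omitted here.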

Additional Resources

A newer GNN acceleration library for PyTorch, graphlearn-for-pytorch, is available on GitHub: https://github.com/alibaba/graphlearn-for-pytorch.

Citation

If you use Graph-Learn in your research, please cite the following paper:

@article{zhu2019aligraph,
  title={AliGraph: a comprehensive graph neural network platform},
  author={Zhu, Rong and Zhao, Kun and Yang, Hongxia and Lin, Wei and Zhou, Chang and Ai, Baole and Li, Yong and Zhou, Jingren},
  journal={Proceedings of the VLDB Endowment},
  volume={12},
  number={12},
  pages={2094--2105},
  year={2019},
  publisher={VLDB Endowment}
}

Licensing

Graph-Learn is licensed under the Apache License 2.0, which permits broad use provided the license and attribution notices are retained.