TorchRec - Sparsity and parallelism tools for large-scale recommender systems

Introduction to TorchRec

TorchRec is a PyTorch domain library for large-scale recommendation systems (RecSys). It provides the sparsity and parallelism primitives needed to train and serve models whose embedding tables are too large for a single device and must be sharded across many GPUs. TorchRec powers many of Meta's production recommendation models.

External Presence

TorchRec is used or referenced by several projects and publications:

Meta's DLRM: The latest version of Meta's Deep Learning Recommendation Model (DLRM) is built with TorchRec.
Disaggregated Multi-Tower: This paper on topology-aware modeling for efficient large-scale recommendation uses TorchRec.
The Algorithm ML by Twitter: Twitter's open-source machine learning code for recommendation uses TorchRec.
Databricks Training: Databricks uses TorchRec in its material on training recommendation models.
Research: Work toward 100TB-scale recommendation models has referenced TorchRec.

TorchRec Features

Parallelism Primitives: These make it possible to write large, performant models that span multiple devices and nodes, combining data parallelism with model parallelism.
Sharding Techniques: TorchRec can shard embedding tables using several strategies, including data-parallel, table-wise, row-wise, column-wise, and hybrid combinations of these.
Sharding Planner: This tool aids in generating optimized plans for model sharding, ensuring efficient resource use.
Pipelined Training: Overlaps data loading, host-to-device transfer, inter-device communication, and computation (forward and backward passes) to increase training throughput.
Optimized Kernels: RecSys kernels optimized through FBGEMM enhance the performance of recommendation models.
Quantization Support: Supports reduced-precision training and inference, as well as optimizing a TorchRec model for C++ inference.
Common Modules & Datasets: Offers commonly used RecSys modules and access to popular datasets, such as the Criteo click logs and MovieLens.
Example Models: Provides end-to-end training examples, including a DLRM event-prediction model trained on the Criteo click logs dataset.
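
To make the difference between table-wise and row-wise sharding concrete, here is a minimal pure-Python sketch. It does not use TorchRec's API or its planner: the Table class, the two helper functions, and the table sizes are all hypothetical, and the greedy placement only stands in for the kind of decision a sharding planner automates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical description of one embedding table."""
    name: str
    num_rows: int
    dim: int


def shard_table_wise(tables, world_size):
    """Table-wise: each table lives whole on a single device.

    Greedy placement (an assumption for illustration): the largest
    tables go first, each onto the currently least-loaded device.
    """
    load = [0] * world_size
    plan = {}
    for t in sorted(tables, key=lambda t: t.num_rows * t.dim, reverse=True):
        rank = load.index(min(load))
        plan[t.name] = rank
        load[rank] += t.num_rows * t.dim
    return plan


def shard_row_wise(table, world_size):
    """Row-wise: one table's rows are split into contiguous,
    near-equal [start, end) ranges, one range per device."""
    base, extra = divmod(table.num_rows, world_size)
    ranges = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


tables = [Table("user", 1000, 64), Table("item", 400, 64), Table("category", 10, 16)]
table_wise_plan = shard_table_wise(tables, world_size=2)
row_wise_ranges = shard_row_wise(tables[0], world_size=3)
print(table_wise_plan)
print(row_wise_ranges)
```

Table-wise placement keeps each lookup on a single device but cannot balance one very large table, while row-wise sharding spreads a single table's memory across devices at the cost of extra communication per lookup.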

Installation

Installation instructions are in the Getting Started section of the documentation. Most users do not need to build from source. For those who do, the process consists of installing PyTorch, cloning TorchRec, installing dependencies such as FBGEMM, and running the installation tests.

Building from Source

Install PyTorch: Details for configuring PyTorch for various CUDA versions are provided.
Clone TorchRec: The library can be cloned from GitHub.
Install Dependencies: Essential tools like FBGEMM should be installed.
Run Setup and Testing: Set up TorchRec and verify the installation with test scripts.
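
As a quick sanity check after the steps above, the following sketch reports which of the three packages cannot be found in the current Python environment. It is not one of TorchRec's own test scripts. The helper name missing_packages is hypothetical, and the check assumes the usual top-level import names torch, fbgemm_gpu, and torchrec.

```python
import importlib.util

# Top-level import names assumed for PyTorch, FBGEMM (GPU), and TorchRec.
REQUIRED = ("torch", "fbgemm_gpu", "torchrec")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not found: " + ", ".join(missing))
else:
    print("torch, fbgemm_gpu, and torchrec are all importable.")
```

This only confirms that the packages can be located. It does not check CUDA compatibility between them, which the project's test scripts cover more thoroughly.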

For a more complete, end-to-end example, see the DLRM example.

Contributing

TorchRec welcomes contributions. See the CONTRIBUTING.md file for guidelines on how to help.

Citation

If you use TorchRec in academic work, cite it using the BibTeX entry that the project provides.

License

TorchRec is released under the BSD license, details of which can be found in the library's LICENSE file.

In short, TorchRec brings together the sharding, pipelining, optimized-kernel, and quantization tooling needed to build and train large-scale recommendation models in PyTorch.