HierarchicalKV - Enhancing Performance of Large-Scale Recommender Systems through Hierarchical Key-Value Storage

Introduction to HierarchicalKV

About HierarchicalKV

HierarchicalKV is part of the NVIDIA Merlin framework and is designed for the demanding requirements of recommender systems (RecSys). It provides hierarchical key-value storage for feature embeddings that spans the high-bandwidth memory (HBM) of GPUs and host memory. It can also be used as a general-purpose key-value storage library.

Benefits

When building large recommender systems, machine learning engineers face several challenges:

The HBM on a single GPU is too small for recommendation models that scale to terabytes.
Improving communication performance becomes harder as CPU clusters grow.
Controlling the growth of limited HBM consumption with custom strategies is complex.
Most generic key-value libraries make poor use of HBM and host memory.

HierarchicalKV addresses these challenges with:

Support for training large RecSys models on HBM and host memory at the same time.
Better performance by bypassing the CPU entirely and reducing the communication workload.
Table-size control through eviction strategies based on LRU or custom policies, implemented as CUDA kernels.
A high working load factor (the fraction of table slots in use), close to the optimum of 1.0.

Key Concepts

HierarchicalKV is built on the following design ideas:

Local ordering of buckets
Separate storage for keys and values
Storage of all keys in HBM
Built-in and customizable eviction strategies

Together, these ideas make NVIDIA GPUs better suited to training large models for search, recommendation, and advertising, and they ease common hurdles in building, evaluating, and maintaining sophisticated recommender systems.
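The design ideas above can be illustrated with a simplified, CPU-only model of a single bucket. This is a sketch under stated assumptions, not HierarchicalKV's actual layout or kernels: the `Bucket` type and its methods are hypothetical, the bucket is scanned linearly, and "HBM" and "host memory" appear only in comments. It keeps keys and scores apart from the value vectors and applies an assumed eviction rule: when the bucket is full, the lowest-score key is overwritten unless the new key's score is lower still.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified CPU model of one bucket (hypothetical; not HierarchicalKV's real
// layout). keys/scores are per-slot metadata (HierarchicalKV keeps all keys in
// HBM); value vectors live in a separate array, which in the library may sit
// in HBM or in host memory.
struct Bucket {
  std::size_t capacity;          // slots in this bucket
  std::size_t dim;               // elements per value vector
  std::vector<uint64_t> keys;    // occupied slots, one key each
  std::vector<uint64_t> scores;  // scores[i] belongs to keys[i]
  std::vector<float> values;     // slot i: values[i * dim] .. values[i * dim + dim - 1]

  Bucket(std::size_t capacity_, std::size_t dim_) : capacity(capacity_), dim(dim_) {
    assert(capacity > 0 && dim > 0);
  }

  // Returns the slot holding `key`, or -1 if the key is absent.
  long find(uint64_t key) const {
    for (std::size_t i = 0; i < keys.size(); ++i)
      if (keys[i] == key) return static_cast<long>(i);
    return -1;
  }

  // Inserts or updates `key` with a caller-computed `score` and `dim` values.
  // Returns false if the bucket is full and the new score is below the
  // bucket's minimum, in which case nothing is stored.
  bool insert_or_evict(uint64_t key, uint64_t score, const float* value) {
    long found = find(key);
    std::size_t slot;
    if (found >= 0) {
      slot = static_cast<std::size_t>(found);  // existing key: update in place
    } else if (keys.size() < capacity) {
      slot = keys.size();                      // free slot: append
      keys.push_back(key);
      scores.push_back(0);
      values.resize(values.size() + dim);
    } else {
      slot = 0;                                // full: find the minimum score
      for (std::size_t i = 1; i < keys.size(); ++i)
        if (scores[i] < scores[slot]) slot = i;
      if (score < scores[slot]) return false;  // new key is least important
      keys[slot] = key;                        // evict the old key
    }
    scores[slot] = score;
    for (std::size_t d = 0; d < dim; ++d) values[slot * dim + d] = value[d];
    return true;
  }

  // Fraction of slots in use.
  double load_factor() const {
    return static_cast<double>(keys.size()) / static_cast<double>(capacity);
  }
};
```

In the library these steps run in CUDA kernels across many buckets; the model only shows the per-bucket bookkeeping, and how a bucket reaches a load factor of 1.0 before anything is evicted.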

API Overview

The main types in the HierarchicalKV API are:

HashTable: Stores key-value pairs and performs batched lookups, insertions, and evictions.
EvictStrategy: Selects the eviction strategy, passed as a template argument of HashTable.
HashTableOptions: Holds the configuration passed to HashTable::init.

The API documentation describes these types, and the rest of the library's functionality, in full.

Eviction Strategies

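In HierarchicalKV, each key has a "score" that indicates its importance: the higher the score, the less likely the key is to be evicted. Eviction happens only when a new key arrives and there is no free space for it. If the target bucket is full, the key with the lowest score in that bucket is overwritten, unless the new key's score is lower still. The eviction strategy determines how scores are computed:

Lru: The score is the device clock, in nanoseconds, at the time of the update.
Lfu: The score is a frequency; the caller supplies the increment through the scores parameter of the insert-like APIs.
EpochLru & EpochLfu: The high 32 bits hold a global epoch supplied by the caller; the low 32 bits hold the device clock shifted right by 20 bits, roughly millisecond granularity (EpochLru), or a frequency that saturates at 0xffffffff (EpochLfu).
Customized: The caller supplies the score directly through the scores parameter of the insert-like APIs.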
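A minimal sketch of how each strategy's score could be formed, assuming these layouts: Lru uses the clock value, Lfu adds a caller-supplied increment, and the epoch variants place a global epoch in the high 32 bits with either (clock >> 20) or a frequency saturating at 0xffffffff in the low 32 bits. The helpers (`lru_score`, `lfu_score`, `epoch_lru_score`, `epoch_lfu_score`) are hypothetical host functions; the library computes scores on the device, and a Customized score is used as supplied, so it needs no helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers mirroring the assumed score layouts (not library code).

// Lru: the score is the clock value at update time.
inline uint64_t lru_score(uint64_t clock_ns) { return clock_ns; }

// Lfu: the caller-supplied increment is added to the existing frequency.
inline uint64_t lfu_score(uint64_t old_freq, uint64_t increment) {
  assert(increment <= UINT64_MAX - old_freq);  // guard against wrap-around
  return old_freq + increment;
}

// EpochLru: high 32 bits = global epoch, low 32 bits = clock >> 20
// (one tick is 2^20 ns, about 1 ms).
inline uint64_t epoch_lru_score(uint32_t epoch, uint64_t clock_ns) {
  return (static_cast<uint64_t>(epoch) << 32) | ((clock_ns >> 20) & 0xffffffffULL);
}

// EpochLfu: high 32 bits = global epoch, low 32 bits = frequency, clamped at
// 0xffffffff so it never spills into the epoch bits.
inline uint64_t epoch_lfu_score(uint32_t epoch, uint64_t freq) {
  return (static_cast<uint64_t>(epoch) << 32) | std::min<uint64_t>(freq, 0xffffffffULL);
}
```

Because the epoch occupies the high bits, a key updated in a newer epoch outranks every key from an older epoch, whatever its clock or frequency.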

Configuration Settings

HashTableOptions exposes the following main settings:

Initial and maximum capacity (init_capacity, max_capacity): the number of key-value slots the table is created with and the limit it can grow to.
HBM budget for values (max_hbm_for_vectors): the maximum amount of HBM used for value vectors; values beyond this budget are stored in host memory, while keys always stay in HBM.
Value dimension and bucket size (dim, max_bucket_size): the number of elements in each value vector and the number of slots per bucket.

Leave the remaining options at their defaults, which are chosen for performance, unless a specific use case requires a change.
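As a rough sizing aid for the capacity, dim, and HBM-budget settings, the sketch below estimates how many bytes the value vectors need and how they would split between HBM and host memory. It assumes every slot stores dim elements and ignores keys, scores, and allocation granularity; the `split_value_memory` helper and `ValueMemorySplit` type are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimate of where value vectors live (not library code).
// Assumes capacity * dim elements of value_bytes each, with anything beyond
// max_hbm_for_vectors placed in host memory.
struct ValueMemorySplit {
  uint64_t total_bytes;  // bytes needed for all value vectors
  uint64_t hbm_bytes;    // portion within the HBM budget
  uint64_t host_bytes;   // remainder placed in host memory
};

inline ValueMemorySplit split_value_memory(uint64_t capacity, uint64_t dim,
                                           uint64_t value_bytes,
                                           uint64_t max_hbm_for_vectors) {
  assert(dim > 0 && value_bytes > 0);
  uint64_t total = capacity * dim * value_bytes;
  uint64_t hbm = total < max_hbm_for_vectors ? total : max_hbm_for_vectors;
  return {total, hbm, total - hbm};
}
```

For example, 16 × 1024 × 1024 slots with dim = 16 float values need exactly 1 GiB, which fits in a 16 GiB HBM budget; 2^30 slots with dim = 64 need 256 GiB, of which 240 GiB would go to host memory.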

Example Usage

The following minimal example creates a table that uses the LRU eviction strategy and initializes it:

```cpp
#include <cstdint>
#include <memory>

#include "merlin_hashtable.cuh"

using TableOptions = nv::merlin::HashTableOptions;
using EvictStrategy = nv::merlin::EvictStrategy;

int main(int argc, char *argv[])
{
  using K = uint64_t;  // key type: int64_t or uint64_t
  using V = float;     // element type of each value vector
  using S = uint64_t;  // score type: must be uint64_t

  // 1. Define the table type with the LRU eviction strategy.
  using HKVTable = nv::merlin::HashTable<K, V, S, EvictStrategy::kLru>;
  std::unique_ptr<HKVTable> table = std::make_unique<HKVTable>();

  // 2. Configure a fixed capacity of 16M keys with 16 floats per value.
  //    16M x 16 x 4 bytes = 1 GiB of values, well within the GB(16)
  //    HBM budget, so all values stay in HBM.
  TableOptions options;
  options.init_capacity = 16 * 1024 * 1024;
  options.max_capacity = options.init_capacity;
  options.dim = 16;
  options.max_hbm_for_vectors = nv::merlin::GB(16);

  // 3. Allocate the table's memory resources.
  table->init(options);

  // 4. Use the table (insert_or_assign, find, ...).

  return 0;
}
```
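The batch operations of HashTable, such as insert_or_assign and find, take a count n, a pointer to n keys, and a value buffer laid out as n rows of dim elements in GPU-accessible memory. The sketch below only prepares such a buffer on the host from per-key vectors; the helper `flatten_values` is hypothetical, and the device copy and the table call itself are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs one value vector per key into a single row-major
// buffer of n * dim elements, where row i holds the vector for keys[i].
template <typename V>
std::vector<V> flatten_values(const std::vector<std::vector<V>>& rows,
                              std::size_t dim) {
  std::vector<V> flat;
  flat.reserve(rows.size() * dim);
  for (const auto& row : rows) {
    assert(row.size() == dim);  // every vector must have exactly dim elements
    flat.insert(flat.end(), row.begin(), row.end());
  }
  return flat;
}
```

With n = keys.size(), the flattened buffer would then be copied to device memory (for example with cudaMemcpy) and passed to the table together with a device copy of the keys.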

Usage Restrictions

HierarchicalKV restricts the key and score types:

The key type (K) must be int64_t or uint64_t.
The score type (S) must be uint64_t.
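These restrictions can be checked at compile time in a project's own code. The sketch below is a hypothetical helper, `hkv_types_supported`, built only on `<type_traits>`; it is not part of HierarchicalKV.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical compile-time guard mirroring the documented type restrictions:
// keys must be int64_t or uint64_t, and scores must be uint64_t.
template <typename K, typename S>
constexpr bool hkv_types_supported() {
  return (std::is_same<K, int64_t>::value || std::is_same<K, uint64_t>::value) &&
         std::is_same<S, uint64_t>::value;
}

// Example: reject unsupported type choices when the project compiles.
static_assert(hkv_types_supported<uint64_t, uint64_t>(), "unsupported key or score type");
```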

Building the Project

HierarchicalKV is a header-only library, so using it requires no separate build step; only the benchmark and unit-test binaries need to be built. Building them requires a recent CUDA toolkit and a recent GCC, plus either CMake or Bazel.

Support and Contributions

HierarchicalKV is maintained by the NVIDIA Merlin team and accepts contributions from the public. For support, and for details on contributing, see the project's issues page.

Conclusion

HierarchicalKV targets large-scale embedding storage for recommender systems. By spanning HBM and host memory, bypassing the CPU, and evicting low-score keys on the GPU, it eases the memory and performance limits that constrain large models and helps them scale efficiently.