PatrickStar
PatrickStar uses chunk-based memory management to coordinate CPU and GPU memory, enabling large models to be trained with fewer GPUs and making pre-trained model (PTM) training more accessible. It is compatible with PyTorch, scales cost-effectively, and outperforms solutions such as DeepSpeed, handling models of up to 175 billion parameters on small clusters.
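The chunk-based idea can be illustrated with a toy sketch (this is a conceptual model, not PatrickStar's actual API): parameters are packed into fixed-size chunks, and whole chunks move between a small GPU pool and a large CPU pool on demand, so only the chunks currently in use occupy GPU memory.

```python
# Conceptual sketch of chunk-based memory management (not PatrickStar's
# real implementation): plain Python lists stand in for parameter tensors.

CHUNK_SIZE = 4  # elements per chunk; real systems use chunks of many MB

def make_chunks(params, chunk_size=CHUNK_SIZE):
    """Pack a flat parameter list into fixed-size chunks (last may be short)."""
    return [params[i:i + chunk_size] for i in range(0, len(params), chunk_size)]

class ChunkManager:
    """Keeps at most `gpu_capacity` chunks resident on the GPU at once."""
    def __init__(self, chunks, gpu_capacity=2):
        self.cpu = {i: c for i, c in enumerate(chunks)}  # all chunks start on CPU
        self.gpu = {}  # chunk_id -> chunk data currently "on the GPU"
        self.capacity = gpu_capacity

    def fetch(self, chunk_id):
        """Ensure a chunk is on the GPU, evicting the oldest resident if full."""
        if chunk_id in self.gpu:
            return self.gpu[chunk_id]
        if len(self.gpu) >= self.capacity:       # evict oldest resident chunk
            victim = next(iter(self.gpu))
            self.cpu[victim] = self.gpu.pop(victim)
        self.gpu[chunk_id] = self.cpu.pop(chunk_id)
        return self.gpu[chunk_id]

params = list(range(10))  # a flat view of model parameters
mgr = ChunkManager(make_chunks(params), gpu_capacity=2)
mgr.fetch(0); mgr.fetch(1); mgr.fetch(2)  # fetching chunk 2 evicts chunk 0
print(sorted(mgr.gpu))  # -> [1, 2]
```

Because whole chunks are the unit of movement, transfers are large and contiguous, which is what makes CPU offloading efficient in practice.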