
gpu_poor

Optimize GPU Performance for LLMs with Memory and Token Metrics

Product Description

The tool estimates the GPU memory and token throughput required to run large language models (LLMs) on a range of GPUs and CPUs. It provides a detailed memory-usage breakdown for both training and inference, supporting quantization formats such as GGML and bitsandbytes, and frameworks such as vLLM, llama.cpp, and Hugging Face (HF). Key functionalities include VRAM requirement estimation, token-rate calculation, and approximation of finetuning duration. The tool helps assess whether quantization is suitable, the maximum context length a GPU can hold, and the largest batch size it can support, offering practical guidance for GPU memory optimization.
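The kind of estimate the tool produces can be sketched with a back-of-the-envelope calculation: inference VRAM is roughly model weights plus KV cache plus runtime overhead. The function below is a minimal illustration, not the tool's actual method; the default layer count, hidden size, and overhead factor are assumptions chosen to resemble a 7B-class transformer.

```python
def estimate_inference_vram_gb(
    num_params_billions: float,
    bytes_per_param: int = 2,   # 2 = fp16/bf16; 1 = int8; 0.5 would approximate 4-bit
    context_len: int = 4096,
    num_layers: int = 32,       # assumed; typical for a 7B-class model
    hidden_size: int = 4096,    # assumed; typical for a 7B-class model
    batch_size: int = 1,
    overhead_factor: float = 1.2,  # assumed fudge factor for activations/runtime buffers
) -> float:
    """Rough inference VRAM estimate in GiB: weights + KV cache, scaled by overhead."""
    weight_bytes = num_params_billions * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, one vector per token per batch element
    kv_cache_bytes = 2 * num_layers * context_len * hidden_size * batch_size * bytes_per_param
    return (weight_bytes + kv_cache_bytes) * overhead_factor / 1024**3


if __name__ == "__main__":
    # A 7B model in fp16 lands around 20 GiB with these assumptions,
    # which is why such models typically need quantization to fit a 24 GB card
    # with headroom, and cannot run unquantized on a 16 GB card.
    print(f"{estimate_inference_vram_gb(7):.1f} GiB")
```

Real calculators refine this with per-architecture layer shapes, attention variants (e.g. grouped-query attention shrinks the KV cache), and quantization-specific overheads, which is exactly the detail this tool automates.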