gpu_poor
The tool estimates the GPU memory (VRAM) and token/s throughput needed to run large language models (LLMs) across a range of GPUs and CPUs. It provides a detailed memory breakdown for both training and inference, supporting quantization libraries such as GGML and bitsandbytes, and frameworks such as vLLM, llama.cpp, and HuggingFace (HF). Key features include VRAM requirement estimation, token/s calculation, and approximate finetuning duration. The tool helps assess which quantizations fit on a given GPU, the maximum context length and batch size that GPU can handle, and where GPU memory can be optimized.
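
As a rough illustration of the kind of estimate involved, the sketch below computes inference VRAM from model weights plus KV cache. The formula, default values, and function name are illustrative assumptions, not the tool's actual implementation:

```python
# Illustrative sketch only -- not the tool's code. Defaults roughly match
# a Llama-2-7B-style architecture and are assumptions for this example.

def inference_vram_gb(
    n_params_b: float,      # model size in billions of parameters
    bits: int = 16,         # weight precision: 16 (fp16/bf16), 8, or 4 (quantized)
    n_layers: int = 32,     # number of transformer layers
    hidden: int = 4096,     # hidden dimension
    context: int = 2048,    # sequence length
    batch: int = 1,
    overhead: float = 1.1,  # ~10% for CUDA context, fragmentation, activations
) -> float:
    """Rough inference VRAM estimate in GiB: weights + KV cache + overhead."""
    weights = n_params_b * 1e9 * bits / 8
    # KV cache: 2 tensors (K and V) per layer, 2 bytes (fp16) per element
    kv_cache = 2 * n_layers * hidden * context * batch * 2
    return (weights + kv_cache) * overhead / 1024**3

# A 7B model quantized to 4-bit with a 2048-token context:
print(f"{inference_vram_gb(7, bits=4):.1f} GB")  # ~4.7 GB
```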
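
For finetuning, a common rule of thumb is roughly 16 bytes per parameter with Adam in mixed precision, before activation memory. A minimal sketch under that assumption (again, not the tool's exact accounting, which also models activations and batch size):

```python
# Hedged sketch: per-parameter cost of full finetuning with Adam in
# mixed precision, a widely used rule of thumb rather than an exact figure.

def finetune_vram_gb(n_params_b: float) -> float:
    """2 B weights (bf16) + 2 B gradients (bf16)
    + 4 B fp32 master weights + 8 B fp32 Adam moments (m and v)."""
    bytes_per_param = 2 + 2 + 4 + 8
    return n_params_b * 1e9 * bytes_per_param / 1024**3

print(f"{finetune_vram_gb(7):.0f} GB")  # ~104 GB before activation memory
```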