neural-compressor
Intel Neural Compressor provides model compression techniques such as quantization, pruning, and knowledge distillation for popular frameworks including TensorFlow and PyTorch. It targets a range of Intel hardware and supports other platforms through ONNX Runtime. The library also covers validating large language models (LLMs), integrating with cloud workflows, and tuning model performance. Recent releases improve performance and add user-friendly APIs.
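To make the quantization technique concrete, below is a minimal pure-Python sketch of post-training affine (asymmetric) int8 quantization, the core idea that quantization libraries like this one build on. It is illustrative only and does not use Neural Compressor's actual API; the function names and the sample weights are made up for the example.

```python
# Illustrative sketch of affine int8 quantization (not the library's API).
# Floats are mapped to integer levels via a scale and a zero point, then
# dequantized back, trading a small accuracy loss for 4x smaller weights.

def quantize(values, num_bits=8):
    """Map floats to unsigned integer levels with a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from quantized integer levels."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.75, 1.5]  # toy "tensor" for the demo
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 4))  # → [0, 51, 102, 178, 255] 0.0049
```

Real-world quantizers refine this basic recipe with per-channel scales, calibration over representative data, and accuracy-aware tuning, which is the machinery Neural Compressor automates.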