petastorm
Petastorm is an open-source data access library developed at Uber ATG for training and evaluating deep learning models directly on datasets stored in Apache Parquet format. It works with TensorFlow, PyTorch, and PySpark, as well as plain Python code, on a single machine or in a distributed setup. Its features include selective column reads, data shuffling, caching, and dataset sharding for multi-GPU training. Its Python API exposes a dataset through reader objects that can be iterated directly or wrapped in framework-specific adapters.
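
As a rough illustration of selective column reads, shuffling, and sharding, here is a minimal sketch in Python. It assumes Petastorm is installed and that `make_batch_reader` accepts the `schema_fields`, `shuffle_row_groups`, `cur_shard`, `shard_count`, and `num_epochs` arguments as in recent releases. The helper names `to_dataset_url`, `reader_options`, and `iter_batches` are hypothetical and not part of the library.

```python
from pathlib import Path


def to_dataset_url(path):
    """Turn a local directory path into a file:// URL (hypothetical helper)."""
    return Path(path).resolve().as_uri()


def reader_options(columns, shard=None, shard_count=None, shuffle=True):
    """Build keyword arguments for make_batch_reader (hypothetical helper)."""
    options = {
        "schema_fields": list(columns),   # read only these columns
        "shuffle_row_groups": shuffle,    # shuffle at row-group granularity
        "num_epochs": 1,                  # one pass over the data
    }
    if shard is not None:
        if shard_count is None:
            raise ValueError("shard_count is required when shard is set")
        # Each worker (for example, one per GPU) reads a disjoint subset.
        options["cur_shard"] = shard
        options["shard_count"] = shard_count
    return options


def iter_batches(path, columns, **kwargs):
    """Yield each batch as a dict mapping column name to a NumPy array."""
    # Imported lazily so the helpers above work without Petastorm installed.
    from petastorm import make_batch_reader

    url = to_dataset_url(path)
    with make_batch_reader(url, **reader_options(columns, **kwargs)) as reader:
        for batch in reader:
            yield batch._asdict()  # each batch is a namedtuple of columns
```

A worker in a two-GPU job might call `iter_batches("/data/events", ["id", "label"], shard=0, shard_count=2)`, where the path and column names are placeholders for an existing Parquet dataset.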