TensorFlow Datasets
TensorFlow Datasets (TFDS) is a collection of ready-to-use public datasets, exposed as tf.data.Dataset objects for use in machine learning projects. TFDS aims to make these datasets easy to integrate into TensorFlow programs, so that users can focus on training and evaluating their models rather than on preparing and managing data.
Installation and Getting Started
To begin using TensorFlow Datasets, start with the getting started guide. It offers an interactive introduction with examples that can be run directly in a Colab notebook, and it covers the basic steps of installing TFDS and loading datasets for use in TensorFlow applications.
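As a minimal sketch of those first steps, the snippet below assumes that TFDS has been installed from PyPI (for example with `pip install tensorflow-datasets`) alongside TensorFlow. The dataset name `mnist` is only an illustrative choice, and the helper names `load_training_split` and `describe` are hypothetical, not part of TFDS.

```python
DEFAULT_DATASET = "mnist"  # illustrative choice; any catalog name could be used


def load_training_split(name=DEFAULT_DATASET):
    """Return (dataset, info) for the training split of `name`.

    The first call downloads and prepares the data, so it needs network
    access and free disk space.
    """
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    dataset, info = tfds.load(
        name,
        split="train",
        as_supervised=True,  # yield (input, label) tuples instead of dicts
        with_info=True,      # also return the tfds.core.DatasetInfo metadata
    )
    return dataset, info


def describe(info):
    """Summarise a DatasetInfo-like object as a short string."""
    num_train = info.splits["train"].num_examples
    return f"{info.name}: {num_train} training examples"
```

Calling `load_training_split()` returns a tf.data.Dataset together with its metadata, and passing that metadata to `describe` gives a one-line summary that is handy as a sanity check before training.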
Documentation and Resources
TFDS is supported by comprehensive documentation that includes:
- Tutorials and Guides: These resources help users understand how to utilize TFDS effectively in their projects.
- Dataset Catalog: A complete list of all available datasets.
- API Reference: Detailed API documentation for more advanced manipulations and queries.
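The catalog can also be queried from code. The sketch below relies on `tfds.list_builders()`, which returns the names of the registered datasets. The helpers `registered_dataset_names` and `find_datasets` are hypothetical names introduced here for illustration.

```python
def registered_dataset_names():
    """Return the names of all datasets registered with TFDS."""
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.list_builders()


def find_datasets(keyword, names):
    """Return the sorted names that contain `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return sorted(name for name in names if needle in name.lower())
```

For example, `find_datasets("mnist", registered_dataset_names())` would list every registered dataset whose name mentions MNIST. The exact result depends on the installed TFDS version.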
Core Values of TFDS
TensorFlow Datasets is built on several fundamental principles:
- Simplicity: Designed to cater to standard use-cases with ease, TFDS allows users to quickly start utilizing datasets without extensive setup.
- Performance: TFDS follows best practices for efficient dataset access and can achieve state-of-the-art speed.
- Determinism and Reproducibility: TFDS aims to guarantee that all users receive the same dataset examples in the same order, facilitating reproducible results.
- Customisability: Advanced users can tailor their dataset access, adjusting parameters to fit complex requirements.
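The principles above can be sketched in code. The example assumes three TFDS and tf.data features: the split slicing syntax (such as `train[0%:80%]`), the `shuffle_seed` field of `tfds.ReadConfig`, and the usual cache, shuffle, batch, prefetch pattern. The function names, buffer size, and default values are hypothetical choices for illustration, not recommendations from the TFDS authors.

```python
def percent_slice(split, start, end):
    """Build a split-slicing string such as 'train[0%:80%]'."""
    if not 0 <= start < end <= 100:
        raise ValueError("expected 0 <= start < end <= 100")
    return f"{split}[{start}%:{end}%]"


def load_reproducible(name, split, seed=42):
    """Load `split` of `name` with a fixed file-shuffling seed."""
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    # A fixed shuffle_seed is intended to make the file order repeatable.
    read_config = tfds.ReadConfig(shuffle_seed=seed)
    return tfds.load(
        name,
        split=split,
        shuffle_files=True,
        read_config=read_config,
    )


def build_input_pipeline(dataset, batch_size=128, seed=42):
    """Apply a common tf.data pattern: cache, shuffle, batch, prefetch."""
    import tensorflow as tf

    return (
        dataset.cache()                   # keep examples in memory after the first pass
        .shuffle(1024, seed=seed)         # seeded example-level shuffle
        .batch(batch_size)
        .prefetch(tf.data.AUTOTUNE)       # overlap input preparation with training
    )
```

A training split covering the first 80% of the data could then be requested with `load_reproducible("mnist", percent_slice("train", 0, 80))` and passed through `build_input_pipeline`. Caching in memory suits small datasets only; larger ones may need the cache step removed.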
Requesting and Adding Datasets
If a specific dataset is not available in TFDS, users can request it by opening a dataset request issue on GitHub, or contribute it themselves by following the guide for adding a dataset. Users can also vote on existing dataset requests by reacting to the open issues.
Citation and License Information
When TFDS is used in academic work, please cite it using the citation provided in the documentation. TFDS is released under the Apache 2.0 license; see the LICENSE file for details.
Disclaimers
TFDS is a utility for downloading and preparing public datasets; it does not host or distribute them. Users are responsible for verifying that they have the right to use a dataset under its license. Dataset owners who want to update a dataset or raise a concern can reach out via GitHub. The project encourages following Responsible AI Practices to promote ethical use within the ML community.