Introduction to TensorFlow Decision Forests (TF-DF)
TensorFlow Decision Forests (TF-DF) is a library for training, running, and interpreting decision forest models, such as Random Forests and Gradient Boosted Trees, within the TensorFlow ecosystem. It supports classification, regression, and ranking tasks.
Powered by Yggdrasil Decision Forests
TF-DF is powered by Yggdrasil Decision Forests (YDF), a library for training and using decision forest models from C++, JavaScript, the command line (CLI), and Go. TF-DF models are compatible with YDF models, so models can move between the two libraries.
Cross-Platform Availability
TensorFlow Decision Forests is currently available on Linux and macOS. Windows users can run the library through the Windows Subsystem for Linux (WSL), that is, inside a Linux environment hosted on Windows.
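As a rough illustration of the platform rules above, the following sketch reports whether the current machine matches a supported setup. The helper name `tfdf_platform_status` is hypothetical and not part of TF-DF, and detecting WSL by looking for "microsoft" in the kernel release string is a common heuristic, not an official check.

```python
import platform


def tfdf_platform_status(system=None, release=None):
    """Classify a platform according to the availability notes above.

    Hypothetical helper: `system` and `release` default to the values
    reported by the running interpreter.
    """
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release

    if system == "Linux":
        # Heuristic: WSL kernels usually carry "microsoft" in their release.
        if "microsoft" in release.lower():
            return "supported (Linux under WSL)"
        return "supported"
    if system == "Darwin":  # macOS reports itself as "Darwin".
        return "supported"
    if system == "Windows":
        return "not supported natively; use WSL"
    return "unknown"


if __name__ == "__main__":
    print(tfdf_platform_status())
```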
Getting Started: A Usage Example
The following minimal example trains, inspects, evaluates, and exports a Random Forest model:
import tensorflow_decision_forests as tfdf
import pandas as pd
# Load the dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")
test_df = pd.read_csv("project/test.csv")
# Convert the dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label="my_label")
# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
# Look at the model.
model.summary()
# Evaluate the model.
model.evaluate(test_ds)
# Export to a TensorFlow SavedModel.
# Note: the model is compatible with Yggdrasil Decision Forests.
model.save("project/model")
This snippet is a template for the full workflow: loading data, training a decision forest model, evaluating it, and saving it for later use.
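The example assumes that `project/train.csv` and `project/test.csv` already exist. If only a single CSV file is available, a split along the following lines could produce both files. This is a sketch using only the Python standard library; the function name `split_csv`, the default 30% test ratio, and the seed are illustrative assumptions and not part of TF-DF.

```python
import csv
import random


def split_csv(source_path, train_path, test_path, test_ratio=0.3, seed=1234):
    """Shuffle the rows of a CSV file and write train and test CSV files.

    Returns a tuple (num_train_rows, num_test_rows).
    """
    with open(source_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Seeded shuffle so the same split is produced on every run.
    random.Random(seed).shuffle(rows)
    num_test = int(len(rows) * test_ratio)
    test_rows, train_rows = rows[:num_test], rows[num_test:]

    # Both output files keep the header of the source file.
    for path, subset in ((train_path, train_rows), (test_path, test_rows)):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)

    return len(train_rows), len(test_rows)
```

For instance, `split_csv("project/all.csv", "project/train.csv", "project/test.csv")` would create the two files read by the example, where `project/all.csv` is a hypothetical source file containing the `my_label` column.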
Resources and Documentation
A variety of resources are available to help users get the most out of TensorFlow Decision Forests:
- TF-DF on TensorFlow.org: Offers API reference, guides, and tutorials.
- Tutorials: Step-by-step guides on using TF-DF.
- YDF Documentation: Contains additional information applicable to TF-DF.
- Issue Tracker and Known Issues: For querying ongoing issues or reporting bugs.
Installation
To install or upgrade TensorFlow Decision Forests with pip, run:
pip3 install tensorflow_decision_forests --upgrade
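After running the command, one way to confirm that the package is visible to the current Python environment is to query its installed metadata. The sketch below relies only on the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(distribution="tensorflow_decision_forests"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tensorflow_decision_forests is not installed")
    else:
        print(f"tensorflow_decision_forests {version} is installed")
```

This only checks that the distribution is installed. It does not import TensorFlow, so it cannot detect a mismatch between the TF-DF and TensorFlow versions.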
Further installation instructions, troubleshooting tips, and alternative solutions are available in the installation documentation.
Contributing
The development team welcomes contributions from the community to further enhance TensorFlow Decision Forests and Yggdrasil Decision Forests. Prospective contributors can find more information in the developer manual and contribution guidelines.
Citing TensorFlow Decision Forests
If you use TensorFlow Decision Forests in a scientific publication, please cite the paper "Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library." Citing it credits the developers and helps readers locate the underlying research.
Contact and Credits
For inquiries or feedback, individuals can reach out to the core development team via email at [email protected].
The project is developed and maintained by a team that includes Mathieu Guillame-Bert, Jan Pfeifer, Richard Stotz, Sebastian Bruch, and Arvind Srinivasan. It is released under the Apache License 2.0.