Introduction to TraceML
TraceML is an engine for tracking, visualization, explainability, drift detection, and dashboards in machine learning (ML) and data projects. It is developed under the Polyaxon umbrella and can be used across a range of ML and data applications.
Installation
To get started, install TraceML with pip:
pip install traceml
For those interested in utilizing its tracking features, installing polyaxon alongside TraceML is recommended:
pip install polyaxon traceml
Offline Usage
TraceML offers flexibility with its offline mode, allowing users to track their runs without the need for an API. This can be activated by setting an environment variable:
export POLYAXON_OFFLINE="true"
Or by specifying the offline flag directly within the Python script:
from traceml import tracking
tracking.init(..., is_offline=True, ...)
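The environment variable can also be set from Python, which is handy in notebooks where a shell export is not available. The following is a minimal sketch using only the standard library. It assumes the variable has to be in place before traceml is imported or initialized, so the assignment goes at the very top of the script; that ordering is an assumption, not something the text above states.

```python
import os

# Assumption: the variable needs to be set before traceml is imported,
# so this goes at the very top of the script or notebook.
os.environ["POLYAXON_OFFLINE"] = "true"

# Quick check that the setting is visible to this process and its children.
offline_enabled = os.environ.get("POLYAXON_OFFLINE", "").lower() == "true"
print("offline mode requested:", offline_enabled)
```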
Simple Usage in a Python Script
TraceML simplifies the process of logging data and tracking ML experiments. Here’s an example of how one can incorporate TraceML in a Python script:
import random
from traceml import tracking

tracking.init(
    is_offline=True,
    project='quick-start',
    name="my-new-run",
    description="trying TraceML",
    tags=["examples"],
    artifacts_path="path/to/artifacts/repo"
)

# Example of logging data references and inputs
# X_train is assumed to be defined earlier (your training data)
tracking.log_data_ref(content=X_train, name='x_train')
tracking.log_inputs(batch_size=64, dropout=0.2, learning_rate=0.001, optimizer="Adam")

# Simulating tracking of metrics
def get_loss(step):
    result = 10 / (step + 1)
    noise = (random.random() - 0.5) * 0.5 * result
    return result + noise

for step in range(100):
    loss = get_loss(step)
    tracking.log_metrics(loss=loss, accuracy=(100 - loss) / 100.0)

# Logging results
tracking.log_outputs(validation_score=0.66)

# Stopping the tracking process
tracking.stop()
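To see what values the loop above hands to tracking.log_metrics, the metric simulation can be run on its own. This sketch needs neither TraceML nor Polyaxon: it reuses get_loss from the script and appends each set of metrics to a list (history, a name introduced here for illustration) instead of logging it. The fixed random seed is also an addition, used only to make the dry run repeatable.

```python
import random

random.seed(0)  # fixed seed so the dry run is repeatable


def get_loss(step):
    """Same synthetic loss as in the script: 10 / (step + 1) with up to 25% noise."""
    result = 10 / (step + 1)
    noise = (random.random() - 0.5) * 0.5 * result
    return result + noise


# Collect the keyword arguments that would be passed to tracking.log_metrics.
history = []
for step in range(100):
    loss = get_loss(step)
    history.append({"step": step, "loss": loss, "accuracy": (100 - loss) / 100.0})

first, last = history[0], history[-1]
print(f"step {first['step']}: loss={first['loss']:.3f} accuracy={first['accuracy']:.3f}")
print(f"step {last['step']}: loss={last['loss']:.3f} accuracy={last['accuracy']:.3f}")
```

Because the noise is at most a quarter of the noise-free value, the loss starts between 7.5 and 12.5 and ends near 0.1, so the derived accuracy climbs from roughly 0.9 toward 1.0.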
Integration with Major ML Frameworks
TraceML provides callbacks and logging helpers for popular deep learning and machine learning frameworks. Here are a few examples:
Keras
TraceML can leverage Keras callbacks to automatically track metrics and model outputs:
from traceml import tracking
from traceml.integrations.keras import Callback
tracking.init(...)
model.fit(x_train, y_train, epochs=10, callbacks=[Callback()])
PyTorch
For PyTorch users, TraceML provides methods to log metrics, inputs, and outputs:
from traceml import tracking
tracking.init(...)
tracking.log_metrics(loss=loss)
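The PyTorch snippet shows a single logging call; in practice that call sits inside the training loop, after the loss for the step has been computed. The sketch below shows that placement with the model replaced by plain-Python gradient descent on a one-variable quadratic, so it runs without PyTorch or TraceML. The train function and its log_fn parameter are hypothetical: log_fn stands in for tracking.log_metrics and is any callable that accepts metrics as keyword arguments.

```python
def train(log_fn, steps=50, lr=0.1):
    """Minimize (w - 3)^2 with gradient descent, reporting the loss each step.

    log_fn is a hypothetical stand-in for tracking.log_metrics: any callable
    that accepts metrics as keyword arguments.
    """
    w = 0.0
    for step in range(steps):
        loss = (w - 3.0) ** 2   # forward pass
        grad = 2.0 * (w - 3.0)  # backward pass (analytic gradient)
        w -= lr * grad          # optimizer step
        log_fn(loss=loss)       # same call shape as tracking.log_metrics(loss=loss)
    return w


# Record the metrics in a list instead of sending them to a tracking backend.
logged = []
final_w = train(lambda **metrics: logged.append(metrics))
print(len(logged), "metrics logged; final w =", round(final_w, 4))
```

Passing the logger in as an argument keeps the loop testable; in a real script, tracking.log_metrics would be passed after tracking.init(...) has been called.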
TensorFlow
TensorFlow users can utilize TraceML to track metrics and models:
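from traceml import tracking
# Callback import follows the same pattern as the Keras integration above
from traceml.integrations.tensorflow import Callback
tracking.init(...)
# estimator is assumed to be an existing TensorFlow estimator
estimator.train(hooks=[Callback(log_image=True)])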
Tracking Artifacts
TraceML enables logging of various artifacts and visualizations created with libraries such as matplotlib, Bokeh, Altair, and Plotly.
import matplotlib.pyplot as plt
from traceml import tracking
def plot_and_log(step):
    figure, axs = plt.subplots()
    axs.plot([1, 2, 3, 4], [1, 4, 2, 3])
    tracking.log_mpl_image(figure, 'mpl_figure', step=step)
Enhanced DataFrame Analysis
TraceML extends the pandas DataFrame describe functionality with the DataFrameSummary class, offering additional insights into column statistics and types:
from traceml.summary.df import DataFrameSummary
dfs = DataFrameSummary(df)
print(dfs.columns_stats)
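To make concrete what kind of per-column summary is meant, here is a plain-Python sketch that computes a few statistics for a small table. It is not the DataFrameSummary implementation, and the statistic names (counts, uniques, missing, missing_perc) are assumptions chosen for illustration; the fields the library actually reports may differ.

```python
def column_stats(table):
    """Return per-column counts for a table given as {column_name: list_of_values}.

    None marks a missing value. Illustrative only; not TraceML's implementation.
    """
    stats = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        missing = len(values) - len(present)
        stats[name] = {
            "counts": len(present),        # non-missing values
            "uniques": len(set(present)),  # distinct non-missing values
            "missing": missing,
            "missing_perc": 100.0 * missing / len(values) if values else 0.0,
        }
    return stats


# Small example table with one missing value in the "age" column.
table = {
    "age": [31, 45, None, 31],
    "city": ["Paris", "Lyon", "Paris", "Paris"],
}
for name, col in column_stats(table).items():
    print(name, col)
```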
Overall, TraceML brings experiment tracking, artifact and visualization logging, and DataFrame summaries together in one library, with integrations for the major ML frameworks.