Introduction to PyTorch Forecasting
PyTorch Forecasting is an advanced package designed to forecast time series data using modern deep learning models. Built on PyTorch, an established deep learning framework, this package integrates seamlessly with PyTorch Lightning. This integration enables scalable training on both GPUs and CPUs, with features such as automatic logging to aid the development process.
Key Features
PyTorch Forecasting aims to simplify the use of neural networks for time series forecasting, providing tools suitable for both research and real-world applications. It balances ease of use for beginners with the flexibility professionals need. Its notable features include:
- Time Series Dataset Class: The TimeSeriesDataSet class handles variable transformations, missing values, randomized subsampling, and multiple history lengths, so most data preparation is taken care of for you.
- Base Model Class: Provides shared training logic for time series models, with TensorBoard logging and visualizations such as actual-versus-prediction plots and dependency plots.
- Built-In Models: Multiple neural network architectures that have been enhanced for real-world use and come with built-in interpretation capabilities.
- Performance Metrics: Multi-horizon time series metrics, such as quantile loss, for evaluating forecasts.
- Hyperparameter Tuning: Leverages Optuna, a powerful optimization framework, for hyperparameter tuning (see the sketch after this list).
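To make the last point concrete, the package ships an Optuna-based tuning helper for the Temporal Fusion Transformer. The sketch below assumes train_dataloader and val_dataloader have already been built from TimeSeriesDataSet objects (as in the usage example later in this article); the search ranges are illustrative, and argument names can differ slightly between package versions.

from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters

# Run an Optuna study over common Temporal Fusion Transformer hyperparameters
study = optimize_hyperparameters(
    train_dataloader,
    val_dataloader,
    model_path="optuna_tft",  # directory where trial checkpoints are stored
    n_trials=50,  # illustrative trial budget
    max_epochs=20,
    learning_rate_range=(0.001, 0.1),
    hidden_size_range=(8, 128),
)
print(study.best_trial.params)  # best hyperparameters found by the study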
PyTorch Forecasting relies on PyTorch Lightning to run training on different hardware configurations, whether a single GPU, multiple GPUs, or just a CPU. For example:
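The following is a minimal sketch of how this looks in practice, using standard PyTorch Lightning Trainer arguments; model and the dataloaders are placeholders for objects created as in the usage example below.

import lightning.pytorch as pl

# Use whatever accelerator is available (CPU, single GPU, or several GPUs)
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=30)

# Or request specific hardware explicitly, e.g. two GPUs:
# trainer = pl.Trainer(accelerator="gpu", devices=2, max_epochs=30)

trainer.fit(model, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)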
Installation
Installing PyTorch Forecasting is straightforward. For Windows users, PyTorch should first be installed using:
pip install torch -f https://download.pytorch.org/whl/torch_stable.html
Afterward, PyTorch Forecasting itself can be installed via pip:
pip install pytorch-forecasting
Alternatively, for those using Conda, the package can be installed with:
conda install pytorch-forecasting pytorch -c pytorch>=1.7 -c conda-forge
To use the MQF2 loss (multivariate quantile loss), install the optional extra:
pip install pytorch-forecasting[mqf2]
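Once the extra is installed, the loss can be passed to a model like any other metric. A minimal sketch, assuming the MQF2DistributionLoss class from pytorch_forecasting.metrics; check the metrics documentation for the exact arguments in your version.

from pytorch_forecasting.metrics import MQF2DistributionLoss

# Multivariate quantile loss; the prediction length must match the dataset's max_prediction_length
loss = MQF2DistributionLoss(prediction_length=6)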
Documentation and Tutorials
Comprehensive documentation, including tutorials, is available on the PyTorch Forecasting documentation site. These resources cover everything from getting started to more advanced usage scenarios.
Available Models
A variety of models are supported, each tailored for specific forecasting needs:
- Temporal Fusion Transformer: Designed for interpretable multi-horizon time series forecasting and has performed strongly in published benchmarks.
- N-BEATS: A model that, when used as an ensemble, outperformed all entries in the M4 competition, a major benchmark in univariate time series forecasting.
- N-HiTS: Supports covariates and has outperformed N-BEATS in published benchmarks; it is particularly well suited to long-horizon forecasting.
- DeepAR: A popular model for probabilistic forecasting using autoregressive recurrent networks.
- Standard Networks: LSTM, GRU, and MLP implementations for baseline comparisons, alongside a simple Baseline model that repeats the latest known value (see the sketch after this list).
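As referenced above, the Baseline model gives a quick sanity check by repeating the last observed value over the forecast horizon. A minimal sketch, assuming a validation dataloader built as in the usage example below; the exact return format of predict differs between package versions.

from pytorch_forecasting import Baseline
from pytorch_forecasting.metrics import MAE

# Predict by repeating the last known value and compare against the actual targets
baseline_predictions = Baseline().predict(val_dataloader, return_y=True)
print(MAE()(baseline_predictions.output, baseline_predictions.y))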
To extend the package with custom models, a dedicated tutorial for implementing new models is available, covering both basic and advanced architectures.
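The general pattern from that tutorial is to subclass BaseModel, build the network in __init__, and return a standardized output from forward. The skeleton below is a heavily abbreviated sketch of that pattern, not a complete model; the layer sizes are placeholders and it assumes a single continuous input feature.

from torch import nn
from pytorch_forecasting.models import BaseModel

class FullyConnectedModel(BaseModel):
    def __init__(self, input_size: int, output_size: int, hidden_size: int, **kwargs):
        # save constructor arguments so the model can be restored from a checkpoint
        self.save_hyperparameters()
        super().__init__(**kwargs)
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x: dict) -> dict:
        # x is a dictionary of tensors produced by TimeSeriesDataSet
        network_input = x["encoder_cont"].squeeze(-1)  # assumes one continuous feature
        prediction = self.network(network_input)
        # rescale to the target space and wrap in the standard output format
        prediction = self.transform_output(prediction, target_scale=x["target_scale"])
        return self.to_network_output(prediction=prediction)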
Usage Example
Training models with PyTorch Forecasting integrates directly with PyTorch Lightning. Data, usually held in a pandas DataFrame, is first converted into a TimeSeriesDataSet. The snippet below illustrates a basic workflow:
# Necessary imports for training
import lightning.pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer, QuantileLoss
# Load and prepare data
data = ...
# Define the TimeSeriesDataSet
training = TimeSeriesDataSet(
    data[lambda x: x.date <= "YYYY-MM-DD"],  # restrict to the training cutoff date
    time_idx= ...,  # integer time index column
    target= ...,  # prediction target column
    group_ids=[ ... ],  # column(s) that identify each time series
    max_encoder_length=36,  # how much history the encoder sees
    max_prediction_length=6,  # forecast horizon
    static_categoricals=[ ... ],  # categorical variables that do not change over time
    static_reals=[ ... ],  # continuous variables that do not change over time
    time_varying_known_categoricals=[ ... ],  # categoricals known in the future (e.g. holidays)
    time_varying_known_reals=[ ... ],  # continuous variables known in the future
    time_varying_unknown_categoricals=[ ... ],  # categoricals only observed in the past
    time_varying_unknown_reals=[ ... ],  # continuous variables only observed in the past
)
# Convert the dataset to a dataloader for training
train_dataloader = training.to_dataloader(train=True, batch_size=128, num_workers=2)
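From here, a validation set and a model can be defined and handed to a PyTorch Lightning trainer. The sketch below follows the pattern from the package documentation; the hyperparameter values are illustrative placeholders rather than recommendations.

# Build a validation set that reuses the training set's encoders and normalizers
validation = TimeSeriesDataSet.from_dataset(training, data, stop_randomization=True)
val_dataloader = validation.to_dataloader(train=False, batch_size=128, num_workers=2)

# Define the network from the dataset; most of the architecture is inferred automatically
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,  # illustrative value
    hidden_size=32,  # illustrative value
    attention_head_size=1,
    dropout=0.1,
    loss=QuantileLoss(),
)

# Train with a PyTorch Lightning trainer and TensorBoard logging
trainer = pl.Trainer(
    max_epochs=30,
    accelerator="auto",
    gradient_clip_val=0.1,
    logger=TensorBoardLogger("lightning_logs"),
)
trainer.fit(tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)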
With the dataset, dataloaders, model, and trainer in place, training proceeds as in any PyTorch Lightning project, with TensorBoard logging and checkpointing handled automatically.