Chronos: Learning the Language of Time Series
Introduction to Chronos
Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is scaled and quantized into a sequence of tokens, and a language model is trained on those tokens with the cross-entropy loss. Once trained, Chronos produces probabilistic forecasts by sampling multiple future trajectories given the historical context. The models were trained on publicly available time series datasets and on synthetic data generated with Gaussian processes.
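The scaling and quantization step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's tokenizer: it assumes mean scaling (dividing by the mean absolute value) and uniformly spaced bins, and the bin count (4094) and range (-15 to 15) are illustrative assumptions. The function names `make_bin_centers`, `tokenize`, and `detokenize` are hypothetical, and special tokens such as padding or end-of-sequence are left out.

```python
from bisect import bisect_right


def make_bin_centers(num_bins=4094, low=-15.0, high=15.0):
    """Uniformly spaced bin centers (count and range are assumptions)."""
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]


def tokenize(series, centers):
    """Scale a series by its mean absolute value, then map values to bin ids."""
    scale = sum(abs(x) for x in series) / len(series)
    if scale == 0:
        scale = 1.0  # avoid division by zero for an all-zero series
    # Bin edges sit halfway between neighbouring centers.
    edges = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    token_ids = [bisect_right(edges, x / scale) for x in series]
    return token_ids, scale


def detokenize(token_ids, scale, centers):
    """Map bin ids back to approximate real values."""
    return [centers[t] * scale for t in token_ids]


centers = make_bin_centers()
ids, scale = tokenize([10.0, 20.0, 30.0], centers)
reconstructed = detokenize(ids, scale, centers)
print(ids, scale, reconstructed)
```

The round trip is lossy: each value is recovered only up to half a bin width times the scale, which is the price of turning real values into a finite vocabulary.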
Architecture
Chronos models are based on the T5 architecture. The main difference is the vocabulary size: Chronos-T5 models use 4096 tokens instead of the 32128 used by the original T5 models, which results in fewer parameters. The available models are:
- chronos-t5-tiny: 8M parameters
- chronos-t5-mini: 20M parameters
- chronos-t5-small: 46M parameters
- chronos-t5-base: 200M parameters
- chronos-t5-large: 710M parameters
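To get a feel for how much the smaller vocabulary saves, the sketch below computes the size of the token embedding matrix (vocabulary size times hidden size) for both vocabularies. The hidden sizes of 512, 768, and 1024 are assumptions taken from the standard T5 small, base, and large configurations, and the calculation assumes a single embedding matrix shared between input and output. The helper names are hypothetical.

```python
T5_VOCAB = 32128
CHRONOS_VOCAB = 4096

# Assumed hidden sizes (d_model) of the standard T5 configurations.
HIDDEN_SIZES = {"small": 512, "base": 768, "large": 1024}


def embedding_params(vocab_size, d_model):
    """Number of parameters in a vocab_size x d_model embedding matrix."""
    return vocab_size * d_model


def embedding_savings(d_model):
    """Embedding parameters saved by shrinking the vocabulary."""
    return embedding_params(T5_VOCAB, d_model) - embedding_params(CHRONOS_VOCAB, d_model)


for name, d_model in HIDDEN_SIZES.items():
    saved = embedding_savings(d_model)
    print(f"{name}: about {saved / 1e6:.1f}M fewer embedding parameters")
```

Under these assumptions the saving is roughly 14M parameters for the small configuration, a sizeable share of a model with 46M parameters in total.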
Zero-Shot Results
Chronos is notable for its zero-shot performance, that is, its accuracy on datasets it did not see during training. The authors evaluated it on 27 such datasets, comparing it against local models and other pretrained models. The detailed results and comparisons are reported in the original research paper.
How to Use Chronos
To make predictions with Chronos models:
- Installation: You can install Chronos using:
    pip install git+https://github.com/amazon-science/chronos-forecasting.git
For production use, consider running Chronos through AutoGluon, which supports ensembling with other forecasting models and simplifies deployment.
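- Forecasting Example: The following snippet loads a pretrained model and draws 20 sample forecasts for the next 12 time steps:
    import pandas as pd
    import torch
    from chronos import ChronosPipeline

    # Load a pretrained model; use device_map="cpu" if no GPU is available
    pipeline = ChronosPipeline.from_pretrained(
        "amazon/chronos-t5-small",
        device_map="cuda",
        torch_dtype=torch.bfloat16,
    )

    df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")

    # The context is the historical series as a 1D tensor
    forecast = pipeline.predict(
        context=torch.tensor(df["#Passengers"]),
        prediction_length=12,
        num_samples=20,
    )
    # forecast has shape [num_series, num_samples, prediction_length]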
- Visualization: You can plot the median forecast and an 80% prediction interval using matplotlib:
    import matplotlib.pyplot as plt
    import numpy as np

    # The forecast covers the 12 time steps after the end of the history
    forecast_index = range(len(df), len(df) + 12)
    # Take the 10%, 50%, and 90% quantiles across the sampled trajectories
    low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)

    plt.figure(figsize=(8, 4))
    plt.plot(df["#Passengers"], color="royalblue", label="historical data")
    plt.plot(forecast_index, median, color="tomato", label="median forecast")
    plt.fill_between(forecast_index, low, high, color="tomato", alpha=0.3, label="80% prediction interval")
    plt.legend()
    plt.grid()
    plt.show()
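The interval computation above reduces a set of sampled trajectories to per-step quantiles. The following plain-Python sketch shows the same idea without any dependencies. The sample values are made up for illustration, and `empirical_quantile` and `summarize_paths` are hypothetical helper names. The quantile uses linear interpolation between order statistics, which is also NumPy's default method.

```python
def empirical_quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def summarize_paths(sample_paths, levels=(0.1, 0.5, 0.9)):
    """Per-step quantiles across sample paths (rows = samples, columns = steps)."""
    steps = list(zip(*sample_paths))  # regroup values by forecast step
    return [[empirical_quantile(step, q) for step in steps] for q in levels]


# Made-up sample paths: 11 samples, each covering 2 forecast steps.
sample_paths = [[100.0 + s, 110.0 + 2 * s] for s in range(11)]
low, median, high = summarize_paths(sample_paths)
print(low, median, high)
```

The band between `low` and `high` contains the central 80% of the sampled values at each step, which is why the shaded region in the plot is labelled an 80% prediction interval.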
Datasets
The datasets used to train and evaluate Chronos, both in-domain and zero-shot, are available on Hugging Face.
Security and License
Chronos is released under the Apache-2.0 license. The project's contribution guidelines explain how to report security issues.
Conclusion
Chronos shows that time series forecasting can be approached with techniques borrowed from language modeling. Its zero-shot performance on unseen datasets, together with its integration with tools such as AutoGluon, makes it a versatile option for predictive modeling and analytics.