Merlion: A Time Series Machine Learning Library
Introduction
Merlion is a specialized Python library designed for time series data intelligence. This comprehensive machine learning framework assists users with all stages of handling time series data, including data loading, transformation, model training, output post-processing, and performance evaluation. Merlion caters to different time series learning tasks like forecasting, anomaly detection, and change point detection for both univariate and multivariate datasets. It is especially valuable for engineers and researchers looking to quickly develop and benchmark models for unique time series requirements across various datasets.
Key features of Merlion include:
- Data Loading & Benchmarking: Offers standardized and easily extendable tools for loading and benchmarking diverse forecasting and anomaly detection datasets, including support for customized datasets.
- Variety of Models: Houses a diverse set of models for anomaly detection, forecasting, and change point detection unified under a consistent interface. Models range from classic statistical and tree ensemble methods to advanced deep learning techniques.
- User-Friendly Abstract Models: Provides DefaultDetector and DefaultForecaster models, which are both efficient and high-performing and serve as a solid starting point for beginners.
- Automated Machine Learning: Features AutoML capabilities for streamlined hyperparameter tuning and model selection.
- Unified API: Lets a wide range of forecasting models use exogenous regressors through a single interface.
- Practical Post-Processing: Implements industry-inspired post-processing strategies to improve the interpretability of anomaly scores and reduce false positives.
- Easy Ensembles: Allows users to create model ensembles for more robust performance.
- Comprehensive Evaluation Pipelines: Enables flexible evaluation of models, simulating live deployment and re-training scenarios to assess forecasting and anomaly detection accurately.
- Intuitive Visualization: Offers built-in capabilities to visualize model predictions with a clickable interface.
- Distributed Computation: Provides a distributed backend via PySpark for handling large-scale industrial time series applications.
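To make the post-processing idea in the list above more concrete, the following is a minimal sketch in plain Python. It does not use Merlion's API: the function names calibrate_scores and threshold_scores, the z-score calibration, and the default threshold of 3.0 are all assumptions chosen for illustration. The sketch rescales raw anomaly scores against the scores seen during training, then suppresses everything below an alarm threshold so that only strong deviations are reported.

```python
from statistics import mean, stdev


def calibrate_scores(train_scores, scores):
    """Rescale raw anomaly scores to z-scores using the training scores.

    Hypothetical helper for illustration; not part of Merlion.
    """
    mu = mean(train_scores)
    sigma = stdev(train_scores)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for constant training scores
    return [(s - mu) / sigma for s in scores]


def threshold_scores(z_scores, alarm_threshold=3.0):
    """Zero out calibrated scores whose magnitude is below the threshold."""
    return [z if abs(z) >= alarm_threshold else 0.0 for z in z_scores]


# Raw scores from some detector on training data (made-up numbers).
train_scores = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9]
# Raw scores on new data; the third point is far outside the training range.
test_scores = [1.0, 1.3, 4.0, 0.7]

z_scores = calibrate_scores(train_scores, test_scores)
alarms = threshold_scores(z_scores)

# Only the third point survives thresholding; the rest are reported as 0.0.
print(alarms)
```

Calibrated scores are easier to interpret than raw ones because the threshold is expressed in standard deviations rather than in model-specific units, and thresholding is what cuts down on false positives.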
Comparison with Other Libraries
Merlion stands out due to its comprehensive feature set, which includes capabilities for both univariate and multivariate forecasting, anomaly detection, pre-processing, post-processing, automated machine learning (AutoML), benchmarking, visualization, and more. It is one of the few libraries providing exogenous regressors, change point detection, a clickable visual UI, and a distributed backend.
Installation
Merlion consists of two components: the core merlion library and the ts_datasets package. The core library can be installed via PyPI, while ts_datasets can be installed from source for comprehensive dataset management. Merlion may require specific external dependencies depending on the models employed, such as OpenMP for some forecasting models and the Java Development Kit (JDK) for certain anomaly detection methods.
Documentation
Merlion offers extensive documentation and example Jupyter notebooks to guide users through its features. A detailed API guide is available online, along with a comprehensive technical report.
Getting Started
Users can get started with Merlion through its web-based dashboard, which enables quick experimentation with different models on custom datasets. The dashboard provides an intuitive UI for both anomaly detection and forecasting models, giving new users an easy entry point.
Evaluation and Benchmarking
Merlion offers an evaluation pipeline that simulates the live deployment and periodic re-training of a model on historical data. This allows users to benchmark models under realistic production conditions, providing valuable insights into their effectiveness.
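The idea behind such an evaluation can be sketched in a few lines of plain Python. This is not Merlion's evaluation API: the function names, the parameters (initial_train_size, horizon, retrain_every), and the trivial mean-based forecaster are all hypothetical stand-ins. The sketch walks forward through a historical series, forecasts the next few points using only past data, and re-trains the model on a fixed schedule, as a deployed system would.

```python
def train_mean_forecaster(history):
    """'Train' a trivial model that predicts the mean of the last 3 points.

    Hypothetical stand-in for a real forecasting model.
    """
    window = history[-3:]
    level = sum(window) / len(window)
    return lambda horizon: [level] * horizon


def rolling_evaluate(series, initial_train_size, horizon, retrain_every):
    """Return the mean absolute error of a simulated live deployment.

    The model forecasts `horizon` points at a time and is re-trained on all
    data seen so far once at least `retrain_every` points have passed since
    the last training.
    """
    errors = []
    model = None
    points_since_retrain = retrain_every  # forces training on the first step
    t = initial_train_size
    while t + horizon <= len(series):
        if points_since_retrain >= retrain_every:
            model = train_mean_forecaster(series[:t])  # past data only
            points_since_retrain = 0
        forecast = model(horizon)
        actual = series[t:t + horizon]
        errors.extend(abs(f - a) for f, a in zip(forecast, actual))
        t += horizon
        points_since_retrain += horizon
    if not errors:
        raise ValueError("series is too short for the requested setup")
    return sum(errors) / len(errors)


# Made-up series with a level shift halfway through.
series = [10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20, 20]

mae_never_retrained = rolling_evaluate(
    series, initial_train_size=6, horizon=2, retrain_every=100
)
mae_retrained_often = rolling_evaluate(
    series, initial_train_size=6, horizon=2, retrain_every=2
)

print(mae_never_retrained, mae_retrained_often)
```

On this toy series the model that is never re-trained keeps predicting the old level, while the frequently re-trained one adapts after the shift. A single train/test split would not expose that difference.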
Technical Report and Citation
Merlion's technical report details its architecture and performance benchmarks. Researchers or developers utilizing Merlion in their work are encouraged to cite it using the provided BibTeX entry.
In summary, Merlion is a comprehensive library for time series analysis that combines a broad range of forecasting, anomaly detection, and change point detection models with tools for efficient benchmarking.