pg_timeseries - Optimize Time-Series Data Handling with the pg_timeseries Extension for PostgreSQL

Introduction to pg_timeseries

pg_timeseries, also known as the Tembo Time-Series API, is an extension designed to streamline and enhance the user experience when working with time-series data in PostgreSQL. This tool focuses on the creation, maintenance, and management of time-series tables, providing a comprehensive set of features to handle such data effectively.

Installation

Running with Docker

pg_timeseries can be easily installed using Docker. You can start a Docker container with PostgreSQL pre-configured with the pg_timeseries extension. Here’s a quick command to get started:

docker run -d --name pg-timeseries -p 5432:5432 -e POSTGRES_PASSWORD=postgres quay.io/tembo/timeseries-pg:latest

After starting the Docker container, connect to the database using the following command:

psql postgres://postgres:postgres@localhost:5432/postgres

To enable the extension, run:

CREATE EXTENSION timeseries CASCADE;

This process automatically installs additional required extensions like columnar, pg_cron, and pg_partman.

Getting Started

Once you have a partitioned table set up, activate it as a time-series table with a simple SQL call:

SELECT enable_ts_table('sensor_readings');

This command restructures the table into partitions organized by time (weekly by default). Future partitions are automatically created with an hourly maintenance job, ensuring efficient data organization and availability.

Using Your Tables

Indexes

pg_timeseries initially uses partitioned PostgreSQL tables, compatible with all existing PostgreSQL functionalities. It's crucial to add an index on the time dimension for optimal performance. A traditional B-Tree index is often sufficient, but a BRIN index might perform better for large datasets.

Partition Sizing

Managing partition size is effortlessly handled by pg_timeseries, which offers views to help monitor the size of partitions. Aim for each partition to be about a quarter of your system's available memory to ensure efficient queries and operations.

Retention

To prevent data overflow, you can easily implement a retention policy. By using set_ts_retention_policy, you can specify a time interval (e.g., '90 days') beyond which old data is automatically purged. If needed, clear_ts_retention_policy will remove this policy, reverting to default infinite retention.

Compression

For older data that isn't frequently accessed, compression is a perfect solution. Use set_ts_compression_policy to compress older partitions at specified intervals (e.g., '1 month'). While clearing the policy is possible, it won't decompress existing partitions.

Analytics Helpers

Some built-in functions aid in performing time-series analytics efficiently:

first and last Functions: These functions are handy for retrieving the initial or last entries of a group based on a time dimension.
```
SELECT machine_id, last(cpu_util, recorded_at) FROM events GROUP BY machine_id;
```
date_bin_table Function: It aligns time-series data to a specific interval, ensuring complete data coverage even with missing entries.
```
SELECT * FROM date_bin_table(NULL::target_table, '1 hour', '[2024-02-01 00:00, 2024-02-02 15:00]');
```

Requirements

The pg_timeseries extension relies on other extensions such as Hydra Columnar, pg_cron, and pg_partman. Familiarity with these can enhance your capacity to leverage pg_timeseries fully.

Roadmap

Future plans for pg_timeseries include improved analytic functions, better storage solutions for aging data, enhanced maintenance capabilities, and more. User feedback will play a critical role in prioritizing these features to meet real-world needs effectively.

In summary, pg_timeseries offers robust, flexible solutions for managing time-series data in PostgreSQL, with a clear pathway towards even greater functionality and performance optimizations.