TimescaleDB®
TimescaleDB® is an open-source extension for PostgreSQL®. It is designed to make PostgreSQL scalable for time series data.
This extension adds several features and functions. You can find more information on the official documentation.
TimescaleDB® at Scalingo
TimescaleDB® has some particularities which you should be aware of before getting started.
-
There are two licenses of TimescaleDB®: Community edition and Open-Source edition. The latter is the only one available on Scalingo. Hence TimescaleDB®’s functions labeled “community” on their documentation are not available on the platform.
-
On Scalingo, TimescaleDB® requires a PostgreSQL® database version 13.6.0-1 or above.
-
Working with a time series database usually involves the need to downsample the dataset in runtime and remove the oldest data. We provide documentation about these topics below.
-
While most documentation about PostgreSQL® remains applicable for TimescaleDB®, there are a few elements that differ. Especially regarding dumping and restoring a TimescaleDB® database.
Enabling TimescaleDB®
To enable TimescaleDB®:
Downsampling
Understanding Downsampling
Downsampling consists in decreasing the size of a dataset by selecting a subset of the data at a reduced granularity compared to the original. The resulting dataset has a much lower size, but still keeps its accuracy. This process allows for easier analysis, storage, or computation.
To better understang the concept and benefits, we provide an example below. This example is based on TimescaleDB® official documentation.
The data is stored in a table called conditions
that stores raw data of
temperature
and humidity
values.
We want to decrease the database size. To do so, we consider that we don’t need to have data by the minute for data older than one week. The average by hour is enough past a week. We will use a downsampling logic which reduces the data older than one week, and only stores the average by hour.
The data look like this:
$ select * from conditions;
time | location | temperature | humidity
-------------------------------+----------+--------------------+--------------------
2022-01-21 09:50:05.049911+00 | office | 75 | 52
2022-01-21 09:50:05.058412+00 | office | 70 | 57
2022-01-21 09:50:05.059489+00 | office | 76 | 52
2022-01-20 09:50:05.06032+00 | office | 79 | 54
Creating the Downsampling Procedure
- From the command line, open an Interactive Remote Console to access your database
- From the database command line, run the following commands to create a
procedure which aims at downsampling the
conditions
table:CREATE OR REPLACE PROCEDURE downsample_conditions (config jsonb) LANGUAGE PLPGSQL AS $$ DECLARE lag interval; chunk REGCLASS; tmp_name name; BEGIN SELECT jsonb_object_field_text (config, 'lag')::interval INTO STRICT lag; IF lag IS NULL THEN RAISE EXCEPTION 'Config must have lag'; END IF; FOR chunk IN SELECT show.oid -- we specify the table which need to be downsampled FROM show_chunks('conditions', older_than => lag) SHOW (oid) INNER JOIN pg_class pgc ON pgc.oid = show.oid INNER JOIN pg_namespace pgns ON pgc.relnamespace = pgns.oid INNER JOIN timescaledb_information.chunks chunk ON chunk.chunk_name = pgc.relname AND chunk.chunk_schema = pgns.nspname LOOP RAISE NOTICE 'Processing chunk: %', chunk::text; -- build name for temp table SELECT '_tmp' || relname FROM pg_class WHERE oid = chunk INTO STRICT tmp_name; -- copy downsampled chunk data into temp table EXECUTE format($sql$ CREATE UNLOGGED TABLE %I AS -- you can configure here the time range to aggregate data and how you do it. Here we used `avg` function on 1 hour SELECT time_bucket('1h', time), location, avg(temperature) as avg_temperature, avg(humidity) as avg_humidity FROM %s GROUP BY 1, 2; $sql$, tmp_name, chunk); -- clear original chunk EXECUTE format('TRUNCATE %s;', chunk); -- copy downsampled data back into chunk EXECUTE format('INSERT INTO %s(time, location, temperature, humidity) SELECT * FROM %I;', chunk, tmp_name); -- drop temp table EXECUTE format('DROP TABLE %I;', tmp_name); COMMIT; END LOOP; END $$;
Executing the Downsampling Procedure
- At the root of your applications’s source code, create a file called
downsample_conditions_table.sql
containing the following SQL script:call downsample_conditions(config => '{"lag":"3 week", "period":"1 hour"}');
This script invokes the procedure
downsample_conditions
and applies it for data older than three weeks and on a period of one hour. - Schedule the execution of the SQL script by using the Scalingo
scheduler.
Your
cron.json
file at the root of your application’s source code should look like this:{ "jobs": [ { "command": "0 0 * * * dbclient-fetcher psql && psql $SCALINGO_POSTGRESQL_URL -f downsample_conditions_table.sql", "size": "S" } ] }
The scheduled task will be executed every day at midnight.
Configuring Data Retention
Understanding Data Retention
A data retention policy aims at deleting old data from the database to save space. This can lead to faster queries and less storage costs. The policy defines what to do with the data and when.
To better understand the concept and benefits, we provide an example below.
This example creates a retention policy which applies on all hypertables.
It is possible to customize how it works by changing the format
function
depending on your data, or add a filter using a WHERE
clause in the PERFORM
query.
Creating the Data Retention Policy
- From the command line, open an Interactive Remote Console to access your database
- From the database command line, run the following commands to create a
procedure which aims at removing data older than
drop_after
:CREATE OR REPLACE PROCEDURE generic_retention (config jsonb) LANGUAGE PLPGSQL AS $$ DECLARE drop_after interval; schema varchar; name varchar; BEGIN SELECT jsonb_object_field_text (config, 'drop_after')::interval INTO STRICT drop_after; IF drop_after IS NULL THEN RAISE EXCEPTION 'Config must have drop_after'; END IF; -- You can modify the following query to add a more precise retention policy. FOR schema, name IN SELECT hypertable_schema, hypertable_name FROM timescaledb_information.hypertables LOOP RAISE NOTICE '%', format('%I.%I', schema, name); PERFORM drop_chunks(format('%I.%I', schema, name), older_than => drop_after); COMMIT; END LOOP; END $$;
Executing the Data Retention Policy
- At the root of your applications’s source code, create a file called
data_retention_procedure.sql
containing the following SQL script:call generic_retention(config => '{"drop_after":"12 month"}');
This script invokes the procedure
generic_retention
to remove data older than 12 months. - Schedule the execution of the SQL script by using the Scalingo
scheduler.
Your
cron.json
file at the root of your application’s source code should look like this:{ "jobs": [ { "command": "0 0 * * * dbclient-fetcher psql && psql $SCALINGO_POSTGRESQL_URL -f data_retention_procedure.sql", "size": "S" } ] }
The scheduled task will be executed every day at midnight.
Dumping and Restoring
Dumping Your TimescaleDB®
The procedure to dump data for PostgreSQL® is applicable. Yet, you won’t be able to restore this dump on a TimescaleDB® instance hosted on Scalingo.
For more details, please see Restoring a TimescaleDB® Dump.
Also, please note that you can’t backup individual hypertable. pg_dump
creates dumps that do not contain the information needed to properly restore
hypertables.
Restoring a TimescaleDB® Dump
As explained in TimescaleDB® official documentation, a specific process must be followed to restore from a backup. This process includes the use of a temporary database and the execution of several commands using admin rights, such as:
SELECT timescaledb_pre_restore();
\! pg_restore -Fc -d exampledb exampledb.psql
SELECT timescaledb_post_restore();
On Scalingo, default users do not have admin rights on their database. Consequently, only Periodic backups are restorable and the restoration process must be done by the Scalingo support team.
Make sure you keep track of which versions of PostgreSQL® and TimescaleDB® you are running during the backup process. For more information, see “Troubleshooting version mismatches” in the official documentation.
For more details about dumping and restoring TimescaleDB®, please refer to the official documentation.