Insights is a part of functionality included in Starburst Enterprise platform (SEP). Insights shows cluster history, usage metrics, and query history and details. Data is persisted in the insights database. Postgres, Oracle, or MySql is a list of supported backend database management system which is used to store collected data.
Since insights data is generated pretty frequently, the size of the database is going to grow quickly and you will encounter two major issues. The first one is the space to store data and the second one is performance to retrieve data. To solve those issues, you need to set up a process to purge data on regular basis.
As per release 363-e, there are 5 tables for purging in the insights database. It includes some dependent tables where filter can be applied through another parent table.
- cluster_metrics - cluster history.
- completed_queries - details of completed queries.
- query_tables - number of records completed queries. It depends on completed_queries table.
- query_user_group - user groups to run queries. It depends on completed_queries table.
- usage_metrics - cluster usage metrics.
1. Develop script to purge data
The sample of the script is for Postgres database. DBeaver database administration tool is aimed for executing of the script as it is implemented script variables.
DAYS_RETENTION
parameters defines how many days back we want to keep in database, for example, 60.
TEST_MODE
parameter helps run it in test run mode with SELECT COUNT(*)
value or in purge mode with DELETE
value.
@set DAYS_RETENTION = 60
-- Options
-- 1. test run: SELECT COUNT(*)
-- 2. run: DELETE
@set TEST_MODE = DELETE
-- process query_tables table
${TEST_MODE} FROM
public.query_tables
WHERE
query_id IN (SELECT
query_id
FROM
public.completed_queries
WHERE
create_time < CURRENT_DATE - ${DAYS_RETENTION});
-- process query_user_group table
${TEST_MODE} FROM
public.query_user_group
WHERE
query_id IN (SELECT
query_id
FROM
public.completed_queries
WHERE
create_time < CURRENT_DATE - ${DAYS_RETENTION});
-- process completed_queries table
${TEST_MODE} FROM
public.completed_queries
WHERE
create_time < CURRENT_DATE - ${DAYS_RETENTION};
-- process cluster_metrics table
${TEST_MODE} FROM
public.cluster_metrics
WHERE
sample_interval_start < CURRENT_DATE - ${DAYS_RETENTION};
-- process usage_metrics table
${TEST_MODE} FROM
public.usage_metrics
WHERE
start_time < CURRENT_DATE - ${DAYS_RETENTION};
The script can be enhanced adding additional filters if needed, for example, if you need to delete queries belonging to a specific user, usr='user_name'
.
2. Figure out a connection to insights database
- Open
config.properties
file on your coordinator. - Find the settings.
insights.jdbc.url=jdbc:postgresql://postgresql.example.com:5432/sepquerylogger insights.jdbc.user=test_psql insights.jdbc.password=test12
3. Set up automatic process run on schedule
DBeaver script variables can be hardcoded and the script might be run as a crontab job in Linux, pgAgent agent, or pg_cron extension.
Comments
comments powered by Disqus