Curator
Curator is a tool that helps curate and manage Elasticsearch® indices. It is especially useful for managing index lifecycles, so that old logs are automatically removed from your database.
Deploying
-
Curator is written in Python. Consequently, the first step consists of adding the Python buildpack to the .buildpacks file present in your Logstash repository. The file must end up like this (order is important!):
https://github.com/Scalingo/buildpack-jvm-common
https://github.com/Scalingo/python-buildpack
https://github.com/Scalingo/logstash-buildpack
-
To instruct the Python buildpack to install Curator and its dependencies, create a file named Pipfile at the root of your project and add a packages section to it, with the following dependency:
[packages]
elasticsearch-curator = "7.0.1"
Don’t forget to generate the Pipfile.lock file and to commit your changes:
pipenv lock
git add Pipfile Pipfile.lock
git commit -m "Add Curator requirements"
-
To configure the connection to the Elasticsearch database, create a file named curator.yml with the following content:
---
client:
  hosts:
    - ${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}
  username: ${ELASTICSEARCH_AUTH_USERNAME}
  password: ${ELASTICSEARCH_AUTH_PASSWORD}
logging:
  loglevel: INFO
  logfile:
  logformat: default
Commit your changes:
git add curator.yml
git commit -m "Add Curator configuration"
-
To configure the lifecycle of the database indices, create a file named log-clean.yml with your instructions. The example below asks Curator to consider indices prefixed by LOGS_INDICES_PREFIX and to remove those older than LOGS_RETENTION_DAYS days:
actions:
  1:
    action: delete_indices
    description: Delete old log indices
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
      - filtertype: pattern
        kind: prefix
        value: ${LOGS_INDICES_PREFIX}
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: ${LOGS_RETENTION_DAYS}
Don’t forget to commit your changes:
git add log-clean.yml
git commit -m "Add Curator policies"
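To make the effect of the pattern and age filters more concrete, here is a minimal Python sketch of the selection logic they describe. It is not Curator's implementation: the function name indices_to_delete is hypothetical, and the sketch assumes that the date immediately follows the prefix in the index name and that ages are compared at day granularity.

```python
from datetime import datetime, timedelta


def indices_to_delete(names, prefix, retention_days, now, timestring="%Y.%m.%d"):
    """Return the index names a prefix + age policy would select (illustrative only)."""
    # Anything dated strictly before this point is considered too old.
    cutoff = now - timedelta(days=retention_days)
    selected = []
    for name in names:
        # Equivalent of the "pattern" filter with kind: prefix.
        if not name.startswith(prefix):
            continue
        # Equivalent of the "age" filter with source: name.
        # Assumption: the date comes right after the prefix.
        try:
            index_date = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # no parsable date in the name, so the index is left alone
        if index_date < cutoff:
            selected.append(name)
    return selected
```

For example, with a prefix of unicorns- and a retention of 10 days, calling the function on 2024-01-20 selects unicorns-2024.01.09 but keeps unicorns-2024.01.10.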
-
Curator is not a daemon: it is designed as a one-off process. To run it on Scalingo on a regular basis, we advise using our Scheduler.
At the root of your Logstash repository, create a file named cron.json to set up the recurring task. The following example starts Curator every day at 03:00 (3 AM) and 15:00 (3 PM):
{
  "jobs": [
    {
      "command": "0 3,15 * * * curator --config curator.yml log-clean.yml"
    }
  ]
}
Commit your changes:
git add cron.json
git commit -m "Add Curator cron job"
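Before committing, it can help to check that cron.json is valid JSON and that each job has the expected shape. The sketch below is a hypothetical helper (check_cron_jobs is not part of any Scalingo tool). It assumes, based on the example above, that each command starts with five schedule fields followed by the command to run.

```python
import json


def check_cron_jobs(text):
    """Return (schedule, command) pairs for every job found in a cron.json document."""
    jobs = json.loads(text)["jobs"]
    parsed = []
    for job in jobs:
        # Assumption: five cron schedule fields, then the command itself.
        fields = job["command"].split(maxsplit=5)
        if len(fields) < 6:
            raise ValueError(
                f"expected 5 schedule fields and a command: {job['command']!r}"
            )
        parsed.append((" ".join(fields[:5]), fields[5]))
    return parsed
```

Applied to the example file, this yields a single pair whose schedule is 0 3,15 * * * and whose command is curator --config curator.yml log-clean.yml.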
-
The configuration files mention several environment variables. Let’s identify them and their values:
-
LOGS_RETENTION_DAYS: retention period of logs, expressed in days.
-
LOGS_INDICES_PREFIX: index prefix that helps identify the logs affected by the policy. If you are following our guides, the value should be unicorns-
-
ELASTICSEARCH_HOST, ELASTICSEARCH_PORT, ELASTICSEARCH_AUTH_USERNAME and ELASTICSEARCH_AUTH_PASSWORD must be retrieved from the value of the existing ELASTICSEARCH_URL. To do so, remember that it is made of several components separated from one another by delimiters, like so:
http://<user>:<password>@<host>:<port>
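If you prefer not to split the URL by hand, the following Python sketch extracts the four values with the standard library. The function name split_elasticsearch_url is hypothetical, and the sketch assumes the URL follows the format shown above.

```python
from urllib.parse import unquote, urlsplit


def split_elasticsearch_url(url):
    """Map the components of an Elasticsearch URL to the variables used in curator.yml."""
    parts = urlsplit(url)
    return {
        "ELASTICSEARCH_HOST": parts.hostname or "",
        # The port is optional in a URL, so fall back to an empty string.
        "ELASTICSEARCH_PORT": "" if parts.port is None else str(parts.port),
        # Credentials may be percent-encoded inside a URL, so decode them.
        "ELASTICSEARCH_AUTH_USERNAME": unquote(parts.username or ""),
        "ELASTICSEARCH_AUTH_PASSWORD": unquote(parts.password or ""),
    }
```

The returned dictionary can then be used to fill in the environment variables listed above.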
The very last steps depend on the method chosen to deploy the Logstash instance.
-
Using the Command Line
-
Make sure you have followed the first steps
-
Create the required environment variables:
scalingo --app my-logstash env-set LOGS_RETENTION_DAYS=10
scalingo --app my-logstash env-set LOGS_INDICES_PREFIX="unicorns-"
scalingo --app my-logstash env-set ELASTICSEARCH_HOST=...
scalingo --app my-logstash env-set ELASTICSEARCH_PORT=...
scalingo --app my-logstash env-set ELASTICSEARCH_AUTH_USERNAME=...
scalingo --app my-logstash env-set ELASTICSEARCH_AUTH_PASSWORD=...
-
Push the updated code to trigger a new deployment:
git push scalingo master
Using the Terraform Provider
-
Make sure you have followed the first steps
-
Edit the scalingo_app resource in your Terraform file to add the environment variables, like so:
resource "scalingo_app" "my-logstash" {
  [...]
  environment = {
    [...]
    LOGS_RETENTION_DAYS = 10
    LOGS_INDICES_PREFIX = "unicorns-"
    ELASTICSEARCH_HOST = "..."
    ELASTICSEARCH_PORT = "..."
    ELASTICSEARCH_AUTH_USERNAME = "..."
    ELASTICSEARCH_AUTH_PASSWORD = "..."
  }
}
-
Run terraform plan and check that the result looks good
-
If so, run terraform apply
-
Once Terraform is done, trigger a new deployment:
- Head to your dashboard
- Click on your Logstash application
- Click on the Deploy tab
- Click on Manual deployment in the left menu
- Click the Trigger deployment button
Updating
Using the Command Line
-
In your Logstash repository, edit the requirements.txt file to specify the version you want to deploy:
elasticsearch-curator==8.0.15
-
Don’t forget to commit the change:
git add requirements.txt
git commit -m "Update Curator to 8.0.15"
-
Trigger a new deployment:
git push scalingo master
Using the Terraform Provider
-
In your Logstash repository, edit the requirements.txt file to specify the version you want to deploy:
elasticsearch-curator==8.0.15
-
Commit the change and push it to your remote repository:
git add requirements.txt
git commit -m "Update Curator to 8.0.15"
git push origin master
-
Trigger a new manual deployment if it’s not automated:
- Head to your dashboard
- Click on your Logstash application
- Click on the Deploy tab
- Click on Manual deployment in the left menu
- Click the Trigger deployment button
Customizing
Configuring
Curator can use a configuration file. This file is mostly used to configure the Elasticsearch® database connection (URL, credentials, …), as well as the logging settings. This file is written in YAML. In our previous examples, it’s named curator.yml.
Curator also requires an action file. This file describes the list of actions Curator will run along with their options. This file is written in YAML. In our previous examples, it’s named log-clean.yml.
Environment
Curator can use environment variable references in both the configuration file and the action file. This lets you set values that need to be configurable at runtime, which makes it very convenient to customize your Curator deployment, since you can create as many environment variables as you want.
To do this, use the following syntax in your YAML files:
${MY_ENV_VAR}
It’s also possible to set a default value to use when the environment variable is not defined (otherwise, Curator falls back to a value of None):
${MY_ENV_VAR:default_value}
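The resolution rules described above can be sketched in a few lines of Python. This only illustrates the documented behavior and is not Curator's actual code. The resolve function and the regular expression are assumptions made for the example.

```python
import re

# Matches ${NAME} or ${NAME:default} when it makes up the whole value.
_ENV_REF = re.compile(r"\$\{(?P<name>[^}:]+)(?::(?P<default>[^}]*))?\}")


def resolve(value, environ):
    """Resolve a YAML scalar that may be an environment variable reference."""
    match = _ENV_REF.fullmatch(value)
    if match is None:
        return value  # not a reference, keep the value untouched
    # Use the variable when it is defined, else the default, else None.
    return environ.get(match.group("name"), match.group("default"))
```

With an empty environment, ${LOGS_RETENTION_DAYS:30} resolves to the string 30, whereas ${LOGS_RETENTION_DAYS} resolves to None. Once the variable is defined, its value takes precedence over the default.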