Getting started with the ELK Stack on Scalingo
The Elastic Stack (formerly known as the ELK Stack) is a powerful collection of software that lets you collect data from any source, in any format, and gives you the tools to search, visualize, and analyze it in real time.
This tutorial will show you how to deploy the ELK stack on Scalingo in under 5 minutes.
What is the ELK Stack?
The ELK stack is based on three major components:
- Elasticsearch®
- Logstash
- Kibana
Elasticsearch® is a distributed full-text search engine able to store JSON documents and index them efficiently. It is responsible for storing all the incoming data.
Logstash is a data processing pipeline that accepts data from any source as input. It can format and modify the data on the fly before forwarding it to the chosen destination (usually an Elasticsearch® database).
Kibana is a powerful web-based data visualization tool providing everything you need to explore your data and build useful and efficient dashboards.
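As a quick illustration of how Elasticsearch® stores data, here is a minimal, hypothetical curl call that indexes a JSON document into an index named my-index (the host, index name and document path are only examples, and the exact path may differ depending on your Elasticsearch® version):
$ curl -X POST 'http://localhost:9200/my-index/_doc' \
    -H 'Content-Type: application/json' \
    -d '{"name": "Alanala", "message": "Hi!"}'
Elasticsearch® answers with the generated document ID, and the fields become searchable shortly after.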
Logstash
Let’s start by bootstrapping the Logstash container. This instance will take data from an authenticated input and send it to an Elasticsearch® database. This covers the E and L parts of ELK.
To get started, you can use our boilerplate:
$ git clone https://github.com/Scalingo/logstash-boilerplate
$ cd logstash-boilerplate
Next, create an application on Scalingo that will run our Logstash app:
$ scalingo create my-awesome-logstash
Add the Scalingo for Elasticsearch® addon to this application:
$ scalingo --app my-awesome-logstash addons-add elasticsearch elasticsearch-starter-1024
All the Scalingo for Elasticsearch® plans are described here.
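You can check that the addon has been provisioned and that the connection string is available in the application environment (the value below is only illustrative):
$ scalingo --app my-awesome-logstash env | grep ELASTICSEARCH
SCALINGO_ELASTICSEARCH_URL=http://user:password@host:port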
Of course, not everyone should be able to send data to your Logstash instance: it should be protected with HTTP basic auth. This is already handled in the boilerplate, but the USER and PASSWORD environment variables must be set first:
$ scalingo --app my-awesome-logstash env-set USER=my-awesome-logstash-user PASSWORD=iloveunicorns
Logstash is memory-hungry and requires at least an L-sized container. Configure the app to use one:
$ scalingo --app my-awesome-logstash scale web:1:L
Edit the logstash.conf file to change the index name of the Elasticsearch® output. The goal is to give the index a name that matches the data it will store:
output {
  elasticsearch {
    [...]
    # OLD
    index => "change-me-%{+YYYY.MM.dd}"
    # NEW
    index => "unicorns-%{+YYYY.MM.dd}"
  }
}
Commit your changes:
$ git add logstash.conf
$ git commit -m "Update the index name"
And you’re all set: run git push scalingo master and your Logstash instance will be up and running!
You can now try to send some data to your Logstash instance:
$ curl --request POST 'https://my-awesome-logstash-user:iloveunicorns@my-awesome-logstash.osc-fr1.scalingo.io?name=Alanala' --data 'Hi!'
ok
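Any query string parameter is parsed into a field of the resulting document (as the search result below shows), so you can attach extra metadata to your payloads. For instance, a variation of the call above with a hypothetical type parameter:
$ curl --request POST 'https://my-awesome-logstash-user:iloveunicorns@my-awesome-logstash.osc-fr1.scalingo.io?name=Alanala&type=greeting' --data 'Hello again!'
ok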
It’s time to verify all the indices that are stored in the Elasticsearch® database:
$ scalingo --app my-awesome-logstash run bash
> curl $SCALINGO_ELASTICSEARCH_URL/_cat/indices
yellow open unicorns-2018.01.26 _0XNpJKzQc2kjhTyxf4DnQ 5 1 1 0 6.6kb 6.6kb
Logstash has created the unicorns index, which can now be queried:
> curl $SCALINGO_ELASTICSEARCH_URL/unicorns-2018.01.26/_search | json_pp
{
  "_shards" : {
    // [...]
  },
  // [...]
  "hits" : {
    "total" : 1,
    "max_score" : 1,
    "hits" : [
      {
        "_type" : "logs",
        "_score" : 1,
        "_source" : {
          "name" : "Alanala",
          "message" : "Hi!",
          "url" : "?name=Alanala",
          "@timestamp" : "2018-01-26T11:57:03.155Z"
          // [...]
        },
        // [...]
      }
    ]
  }
}
The result of the above search contains a document with a name field set to Alanala and a message field set to Hi!.
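You can also query the documents by field directly, for instance with a simple URI search run from the same one-off container (the query below is just an example):
> curl "$SCALINGO_ELASTICSEARCH_URL/unicorns-*/_search?q=name:Alanala" | json_pp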
Custom Configuration for Logstash
The cloned boilerplate used to deploy your application contains a config directory. All the files in this folder are copied to the Logstash configuration directory at runtime, allowing you to customize exactly how you want Logstash to run.
For instance, if you want to change the logging behavior of Logstash, edit config/log4j2.properties and deploy the change:
$ git add config/log4j2.properties
$ git commit -m "Update how Logstash is logging"
$ git push scalingo master
Kibana
To deploy Kibana on Scalingo, use our one-click deployment button:
The ELASTICSEARCH_URL environment variable from the previously created Logstash application should be used in the deployment process:
$ scalingo --app my-awesome-logstash env | grep SCALINGO_ELASTICSEARCH_URL
Then, a username and a password should be defined to configure Kibana authentication. Authentication is handled using HTTP basic auth via the KIBANA_USER and KIBANA_PASSWORD environment variables.
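If you prefer the command line to the one-click form, these variables can also be set with the CLI once the Kibana application exists (my-awesome-kibana is a hypothetical application name, use your own):
$ scalingo --app my-awesome-kibana env-set KIBANA_USER=my-kibana-user KIBANA_PASSWORD=iloveunicorns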
Once deployed, index patterns need to be configured. This is required to tell Kibana which Elasticsearch® indices it needs to look at. In this example, the unicorns-* pattern will be used.
Click on create and you’re all set: the test input sent in the previous section should appear in the Discover tab of the Kibana dashboard.
With TLS connection on Elasticsearch®
If your Scalingo for Elasticsearch® addon has the Force TLS option enabled, you must set the ELASTICSEARCH_TLS_CA_URL environment variable on your Kibana application with the URL of our CA certificate. The CA certificate URL is available on the database dashboard.
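For instance, assuming your Kibana application is named my-awesome-kibana:
$ scalingo --app my-awesome-kibana env-set ELASTICSEARCH_TLS_CA_URL=<ca-certificate-url>
Replace <ca-certificate-url> with the URL displayed on your database dashboard.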
Send your application logs to your own ELK stack
One of the many uses of the ELK stack is parsing, storing, and exploring logs. If you’ve set up your ELK stack for this, we have a feature called log drains that automatically sends every log line generated by an application to an ELK stack. Here is how to add a log drain to your application.
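As a sketch, adding such a drain with the CLI looks something like the following, where my-app-to-monitor is a hypothetical application whose logs you want to collect and the URL points to the Logstash instance deployed earlier (check the log drains documentation for the exact command and options):
$ scalingo --app my-app-to-monitor log-drains-add --type elk --url https://my-awesome-logstash-user:iloveunicorns@my-awesome-logstash.osc-fr1.scalingo.io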
When using this configuration, the application name and container index are passed in the HTTP query string and the message is sent in the request body. To parse this and create meaningful indices, you can use the following Logstash configuration (if your logs are JSON formatted):
input {
  http {
    port => "${PORT}"
    user => "${USER}"
    password => "${PASSWORD}"
  }
}

filter {
  grok {
    match => [ "[headers][request_path]", "%{URIPARAM:url}" ]
    remove_field => ["headers"]
  }
  kv {
    source => "url"
    field_split => "&"
    trim_key => "?"
  }
  mutate {
    rename => {
      "appname" => "source"
      "hostname" => "container"
    }
    replace => {
      "host" => "%{source}-%{container}"
    }
  }
  json {
    source => "message"
    target => "msg"
  }
}

output {
  elasticsearch {
    hosts => "${ELASTICSEARCH_HOST}"
    user => "${ELASTICSEARCH_USER}"
    password => "${ELASTICSEARCH_PASSWORD}"
    index => "unicorns-%{+YYYY.MM.dd}"
  }
}
Curator
Installing Curator
Since logs are only relevant for a short period of time, it is common practice to remove logs that are too old to be relevant. This reduces the load on the database and limits disk usage.
This is where Curator comes in: this project is designed to let you manage the life cycle of your indices.
Curator can be installed on the existing Logstash application my-awesome-logstash. As Curator is written in Python, you have to modify your .buildpacks file to add the Python buildpack, so that it ends up like this:
https://github.com/Scalingo/buildpack-jvm-common
https://github.com/Scalingo/python-buildpack
https://github.com/Scalingo/logstash-buildpack
To instruct the Python buildpack to install Curator and its dependencies, create a file named requirements.txt at the root of your application:
elasticsearch-curator==7.0.1
Configuring Curator
The next step is to configure Curator. First, you need to configure how Curator connects to your database. Create a file named curator.yml with the following content:
---
client:
  hosts:
    - ${ELASTICSEARCH_HOST}
  username: ${ELASTICSEARCH_AUTH_USERNAME}
  password: ${ELASTICSEARCH_AUTH_PASSWORD}

logging:
  loglevel: INFO
  logfile:
  logformat: default
Curator cannot use the ELASTICSEARCH_URL environment variable. You need to define three other environment variables on your app, duplicating the content of ELASTICSEARCH_URL.
Hence, if your ELASTICSEARCH_URL variable is set to http://user:password@host:port, you need to define:
ELASTICSEARCH_HOST=host:port
ELASTICSEARCH_AUTH_USERNAME=user
ELASTICSEARCH_AUTH_PASSWORD=password
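For example, with the Scalingo CLI (replace the values with the ones from your own ELASTICSEARCH_URL):
$ scalingo --app my-awesome-logstash env-set ELASTICSEARCH_HOST=host:port ELASTICSEARCH_AUTH_USERNAME=user ELASTICSEARCH_AUTH_PASSWORD=password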
Now you have to configure the life cycle of your indices. This is based on your index names.
Create a file named log-clean.yml. This configuration parses the names of the indices stored in Elasticsearch® and removes the ones that are too old.
actions:
  1:
    action: delete_indices
    description: Delete old log indices
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
      - filtertype: pattern
        kind: prefix
        value: ${LOGS_INDICES_PREFIX}
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: ${LOGS_RETENTION_DAYS}
You now need to add two environment variables:
LOGS_INDICES_PREFIX=unicorns
LOGS_RETENTION_DAYS=10
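For instance, with the Scalingo CLI:
$ scalingo --app my-awesome-logstash env-set LOGS_INDICES_PREFIX=unicorns LOGS_RETENTION_DAYS=10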
The first environment variable, LOGS_INDICES_PREFIX, configures the index pattern affected by this policy. Setting this variable to unicorns prevents Curator from deleting the other indices stored in the same database.
The second environment variable, LOGS_RETENTION_DAYS, configures the retention time of your logs (in days). With this variable set to 10, Curator will delete any index that is 10 or more days old.
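Once the Curator files have been deployed, you can check this configuration from a one-off container before scheduling anything; Curator's --dry-run flag only logs what would be deleted without touching the indices:
$ scalingo --app my-awesome-logstash run bash
> curator --config curator.yml --dry-run log-clean.yml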
Scheduling the Curator Task
Curator is not a daemon; it is designed as a one-off process. To run it on Scalingo, you can leverage our Scheduler.
At the root of your Logstash directory, create a file named cron.json to set up your recurring task. The following example starts Curator every day at 03:00 (3 AM) and 15:00 (3 PM):
{
  "jobs": [
    {
      "command": "0 3,15 * * * curator --config curator.yml log-clean.yml"
    }
  ]
}
The last step is to trigger a new deployment of your Logstash instance by pushing your changes to your Scalingo remote.
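For instance:
$ git add .buildpacks requirements.txt curator.yml log-clean.yml cron.json
$ git commit -m "Add Curator and schedule the index clean-up"
$ git push scalingo master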
That’s all folks!
This tutorial is based on an article published on our blog.