25 Minute ELK Stack with Docker - Part 3

In previous articles in this series (Part 1 | Part 2) we set up a functioning ELK stack in 25 minutes, and added nginx for a reverse proxy with some basic authentication.

Now that we have that ELK stack, it's time to put a meaningful amount of data into it. Some keen observers have noted that I haven't really touched on getting data into the stack yet. This is deliberate: I don't want to bog down the (mostly generic) tutorial with specific data ingestion steps.

So, in lieu of said decent data ingestion system, I'm going to go to my log file directory and run the following while the cluster is up:

for i in *; do echo "$i"; nc localhost 5000 < "$i"; done

(Hopefully you have a similarly large amount of logs ready to ingest)

This is immensely unsophisticated, but if you just want to get some data into the stack for analysis... it more or less works. In fact you could do just this kind of thing with a cron job at the end of each day to ingest that day's log file and you'd have a vaguely functional logging dashboard, albeit a very static one with a 24 hour delay between updates. Still, better than log diving!
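The cron job idea could be sketched roughly like this. It is only an illustration under assumptions: the log directory, the access-YYYY-MM-DD.log naming scheme and the helper names are all hypothetical, and it relies on the same nc approach as the one-liner above.

```shell
#!/bin/sh
# Hypothetical daily ingest helper. LOG_DIR and the file naming scheme
# are assumptions; adjust them to match your own logs.
LOG_DIR="${LOG_DIR:-/var/log/myapp}"
LOGSTASH_HOST="${LOGSTASH_HOST:-localhost}"
LOGSTASH_PORT="${LOGSTASH_PORT:-5000}"

# Print the path of the log file for a given date (YYYY-MM-DD).
log_file_for() {
    printf '%s/access-%s.log\n' "$LOG_DIR" "$1"
}

# Send one day's log file to the Logstash TCP input.
# Returns non-zero if the file is missing or unreadable.
ingest_day() {
    file=$(log_file_for "$1")
    if [ ! -r "$file" ]; then
        echo "no readable log file at $file" >&2
        return 1
    fi
    nc "$LOGSTASH_HOST" "$LOGSTASH_PORT" < "$file"
}
```

Saved as, say, ingest.sh with a final line of ingest_day "$(date +%F)", it could be scheduled late each evening with a crontab entry such as 55 23 * * * /path/to/ingest.sh (the path is a placeholder).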

If, like me, you're ingesting a large volume of data you might start noticing some concerning things going on, though. Logstash is issuing warnings on the console that it's getting HTTP 429 responses from ElasticSearch and needs to retry. This is fine; the bulk upload is stress-testing the system and Logstash is quite good at waiting for things to settle down and retrying. Get enough data in, though, and Kibana starts showing shard error warnings even after the heavy upload has finished.

What's happening here is simple. ElasticSearch has an internal request queue, with a hard limit on how many requests can be queued. We're heavily loading that queue with requests, and hitting the limit. If we were running a website where we cared more about response time than serving all of our users eventually, this might be fine; most people get their pages quickly and a few people get an error (or fewer search results, or a stale cache entry) under peak load, rather than everyone getting their pages slowly.

But for Kibana, we're not worried about a big search taking a long time to serve. We want to see everything. We need to increase the queue limit.

Increasing the ElasticSearch queue limit

What we're going to do is to tell ElasticSearch to use an external configuration file, the same way we do with Logstash and nginx. Let's create that file now:

mkdir elasticsearch && vim elasticsearch/elasticsearch.yml

My configuration looks like this:

threadpool:
  search:
    type: fixed
    size: 4
    queue_size: 5000

For more information on the default queue sizes, see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html. What my configuration does is increase the search queue from 1000 to 5000. That means ElasticSearch will now let 5000 requests pile up in the queue before it starts returning HTTP 429 errors to consumers. That's going to be slower and use more memory, but we won't see the shard errors in Kibana any more. (I'm running on a 2 core box. If you have more or fewer cores than this, you may want to tweak the size of the thread pool up or down as appropriate.)
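If you're wondering how to pick a size for a different core count: the 2.x thread pool documentation gives the default search pool size as int((cores * 3) / 2) + 1, which is where 4 comes from on a 2 core box. Assuming that formula still applies to the version you're running (check the documentation linked above), a quick way to work it out is something like the following. search_pool_size is a made-up helper name, not anything ElasticSearch provides.

```shell
# Default search thread pool size, per the formula quoted in the
# ElasticSearch 2.x thread pool docs: int((cores * 3) / 2) + 1.
# The helper name is hypothetical; shell integer division does the int().
search_pool_size() {
    echo $(( ($1 * 3) / 2 + 1 ))
}

search_pool_size 2   # prints 4
search_pool_size 8   # prints 13
```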

Now let's edit the docker-compose.yml file to tell ElasticSearch to pick up that new configuration from outside the container. Change the elasticsearch section to this:

elasticsearch:  
  image: elasticsearch:2.0.0
  command: elasticsearch -Des.network.host=0.0.0.0
  volumes:
    - ./elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml

The volumes section should be pretty familiar by now; what we've done is told the ElasticSearch container to look outside the container to the host's filesystem when it accesses its configuration file. Now restart the stack. If, like me, you've got a lot of data in there it might take a while for ElasticSearch to come up - now is a good time to make a cup of tea.

You should see that those shard error messages are completely gone from Kibana, even if you look at huge date ranges.
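If you'd rather confirm this from the ElasticSearch side than by eyeballing Kibana, the cat thread pool API reports how many search requests have been queued and rejected. The sketch below rests on two assumptions: that port 9200 is reachable from the host (the compose file in this series may not expose it), and that your version accepts the search.queue and search.rejected column names, as I believe 2.x does. The function names are hypothetical.

```shell
# Assumed base URL; change it if ElasticSearch is exposed elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Sum the second column (search.rejected) across all node rows on stdin.
sum_rejected() {
    awk '{ total += $2 } END { print total + 0 }'
}

# Fetch queue and rejection counts per node and print the total rejections.
search_rejected() {
    curl -s "$ES_URL/_cat/thread_pool?h=search.queue,search.rejected" | sum_rejected
}
```

Running search_rejected before and after loading a large dashboard should show whether the count is still climbing; the counters are cumulative since each node started, so it's the change that matters rather than the absolute number.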

Persisting data

There's one more thing left to do with the stack. At the moment, if I replace the ElasticSearch container for any reason, I lose all the data stored in my indexes. This is fine while we're just ingesting already-existing log files, but if we want to use this stack in production we're going to want slightly better guarantees on robustness.

Fixing this is trivial. In the same way that we told our containers to read their configuration files from the host's filesystem, we can also tell a container to write its data to the host.

First, create a directory for the data - mkdir data. In a production environment you might want to put this somewhere more useful, but for now let's keep everything self-contained.

Next, tell the ElasticSearch container to use this data directory by modifying its configuration in docker-compose.yml:

elasticsearch:  
  image: elasticsearch:2.0.0
  command: elasticsearch -Des.network.host=0.0.0.0
  volumes:
    - ./elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    - ./data:/usr/share/elasticsearch/data

Start the cluster with docker-compose up and you should see that data directory become populated by ElasticSearch. You can now delete the container without worrying about losing your data; when you bring the cluster back up it will read its data files from the data directory.
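To convince yourself that the data really does survive, you could try something like the following. It's a sketch rather than a recipe: it assumes the service is named elasticsearch as in the compose file above, that the data directory is ./data, and that you run it from the directory containing docker-compose.yml. The function names are made up for the example.

```shell
# Assumed location of the host data directory created earlier.
DATA_DIR="${DATA_DIR:-./data}"

# Succeeds if the given directory exists and contains at least one entry.
dir_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Stop, delete and recreate only the ElasticSearch container.
recreate_elasticsearch() {
    docker-compose stop elasticsearch &&
    docker-compose rm -f elasticsearch &&
    docker-compose up -d elasticsearch
}
```

A typical check would be dir_populated "$DATA_DIR" && recreate_elasticsearch, followed by a look in Kibana to see that your indexes are still there.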

The great thing is that because Kibana stores its configuration in ElasticSearch, and Logstash/nginx both load their configuration externally, we don't need to create any more persistent data volumes. In fact, you can bring down the cluster, completely delete all traces of the Kibana container, and yet when you bring the cluster back up and access Kibana you'll be looking at the last visualisation you created.

Our stack is now ready for production. All that's left is the important bit: getting data into it.

Next time: Getting log data into the ELK stack in real time

Image by Matt Kimber CC-SA 3.0