In a previous post, we talked about how Traefik fits into our infrastructure and helps us serve requests with ease. This runbook is an extension of that post: an overview of the things we found important to highlight for anyone looking to run Traefik themselves, along with useful configuration snippets.
Here is our runbook for getting set up with Traefik in your environment:
PS: Some of the configuration is AWS-centric, but the general concepts and the Traefik-specific configuration apply more broadly.
We currently use Datadog for monitoring Traefik.
If you're running a Datadog agent on your instance, its DogStatsD endpoint typically listens on port 8125. This setup works with other StatsD-compatible endpoints too.
You can drop this section into the static config to enable metrics and the UI dashboard:
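A sketch of that static section, assuming a local Datadog agent and our own entrypoint names and ports (`httpsIngress`, `dashboard`, and the file path are placeholders to adapt):

```toml
# traefik_static_config.toml (sketch; entrypoint names and ports are our choices)
[entryPoints]
  [entryPoints.httpsIngress]
    address = ":443"
  [entryPoints.dashboard]
    address = ":8080"

# Enable the UI dashboard (exposed via a router bound to the api@internal service)
[api]
  dashboard = true

# Ship metrics to a local Datadog agent (or any StatsD-compatible endpoint)
[metrics]
  [metrics.datadog]
    address = "127.0.0.1:8125"

# Dynamic config lives in a separate file that Traefik watches for changes
[providers.file]
  filename = "/etc/traefik/traefik_dynamic_config.toml"
  watch = true
```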
The Traefik dashboard is a well-done single pane of glass that lets us monitor our services and traffic at a glance. You can route an internal DNS name to point to the dashboard ingress port to isolate the entrypoint serving the dashboard from the entrypoint serving regular traffic.
If you're using middlewares, you can also inspect what middlewares and routing rules apply directly via the UI.
It was fairly straightforward for us to run multiple Traefik instances with support from Consul. Since we run Consul in our environment, our service containers running on Docker advertise their IP addresses (and routing metadata) to Consul.
Traefik has built-in support for pulling updates from Consul at fixed intervals (https://doc.traefik.io/traefik/providers/consul-catalog/). This makes spinning up multiple Traefik instances for HA plug-and-play, since each instance can independently fetch updates from Consul without extra coordination overhead.
If you don't run a local Consul agent on the same node as Traefik, you can also point the provider's `endpoint.address` at your Consul server.
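A sketch of the Consul Catalog provider block (the address is a placeholder for your Consul server):

```toml
[providers.consulCatalog]
  exposedByDefault = false   # only route services that opt in via tags
  [providers.consulCatalog.endpoint]
    address = "consul.service.consul:8500"  # placeholder: your Consul server address
    scheme  = "http"
```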
Run Traefik instances in a dedicated auto-scaling group and wire it in to the ALB. As the autoscaling group scales up/down, ALB keeps the routing information up-to-date and will forward traffic to new instances as they come up.
The default interval between Consul queries for config updates (`refreshInterval`) is 15 seconds.
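The interval can be tuned via `refreshInterval` in the provider block:

```toml
[providers.consulCatalog]
  refreshInterval = "15s"  # how often Traefik polls the Consul catalog
```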
Make sure your service deploys are graceful by allowing enough time between each instance in your rolling deploy.
Waiting ~30 seconds between each instance update should give Traefik and Consul enough time to propagate membership updates. Adjust the delay to account for bootstrap operations finishing, too.
If you're using Nomad, adding service metadata tags makes it easy to declare routing config right next to your service definitions.
Nomad takes care of propagating these tags to Consul. Traefik leverages the Consul Catalog integration to periodically fetch updates and dynamically adjusts configuration. As our containers spin up (or down), the membership information is automatically kept up to date.
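For example, a Nomad service stanza might carry routing tags like this (the service name, router name, and hostname are placeholders):

```hcl
# Sketch of a Nomad service stanza with Traefik routing tags
service {
  name = "billing-api"
  port = "http"

  tags = [
    "traefik.enable=true",
    "traefik.http.routers.billing.rule=Host(`billing.internal.example.com`)",
    "traefik.http.routers.billing.entrypoints=httpsIngress",
  ]
}
```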
This was covered in the previous code example but is worth highlighting. You can easily enable gzip compression via a middleware for all the responses flowing back through it.
For declaring and using compression middleware in-line with consul catalog tags:
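A sketch of such tags (router and middleware names are our placeholders):

```hcl
tags = [
  "traefik.enable=true",
  "traefik.http.routers.billing.rule=Host(`billing.internal.example.com`)",
  # declare a compress middleware...
  "traefik.http.middlewares.billing-compress.compress=true",
  # ...and attach it to the router
  "traefik.http.routers.billing.middlewares=billing-compress",
]
```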
We recommend setting a global rate limiter to protect your infrastructure from DDoS attacks and accidental for-loops in client-side code.
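A sketch of such a middleware in the dynamic config (the numbers are illustrative, not recommendations):

```toml
[http.middlewares]
  [http.middlewares.global-ratelimit.rateLimit]
    average = 100    # sustained requests per period, per source
    period  = "1s"
    burst   = 50     # short spikes allowed above the average
```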
This declares a global rate-limit middleware. You can tweak its parameters, like the period and burst, dynamically (see the next section on live config updates).
You can set a convention to apply the `global-ratelimit` middleware to all routes, and opt in to more specific rate limiters whenever specific service backends need them.
Traefik config can be divided into two logical sections:
A static config section that doesn't change often (updating it requires a Traefik redeploy)
We define things like where to find Consul, top level config, UI dashboard access etc in the static section.
Let's call this file `traefik_static_config.toml`.
A dynamic config section that Traefik can update on the fly
Since we added a dynamic file provider block, we can drop updates into the `traefik_dynamic_config.toml` file and Traefik will apply the changes without restarting.
Example of the dynamic config:
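A minimal sketch of what this dynamic file might contain (the router name and hostname are our placeholders):

```toml
# traefik_dynamic_config.toml (sketch)
[http.middlewares]
  # gzip responses for any router that attaches this middleware
  [http.middlewares.gzip-compress.compress]

[http.routers]
  # route an internal DNS name to the built-in dashboard service
  [http.routers.dashboard]
    rule = "Host(`traefik.internal.example.com`)"
    entryPoints = ["dashboard"]
    service = "api@internal"
    middlewares = ["gzip-compress"]
```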
This file in conjunction with the dynamically updating service tags via Consul gives us pretty good coverage for applying most changes without having to do a full deploy. You can update these dynamic config sections live or add more middlewares and routers, and attach them to your services without restarting Traefik.
Using Nomad, we propagate updates to the dynamic file via Consul:
- Store the contents of `traefik_dynamic_config.toml` as a Consul key
- Use a Nomad `template` block to sync updates when the key changes in Consul
- The `[providers.file]` `watch = true` config in Traefik will pick up changes dynamically
You can use this template block as a starting point:
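A sketch of that template block (the Consul KV path and destination are placeholders for wherever you store the file):

```hcl
template {
  # Placeholder KV path: wherever you store the dynamic config in Consul
  data = <<EOF
{{ key "traefik/traefik_dynamic_config.toml" }}
EOF
  destination = "local/traefik_dynamic_config.toml"
  # No restart or signal needed: Traefik's file provider watches the file
  change_mode = "noop"
}
```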
It is sometimes helpful to know which Traefik instance handled a particular request while you're debugging. We found this simple hack to give unique names to our Traefik instances:
You can declare an entrypoint with a unique name that doesn't really route any traffic.
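For instance (the instance-specific entrypoint name and port are placeholders you'd generate at provision time, e.g. from the EC2 instance id):

```toml
[entryPoints]
  [entryPoints.httpsIngress]
    address = ":443"
  # Dummy entrypoint: never routed to, it only surfaces a per-instance name
  [entryPoints.traefik-i-0abc123]
    address = ":65530"
```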
This lets you configure your services to bind to a proper, well-known entrypoint as usual (`httpsIngress` in this example) while also getting a unique name displayed in the UI dashboard.
Similarly, you can also add a middleware that injects the unique ID of the Traefik instance into the response headers of every request.
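A sketch of such a middleware using custom response headers (the header name and instance id are our placeholders):

```toml
[http.middlewares]
  [http.middlewares.instance-id.headers]
    [http.middlewares.instance-id.headers.customResponseHeaders]
      X-Traefik-Instance = "i-0abc123"   # templated per instance at deploy time
```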
We hope this runbook helps you with setting up Traefik in your own environment. If you have other cool tips and tricks, please share them with us and we'd be happy to update this runbook.