The MediaMachine Runbook for Traefik

MediaMachine.io is an IaaS platform for user-generated video content, and we use Traefik as the reverse proxy in our network layer.

In a previous post, we talked about how Traefik fits into our infrastructure and helps us serve requests with ease. This runbook extends that post: it is an overview of the things we found important for anyone who wants to run Traefik themselves, including useful configuration snippets.

The MediaMachine Traefik Runbook#

Here is our runbook for getting set up with Traefik in your environment:


PS: Some of the configuration is AWS-centric, but the general concepts and the Traefik-specific configuration apply more broadly.

Monitoring + Metrics + Dashboard#

We currently use Datadog for monitoring Traefik. If you're running a Datadog agent on your instance, its StatsD listener (DogStatsD) typically listens on UDP port 8125.

tip

This setup can work with other StatsD compatible endpoints too.

You can drop this section into the static config to enable metrics and the UI dashboard:

[metrics]
[metrics.statsd]
address = "localhost:8125"
[api] # Enable API and dashboard; served on :8080 only with insecure = true, otherwise route to api@internal
dashboard = true

The Traefik dashboard is a well-done single pane of glass that lets us monitor our services and traffic at a glance. You can route an internal DNS name to point to the dashboard ingress port to isolate the entrypoint serving the dashboard from the entrypoint serving regular traffic.
Terraform AWS Target Group Config for Traefik Dashboard
resource "aws_lb_target_group" "traefik-dashboard" {
  name                 = "traefik-dashboard"
  port                 = 8080
  protocol             = "HTTP"
  vpc_id               = <your vpc id>
  deregistration_delay = 60
  # Health Checks yay!
  health_check {
    path = "/ping"
  }
  lifecycle {
    create_before_destroy = true
  }
}
# Attach a listener rule to ALB
resource "aws_lb_listener_rule" "traefik-dashboard" {
  listener_arn = <alb listener arn>
  priority     = 100
  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.traefik-dashboard.arn
  }
  condition {
    host_header {
      # We recommend using an internal dns zone to keep your
      # dashboard isolated from the internet
      values = ["traefik.internal.example.com"]
    }
  }
}
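
On the Traefik side, one way to keep the dashboard on its own entrypoint is to route a dedicated listener to Traefik's built-in api@internal service. The following is a minimal sketch, assuming Traefik v2. The entrypoint name dashboard and the router name are hypothetical; the first part belongs in the static config and the router in a dynamic config source:

```toml
# --- static config (sketch) ---
[entryPoints]
  [entryPoints.dashboard]     # hypothetical name, matches the target group port
    address = ":8080"

[api]
  dashboard = true

[ping]
  entryPoint = "dashboard"    # serve /ping on the same port for the ALB health check

# --- dynamic config (sketch) ---
[http.routers]
  [http.routers.dashboard]    # hypothetical router name
    rule = "Host(`traefik.internal.example.com`)"
    entryPoints = ["dashboard"]
    service = "api@internal"  # Traefik's built-in API/dashboard service
```

Because the router is bound only to the dashboard entrypoint, requests arriving on the entrypoint that serves regular traffic should never reach the dashboard.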

If you're using middlewares, you can also inspect what middlewares and routing rules apply directly via the UI.


High Availability & Failover#

It was fairly straightforward for us to run multiple Traefik instances with support from Consul. Since we run Consul in our environment, our service containers running on Docker advertise their IP addresses (and routing metadata) to Consul.

Traefik has built-in support for pulling updates from Consul at fixed intervals (https://doc.traefik.io/traefik/providers/consul-catalog/). This makes spinning up multiple Traefik instances for HA plug-and-play, since each instance fetches updates from Consul independently, with no extra coordination overhead.

[providers.consulCatalog]
refreshInterval = "30s"
[providers.consulCatalog.endpoint]
address = "127.0.0.1:8500"

If you don't run a local Consul agent on the same node as Traefik, you can also set the endpoint to your Consul server address (host and port), like consul.internal.example.com:8500.
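
Since the service tags shown later in this runbook opt in with traefik.enable=true, a provider block along these lines may be a useful starting point. This is a sketch rather than our exact config: exposedByDefault = false is an assumption about how you want discovery to behave, and the endpoint address is a placeholder.

```toml
[providers.consulCatalog]
  refreshInterval = "30s"
  # Assumption: only route services that carry the traefik.enable=true tag
  exposedByDefault = false
  [providers.consulCatalog.endpoint]
    # Placeholder address for a remote Consul server
    address = "consul.internal.example.com:8500"
    scheme = "http"
```

With exposedByDefault left at its default of true, every service registered in Consul becomes routable and the traefik.enable=true tag is redundant.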

tip

Run Traefik instances in a dedicated auto-scaling group and wire it in to the ALB. As the autoscaling group scales up/down, ALB keeps the routing information up-to-date and will forward traffic to new instances as they come up.

Make sure you're aware of the min "catalog" refresh time#

The default interval between queries to Consul for config updates is 15 seconds (the example above sets it to 30 seconds). Keep your service deploys graceful by allowing at least that much time between instances in a rolling deploy.

tip

Waiting for ~30 seconds between each instance update should give Traefik+Consul enough time to propagate membership updates. Adjust the delay so that bootstrap operations have time to finish too.

Advertising service routing with Docker+Nomad#

If you're using Nomad, adding service metadata tags makes it easy to declare routing config right next to your service definitions.

Example:

service {
  name = "mediamachine.io"
  tags = [
    "video transcode",
    "video thumbnails",
    "video summary",
    # Enable routing via Traefik
    "traefik.enable=true",
    # Declare middlewares this service's traffic should go through
    "traefik.http.routers.mediamachine.middlewares=mediamachine,mediamachine-headers",
    # Easy compression yay!
    "traefik.http.middlewares.mediamachine.compress=true",
    # Tell Traefik what hostnames to route to this service
    "traefik.http.routers.mediamachine.rule=Host(`mediamachine.io`)",
    # Which logical entryPoint is attached to this service
    # (Traefik v2 router syntax, consistent with the other tags)
    "traefik.http.routers.mediamachine.entrypoints=httpsIngress",
    # Tell Traefik how to ping your service with periodic health checks
    "traefik.http.services.mediamachine.loadBalancer.healthCheck.path=/watchadoin",
    "traefik.http.services.mediamachine.loadBalancer.healthCheck.interval=6s",
    "traefik.http.services.mediamachine.loadBalancer.healthCheck.timeout=500ms",
  ]
}

Nomad takes care of propagating these tags to Consul. Traefik leverages the Consul Catalog integration to periodically fetch updates and dynamically adjusts configuration. As our containers spin up (or down), the membership information is automatically kept up to date.
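
To make the tags easier to read, here is roughly what they would look like if the same routing were written by hand in a TOML dynamic config file. This is only an illustrative sketch, assuming Traefik v2: with the Consul Catalog provider the server list comes from Consul, so the server URL below is a hypothetical placeholder.

```toml
[http.routers]
  [http.routers.mediamachine]
    rule = "Host(`mediamachine.io`)"
    entryPoints = ["httpsIngress"]
    middlewares = ["mediamachine", "mediamachine-headers"]
    service = "mediamachine"

[http.middlewares]
  # Equivalent of traefik.http.middlewares.mediamachine.compress=true
  [http.middlewares.mediamachine.compress]
  # mediamachine-headers is referenced by the tags but not declared in this
  # runbook; it would need its own definition alongside this one.

[http.services]
  [http.services.mediamachine.loadBalancer]
    [http.services.mediamachine.loadBalancer.healthCheck]
      path = "/watchadoin"
      interval = "6s"
      timeout = "500ms"
    [[http.services.mediamachine.loadBalancer.servers]]
      url = "http://10.0.0.10:8080"  # hypothetical; Consul supplies this in practice
```

Each dotted tag key maps onto a TOML table path, which is why the tag form stays compact enough to live next to the service definition.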


Improve load times by enabling compression#

This was covered in the previous code example but is worth highlighting. You can easily enable gzip compression via a middleware for all the responses flowing back through it.

Enable compression (declare middleware via TOML)
# Declare gzip compression middleware
[http.middlewares]
[http.middlewares.site-compress.compress]
# Use the middleware with a router
[http.routers]
[http.routers.my-router]
rule = "Path(`/foo`)"
middlewares = ["site-compress"]
service = "service-foo"

For declaring and using compression middleware in-line with consul catalog tags:

Enable compression (declare middleware via consul catalog metadata tags)
# Declare middlewares this service's traffic should go through
"traefik.http.routers.mediamachine.middlewares=mediamachine",
# Easy compression yay!
"traefik.http.middlewares.mediamachine.compress=true",

Protect your servers with ratelimits#

We recommend setting a global ratelimiter to protect your infrastructure from DDoS attacks and accidental for-loops in client-side code.

# Here, an average of 100 requests per second is allowed.
# In addition, a burst of 50 requests is allowed.
[http.middlewares]
[http.middlewares.global-ratelimit.rateLimit]
average = 100
burst = 50

This declares a global ratelimit middleware. You can tweak its parameters (average, period, burst) dynamically (see the next section for live config updates).

You can set a convention to apply the global-ratelimit middleware to all routes and opt-in to more specific ratelimiters whenever needed by specific service backends.
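
As a sketch of that convention, assuming Traefik v2: the block below spells out the period explicitly, tells the limiter where to find the client IP when Traefik sits behind a load balancer, and adds a stricter limiter that a specific backend could opt in to. The upload-ratelimit name and its numbers are hypothetical, and depth = 1 assumes a single trusted proxy (such as an ALB) appending to X-Forwarded-For.

```toml
[http.middlewares]
  # Global default: 100 requests per second on average, bursts of up to 50
  [http.middlewares.global-ratelimit.rateLimit]
    average = 100
    period = "1s"
    burst = 50
    # Requests are grouped by source. Behind a load balancer, take the client
    # IP from X-Forwarded-For (first entry from the right) instead of the
    # connection's remote address.
    [http.middlewares.global-ratelimit.rateLimit.sourceCriterion.ipStrategy]
      depth = 1

  # Hypothetical stricter limiter: 10 requests per minute, bursts of up to 5
  [http.middlewares.upload-ratelimit.rateLimit]
    average = 10
    period = "1m"
    burst = 5
```

A route that needs the tighter limit would list upload-ratelimit in its middlewares alongside (or instead of) global-ratelimit.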


Split Dynamic config strategy (with Consul)#

Traefik config can be divided into two logical sections: a static section that doesn't change often (changes need a Traefik redeploy) and a dynamic section that Traefik reloads on the fly.

We define things like where to find Consul, top level config, UI dashboard access etc in the static section. Let's call this file traefik_static_config.toml:

[providers]
[providers.file]
# The contents of this file perform the dynamic configuration
# Since watch is true, updates to dynamic section apply on the fly
filename = "traefik_dynamic_config.toml"
watch = true
[ping] # Enable ping for healthchecks
[api] # Enable API and dashboard; served on :8080 only with insecure = true, otherwise route to api@internal
dashboard = true
[entryPoints.httpsIngress] # Entrypoints are like listeners (or frontends in HAProxy)
address = ":443"
[log]
level = "DEBUG"
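
The dynamic half lives in traefik_dynamic_config.toml, which Traefik re-reads whenever the file changes. A minimal sketch of what it might contain is below, reusing the middleware and router names from the earlier snippets; the backend URL is a hypothetical placeholder.

```toml
# traefik_dynamic_config.toml (sketch)
[http.middlewares]
  [http.middlewares.site-compress.compress]
  [http.middlewares.global-ratelimit.rateLimit]
    average = 100
    burst = 50

[http.routers]
  [http.routers.my-router]
    rule = "Path(`/foo`)"
    # Middlewares run in the order listed
    middlewares = ["global-ratelimit", "site-compress"]
    service = "service-foo"

[http.services]
  [http.services.service-foo.loadBalancer]
    [[http.services.service-foo.loadBalancer.servers]]
      url = "http://10.0.0.10:8080"  # hypothetical backend address
```

Editing a value such as average in this file should take effect without restarting Traefik, whereas anything in the static file needs a redeploy.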

Bonus: Identifying unique Traefik instances#

It is sometimes helpful to know which Traefik instance handled a particular request while you're debugging. We found this simple hack to give unique names to our Traefik instances:

[entryPoints]
[entryPoints.{{UNIQUE_ID}}]
address = ":60000" # A static port you're not really using
[entryPoints.httpsIngress]
address = ":443" # Bind your services here

You can declare an entrypoint with a unique name that doesn't really route any traffic.


This lets you configure your services to bind with a proper, well-known entrypoint as usual (httpsIngress in this example) while also getting a unique name displayed in the UI dashboard.

[http.middlewares]
[http.middlewares.testHeader.headers]
[http.middlewares.testHeader.headers.customResponseHeaders]
X-This-Traefik-Guy-Is-Called = "{{UNIQUE_ID}}"

Similarly, you can add a middleware that injects the unique ID of the Traefik instance into the response headers of every request.


[FIN]#

We hope this runbook helps you with setting up Traefik in your own environment. If you have other cool tips and tricks, please share them with us and we'd be happy to update this runbook.


