Serving an ECS service via both ALB and NLB


Intro

One of our clients has a firewall that only allows whitelisting IPv4 addresses (no hostnames). For that reason, our API has to be accessible via a list of static IP addresses. The service itself is a classic ECS service: each container is a member of a target group, and that target group is attached to an ALB. When you use an Application Load Balancer, you are given a hostname. The hostname is static and doesn't change over time, but we have no control over which IPv4 addresses it resolves to. The challenge here is: "How do we expose the service via a list of static IP addresses?" AWS tackles this problem in this blog post, but there is some fine print: the ALB has to be an internal one, which is the opposite of what we are running.

Architecture

Outline:

  • Create an nginx proxy based on ECS and an nginx Docker container.
  • The API can still be accessed via the ALB with all its advantages.
  • An NLB exposes the newly created (public) Elastic IPs.
  • The NLB passes traffic to a target group, which points to the ECS service running the nginx proxy.
  • The nginx server uses the proxy_pass directive to pass traffic to the public ALB.

architecture

Code

All the code for this post is here.

Docker image

This is the core that we can test on localhost.

To do this, go to this folder on your machine and run docker-compose up. The expected output is:

nginx-proxy-forwarder_1  | Requests will be forwarded to: google.com
nginx-proxy-forwarder_1  | Config:
nginx-proxy-forwarder_1  | *******************************
nginx-proxy-forwarder_1  | server {
nginx-proxy-forwarder_1  |     error_log /dev/stdout debug;
nginx-proxy-forwarder_1  |     listen 8080;
nginx-proxy-forwarder_1  |     location / {
nginx-proxy-forwarder_1  |         proxy_pass https://google.com;
nginx-proxy-forwarder_1  |     }
nginx-proxy-forwarder_1  | }
nginx-proxy-forwarder_1  | *******************************
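
The docker-compose.yml itself lives in the repo; a minimal equivalent, pieced together from the output above and the curl call below, would look roughly like this (the service name, the 9003:8080 port mapping and the google.com endpoint are the only assumptions):

version: "3"
services:
  nginx-proxy-forwarder:
    build: .
    environment:
      - API_ENDPOINT=google.com
    ports:
      - "9003:8080"   # host port 9003 -> container port 8080 (nginx listen port)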

The container points to google.com; try it:

$ curl -I localhost:9003
HTTP/1.1 301 Moved Permanently
Server: nginx/1.14.2
Date: Sun, 19 May 2019 20:46:00 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 220
Connection: keep-alive
Location: https://www.google.com/
Expires: Tue, 18 Jun 2019 20:46:00 GMT
Cache-Control: public, max-age=2592000
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"

Expected logs:

 172.28.0.1 - - [19/May/2019:20:46:00 +0000] "HEAD / HTTP/1.1" 301 0 "-" "curl/7.54.0" "-"
 2019/05/19 20:46:00 [info] 12#12: *7 client 172.28.0.1 closed keepalive connection

It is fairly simple: a config plus a Dockerfile. The first snippet is the contents of the Dockerfile and the second is the run.sh script.

Note that the nginx config is generated by run.sh, meaning it is regenerated each time the container boots up. The goal is to be able to configure nginx based on the API_ENDPOINT environment variable.

FROM nginx:stable-alpine
RUN mkdir /nginx/
COPY run.sh /nginx/
ENTRYPOINT ["/nginx/run.sh"]

#!/bin/sh
set -e

cat <<EOF > /etc/nginx/conf.d/passthrough.conf
server {
    error_log /dev/stdout debug;
    listen 8080;
    location / {
        proxy_pass https://$API_ENDPOINT;
    }
}
EOF

echo "Requests will be forwarder to: $API_ENDPOINT"

echo "Config:"
echo "*******************************"
cat /etc/nginx/conf.d/passthrough.conf
echo "*******************************"
nginx -g "daemon off;"
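
If you want to skip docker-compose, the same image can be built and run by hand; a quick sketch (the image tag is my own choice):

# build the image locally
docker build -t nginx-proxy-forwarder .

# run it: forward host port 9003 to the container's 8080
docker run --rm -p 9003:8080 -e API_ENDPOINT=google.com nginx-proxy-forwarder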

Terraform module

If you don't know Terraform, you can find a gentle introduction here. It is a tool for defining architecture as code: you describe the desired state of your architecture and Terraform tries to achieve it. This code is for educational purposes, so clone it and refine it for your needs. And maybe give me a star on GitHub ;)

Proxy ECS Service

ecs-service

To pass the traffic to the target service, provision the container above as a simple ECS service. The entire ECS part is on GitHub; I want to focus on two key points here.

  • The target domain you want to proxy to is defined by the variable target_service_domain_name.
    data "template_file" "ecs_task_container_definitions" {
    template = "${file("${path.module}/container-definition.json")}"
    
    vars {
      container_name = "${local.service_name}"
    
      image          = "${var.docker_image}"
      version        = "${var.docker_image_version}"
      cpu            = "${var.ecs_cpu}"
      memory         = "${var.ecs_memory}"
      container_port = "${local.container_port}"
      api_endpoint   = "${var.target_service_domain_name}:443"
    }
    }
    

    and the container-definition.json it renders:

    [
      {
        "name": "${container_name}",
        "image": "${image}:${version}",
        "cpu": ${cpu},
        "memoryReservation": ${memory},
        "essential": true,
        "portMappings": [{
          "containerPort": ${container_port},
          "protocol": "tcp"
        }],
        "environment" : [
          { "name" : "API_ENDPOINT", "value" : "${api_endpoint}" }
        ]
      }
    ]
    
  • The ECS task must run in the awsvpc network mode. This is required to attach the tasks to the NLB target group later on; a sketch of how the service gets attached follows this list.
    resource "aws_ecs_task_definition" "task" {
    family                = "${local.service_name}"
    container_definitions = "${data.template_file.ecs_task_container_definitions.rendered}"
    task_role_arn         = "${aws_iam_role.ecs_task.arn}"
    # each task gets its own net interface
    network_mode          = "awsvpc"
    }
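
For completeness, here is a trimmed sketch of how such a task gets wired to the NLB target group as an ECS service. The cluster and subnet variables and the security group name are illustrative, and the module wiring is simplified; the real definition lives in the repo.

# Sketch only: var.ecs_cluster_id, var.service_subnet_ids and
# aws_security_group.proxy are illustrative names.
resource "aws_ecs_service" "proxy" {
  name            = "${local.service_name}"
  cluster         = "${var.ecs_cluster_id}"
  task_definition = "${aws_ecs_task_definition.task.arn}"
  desired_count   = 2

  # awsvpc tasks get their own ENI, so the service needs subnets and security groups
  network_configuration {
    subnets         = ["${var.service_subnet_ids}"]
    security_groups = ["${aws_security_group.proxy.id}"]
  }

  # register each task's private IP in the NLB target group
  load_balancer {
    target_group_arn = "${aws_lb_target_group.tg_for_nlb.arn}"
    container_name   = "${local.service_name}"
    container_port   = "${local.container_port}"
  }
}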
    

Networking

nlb-listener

Click on this to get the full code for the module.

This part is a bit trickier; we need:

  • A Network Load Balancer
    resource "aws_lb" "nlb" {
    name         = "${var.environment}-${local.service_name}"
    internal     = false
    idle_timeout = "60"
    
    load_balancer_type         = "network"
    enable_deletion_protection = false
    
    subnet_mapping = [
      {
        subnet_id = "${var.lb_subnet_ids[0]}",
        allocation_id = "${aws_eip.lb1.id}"
      },
      {
        subnet_id = "${var.lb_subnet_ids[1]}",
        allocation_id = "${aws_eip.lb2.id}"
      },
    ]
    
    enable_cross_zone_load_balancing = true
    ip_address_type                  = "ipv4"
    }
    
  • 2 Elastic IPs as endpoints for the above LB
    # don't use count due to the bug: https://github.com/hashicorp/terraform/issues/4944
    resource "aws_eip" "lb1" { vpc = true }
    resource "aws_eip" "lb2" { vpc = true }
    
  • Target group of type IP
    resource "aws_lb_target_group" "tg_for_nlb" {
    name        = "${var.environment}-${local.service_name}-tg-for-nlb"
    protocol    = "TCP"
    port        = "${local.container_port}"
    vpc_id      = "${var.vpc_id}"
    target_type = "ip"
    
    health_check {
      port = "${local.container_port}"
      interval = 10
    }
    
    stickiness {
      type    = "lb_cookie"
      enabled = false
    }
    }
    
  • TLS load balancer listener
    resource "aws_lb_listener" "nlb_listener" {
    
    load_balancer_arn = "${module.nlb.nlb_arn}"
    port              = "443"
    protocol          = "TLS"
    ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
    certificate_arn   = "${data.aws_acm_certificate.cert.arn}"
    
    default_action {
      target_group_arn = "${aws_lb_target_group.tg_for_nlb.arn}"
      type             = "forward"
    }
    }
    
  • Route 53 record and ACM certificate
resource "aws_route53_record" "domain_to_eips" {
  zone_id = "${var.hosted_zone_id}"
  name    = "${var.domain_name}"
  type    = "A"
  ttl     = "300"
  records = [
    "${aws_eip.lb1.public_ip}",
    "${aws_eip.lb2.public_ip}",
  ]
}

data "aws_acm_certificate" "cert" {
  domain   = "*.${var.domain_name}"
  statuses = ["ISSUED"]
}
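
Finally, the whole point of the exercise is the pair of static addresses, so it is handy to export them, for example as Terraform outputs that can be handed over to the client for whitelisting. A minimal sketch (the output name is my own):

# expose the static IPs the client can whitelist
output "whitelisted_ips" {
  value = ["${aws_eip.lb1.public_ip}", "${aws_eip.lb2.public_ip}"]
}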

Conclusion

It is indeed a lot.

Provisioning it by hand would take around an hour to get right. Terraform can not only do it faster, but also give you a sense of control over all these resources.