Using AWS Fargate and GitLab CI to deliver containerized automation and DevOps

In the second of two parts, we look more closely at a complex set-up using AWS Fargate and GitLab CI to deliver containerized automation and DevOps for enhanced developer experience, reduced costs and more robust infrastructure. Part 1 of this blog is here.

Part two

If you've stumbled into this from somewhere else, I'd encourage you to go read part 1 first.

Otherwise, welcome back! Hopefully you're here because you've read up on building containers, had a go with the examples in the previous post, and built and pushed the test image provided. You've got a basic proof of concept working at AWS, orchestrated by your GitLab instance, and you're happy it all works. Now you want to use this in anger, so what next?

Creating your own container image

The first thing you're going to need to do is pack your own container for your automation. Although you've been kicking the tyres with the provided Debian example container, you've probably already figured out this really isn't useful for anything other than proof of concept "Hello World" stuff. If you think about it, your CI container is going to be sensitive. It's necessarily going to contain private SSH keys needed to connect to target infrastructure for a kickoff, and likely some other secrets too, maybe API credentials, lists of sensitive IP addresses, etc.

We're going to use Docker to build container images, and I'm not going to re-document how you do that, but in terms of pointers I actually started with the GitLab provided Debian container example and used it as a base. Why? Mainly how it handles the entry point - the code executed by the container when it is launched - and how this entry point code allows GitLab Runner to connect to a new container to give it instructions. That and it's a nice, clean, basic Debian install (it so happens all our infra is Debian too!) so I ran with a fork of that as the starting point for my own CI container image. If you don't fancy using the Debian image as a starter then have a hunt around for existing lightweight and open-source container images you could piggyback on, or roll your own entirely if you're feeling brave.

Moving on, because our entire dev stack is built on the Docker platform we have ready code for building and deploying Docker images to our Docker Hub account (a private repo, of course - whatever you do, do not accidentally publish a Docker image full of secrets). I've shared a working example of how you might build a CI container in our dev tools repository, ce-dev. This will be periodically updated as we figure things out and fix things up.

But the short version is you'll need some means, automated or otherwise, of building and maintaining a private Docker container image using `docker image build`, plus a private repository you can push it to with `docker image push` and that AWS Fargate will be able to pull your image from later.
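As a rough illustration of that build-and-push step, here is a minimal Python sketch that assembles the two Docker commands. The function name and the image name are hypothetical, and the sketch only prints the commands rather than running them, so treat it as a starting point rather than our actual tooling.

```python
import shlex


def build_and_push_commands(image: str, context_dir: str = ".") -> list[list[str]]:
    """Return the docker CLI invocations to build and then push one image."""
    return [
        ["docker", "image", "build", "--tag", image, context_dir],
        ["docker", "image", "push", image],
    ]


# Hypothetical image name; substitute your own private repository.
commands = build_and_push_commands("my-organisation/my-image:latest")
for command in commands:
    # Printed rather than executed, so the sketch is safe to run anywhere.
    # To run for real you could use: subprocess.run(command, check=True)
    print(shlex.join(command))
```

You would need to have run `docker login` against your private repository before the push would succeed.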

The AWS bits

It's worth noting there's no particular reason you couldn't use Kubernetes for orchestrating your containers - Fargate can support either EKS (Kubernetes service) or ECS (Docker service) - but all the documentation provided uses ECS so I've stuck with that.

Docker login

If you use an external repository for your private CI container image then you'll need to provide some credentials using the AWS Secrets Manager so Fargate can log in and pull the image. In our case, we're using Docker Hub so we had to create a basic secret to allow private registry authentication. AWS have solid documentation about secret management and how that ties in with ECS here.

Don't forget that because our container is private you'll need to tick "Private repository authentication" and enter the ARN of your secret as created above when you're making your ECS Task Definition. Also note that when you provide your Docker image to the Task Definition you don't need to provide a full Docker Hub URL, because an image name without a registry host is pulled from Docker Hub by default. You only need to provide image namespace, name and version, e.g. `my-organisation/my-image:latest`.
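If you prefer to define the task as JSON rather than through the console, the same setting lives in the container definition under `repositoryCredentials`. The sketch below builds that fragment in Python; the container name and the ARN are placeholders, not values from our setup.

```python
import json


def private_container_definition(name: str, image: str, secret_arn: str) -> dict:
    """Build a container definition fragment that pulls from a private registry."""
    return {
        "name": name,
        # No registry host needed for Docker Hub, just namespace/name:version.
        "image": image,
        # ARN of the Secrets Manager secret holding the registry login.
        "repositoryCredentials": {"credentialsParameter": secret_arn},
    }


# Placeholder values; replace the name and ARN with your own.
container = private_container_definition(
    "ci-container",
    "my-organisation/my-image:latest",
    "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:SECRET_NAME",
)
print(json.dumps(container, indent=2))
```

For Docker Hub, the secret itself is a small JSON document with `username` and `password` keys.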

Fargate permissions

In order to get the information it needs and orchestrate the services it requires, Fargate needs a certain set of authorisations. These are provided via an IAM role which is assigned to your Fargate tasks. You should already be familiar with this having gone through the GitLab documentation, so here I will simply list the additional requirements.

We made an extra IAM policy called ReadSecrets to allow Fargate to access our Docker login, which looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetRandomPassword",
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds",
                "secretsmanager:ListSecrets"
            ],
            "Resource": "*"
        }
    ]
}
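
The policy above grants read access to every secret in the account. If you would rather tighten it, one possible variant, assuming the Docker login is the only secret the task needs and it is encrypted with the default key, is to allow only `secretsmanager:GetSecretValue` on that one secret. The function name, `Sid` and ARN below are hypothetical.

```python
import json


def scoped_read_secret_policy(secret_arn: str) -> dict:
    """Build an IAM policy document allowing reads of one specific secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadDockerLogin",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                # Scoped to a single secret rather than "*".
                "Resource": [secret_arn],
            }
        ],
    }


# Placeholder ARN; use the ARN of your own registry secret.
policy = scoped_read_secret_policy(
    "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:SECRET_NAME"
)
print(json.dumps(policy, indent=4))
```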

Then we created the role for ECS with these policies attached:

SecretsManagerReadWrite
AmazonS3FullAccess
AmazonECSTaskExecutionRolePolicy
ReadSecrets (our policy)

The Trust Relationship for the role needs to look like this:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Finally, your server with the GitLab Runner software installed on it - note, the Runner, not GitLab itself, they may or may not be one and the same machine - will need to have an IAM role with the AmazonECS_FullAccess policy.

It's worth noting you should have a Security Group, effectively an external firewall, in front of your CI containers as well. If you do, it will need to allow traffic on port 22 from your GitLab Runner server. It shouldn't need anything else for basic operation.
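
To make that rule concrete, here is a small sketch that builds a single inbound permission in the shape the EC2 API uses for Security Group rules. The runner address is a made-up example and the function name is hypothetical.

```python
import ipaddress


def ssh_ingress_rule(runner_ip: str) -> dict:
    """Build one inbound rule allowing SSH (TCP 22) from a single IPv4 host."""
    host = ipaddress.IPv4Address(runner_ip)  # raises ValueError on a bad address
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        # A /32 range matches exactly one address: the Runner server.
        "IpRanges": [{"CidrIp": f"{host}/32", "Description": "GitLab Runner"}],
    }


# Example private address for the Runner server; substitute your own.
rule = ssh_ingress_rule("10.0.1.25")
print(rule)
```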

Static IP for containers

One thing none of the reading material touches on is networking the containers. The example sets `EnablePublicIP` to `true` in the runner configuration on the assumption the target machine will allow any IP to connect, but this causes us some problems. Usually, you'll want some security on your servers so they do not accept connections from any old server. Hopefully you'll have AWS Security Groups, some strict firewall rules as well, and the SSH port, 22, locked down to a set of trusted addresses.

But if our containers have a randomly assigned public IP address, how do we deal with that? Are we going to safelist a whole region of AWS? That feels like a bad idea. Indeed, you often see people asking CI providers for the ranges of IP addresses they use so firewalls can be configured to allow them, but that's still quite bad because anyone using that CI provider can, by inference, also connect to your target infrastructure. This is one of the big strengths of doing our scalable CI this way: we can control the networking!

Unlike the GitLab proof of concept, we disable public IPs for CI containers; they really don't need them. You'll remember that when setting up your proof of concept you had to specify an AWS subnet ID in the configuration. This is the key: if you configure that AWS subnet to route outbound traffic through a NAT Gateway (an Internet Gateway on its own will not carry traffic from containers that have no public IP) and you disable the container's public IP addressing, all traffic from the container - any container in your cluster - will go out via that NAT Gateway, which has a static Elastic IP you can safelist. So you have a single IP you can trust across your target infrastructure and all containers will use it. Gone is the problem of having to safelist multiple essentially untrusted ranges of addressing just to let your CI work.
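
On the runner side, the change amounts to flipping one setting in the Fargate driver configuration. The sketch below renders that section as text. Only `EnablePublicIP` is named in this post; the other key names follow the GitLab example configuration as I understand it, and every value is a placeholder, so check both against your own working proof of concept.

```python
def render_fargate_section(
    cluster: str,
    region: str,
    subnet: str,
    security_group: str,
    task_definition: str,
    enable_public_ip: bool = False,
) -> str:
    """Render the [Fargate] section of the driver's TOML configuration."""
    lines = [
        "[Fargate]",
        f'  Cluster = "{cluster}"',
        f'  Region = "{region}"',
        f'  Subnet = "{subnet}"',
        f'  SecurityGroup = "{security_group}"',
        f'  TaskDefinition = "{task_definition}"',
        # TOML booleans are lowercase, unlike Python's True/False.
        f"  EnablePublicIP = {str(enable_public_ip).lower()}",
    ]
    return "\n".join(lines)


# Placeholder identifiers; substitute the ones from your own AWS account.
section = render_fargate_section(
    "ci-cluster", "eu-west-1", "subnet-xxxxxx", "sg-xxxxxx", "ci-task:1"
)
print(section)
```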

Sharing data

In our case, there is some data all our CI containers need to be able to access. The simplest solution for this is AWS EFS, their Network Attached Storage (NAS) solution. If you need to share disk between containers because you need data to persist between runs, you'll need to mount an EFS volume in your Docker container. Fargate mostly takes care of this for you, but you'll need to configure any Security Groups (SGs) in the mix.

Specifically, if you have an SG in front of your CI containers you'll need to add inbound port 2049 to allow NFS traffic to pass. Similarly, the SG in front of your EFS volume will need to allow NFS traffic from your containers and any other server that might need to access this volume. You'll probably want to allow the entire internal private range of the AWS subnet you configured the runner to use, so any internal IP the container might receive will be able to access EFS.
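
As a sanity check on "the entire internal private range", the sketch below builds an NFS rule for a subnet's CIDR block and confirms that a given container address would be covered by it. The CIDR and addresses are invented examples and the function names are hypothetical.

```python
import ipaddress


def nfs_ingress_rule(subnet_cidr: str) -> dict:
    """Build one inbound rule allowing NFS (TCP 2049) from a whole subnet."""
    network = ipaddress.ip_network(subnet_cidr)  # raises ValueError if malformed
    return {
        "IpProtocol": "tcp",
        "FromPort": 2049,
        "ToPort": 2049,
        "IpRanges": [{"CidrIp": str(network), "Description": "CI containers"}],
    }


def covered_by(subnet_cidr: str, container_ip: str) -> bool:
    """Return True if the container's private IP falls inside the subnet."""
    return ipaddress.ip_address(container_ip) in ipaddress.ip_network(subnet_cidr)


# Example subnet range; use the CIDR of the subnet your runner is configured with.
rule = nfs_ingress_rule("10.0.1.0/24")
print(rule)
print(covered_by("10.0.1.0/24", "10.0.1.57"))
print(covered_by("10.0.1.0/24", "10.0.2.57"))
```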

If you are mounting an EFS volume, then when you create your Task Definition you will need to add port 2049 as well as port 22 under port mappings. You'll also need to use the "Volumes" section to add your EFS volume, and then in the container details under "Storage and Logging" you'll need to add a mount point, selecting your configured volume and providing the path on the container it should mount to.
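
For anyone working from JSON instead of the console, those three settings map onto `portMappings`, `volumes` and `mountPoints` in the Task Definition. Here is a sketch of the relevant fragment; the task family, container name, volume name, file system ID and mount path are all placeholders, and required settings such as CPU, memory and roles are left out for brevity.

```python
import json


def task_definition_with_efs(image: str, file_system_id: str, container_path: str) -> dict:
    """Build a Task Definition fragment that mounts one EFS volume."""
    volume_name = "ci-shared-data"  # hypothetical; any name works if both uses match
    return {
        "family": "ci-task",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "containerDefinitions": [
            {
                "name": "ci-container",
                "image": image,
                "portMappings": [
                    {"containerPort": 22, "protocol": "tcp"},    # SSH from the Runner
                    {"containerPort": 2049, "protocol": "tcp"},  # NFS for EFS
                ],
                "mountPoints": [
                    {
                        "sourceVolume": volume_name,
                        "containerPath": container_path,
                        "readOnly": False,
                    }
                ],
            }
        ],
        "volumes": [
            {
                "name": volume_name,
                "efsVolumeConfiguration": {"fileSystemId": file_system_id},
            }
        ],
    }


# Placeholder image, file system ID and mount path.
task = task_definition_with_efs(
    "my-organisation/my-image:latest", "fs-xxxxxxxx", "/mnt/shared"
)
print(json.dumps(task, indent=2))
```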

Help!

Hopefully, you've gained enough background and pointers from this blog to be dangerous! Good luck setting up your custom CI, it's a fiddly process but rewarding once working, as you watch your AWS bill go down and your developers revel in their time-saving parallel CI pipelines!

We can do pay-as-you-go support by the half-hour, as well as longer-term contracts and full-service management.