We set up your AWS account, properly
Anyone can open an AWS account; it's free within certain limits. You can run your own servers, databases and networks, and tap into a range of AWS products, from big data analysis to long-term object storage, API management to sending bulk email. The AWS management console is famously vast…
This expanse is why our clients often opt for us to manage their environments. Not just their servers; the entire AWS account.
What does 'managed' AWS mean? (Or Azure, Digital Ocean, or… name your poison). This blog explains what we do to protect and manage your cloud infrastructure.
Be careful about who's allowed into your AWS management console and what they're permitted to do. Full administrative access in the wrong hands could be catastrophic, either accidentally or maliciously. When you hand us an AWS account, we check what users exist, what they can do, and what they need to do. We ensure each person has no more access than needed.
We'll usually do this via a mix of 'IAM roles' (custom sets of permissions for access to different parts of the console) and our SAML Identity Provider, which presents all Code Enigma registered users to AWS and grants access to the correct accounts based on stored information (username, groups, etc.). You log in with your Code Enigma user, and we tell AWS what your role should be once you're in.
We have preset roles for things like access to billing information, server logs, read-only access to everything and full administrator access, and we can create custom roles for specific clients.
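As a rough Terraform sketch of what one such preset role can look like - everything here is illustrative (the role name, account ID and identity provider name are made up, not Code Enigma's real setup):

```hcl
# Hypothetical read-only role, assumable only via a SAML identity provider.
resource "aws_iam_role" "readonly" {
  name = "ce-readonly"

  # Trust policy: only federated users from the SAML IdP may assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = "arn:aws:iam::123456789012:saml-provider/code-enigma" }
      Action    = "sts:AssumeRoleWithSAML"
      Condition = {
        StringEquals = { "SAML:aud" = "https://signin.aws.amazon.com/saml" }
      }
    }]
  })
}

# Attach the AWS-managed ReadOnlyAccess policy to the role.
resource "aws_iam_role_policy_attachment" "readonly" {
  role       = aws_iam_role.readonly.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```

A billing or logs role would look the same, just with a different policy attached.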
Infrastructure in code
We make sure the configuration of your infrastructure is securely stored, using a product called Terraform: software that lets you define your hardware in code, save that definition and use it to automatically build cloud infrastructure on demand, as well as update and tear down existing infrastructure. It has various 'providers' for different cloud platforms. We predominantly use AWS.
You can also create Terraform 'modules' for common sets of commands, in the same way a programmer might use a class or function to create reusable routines of code. For example, Code Enigma's 'EC2' module uses various calls you can make with the AWS 'provider' to produce a working virtual server (EC2 instance) in your account. It would be tedious to copy and paste those commands for every EC2 server, so we have a module that creates a server, and we use that module five times to make five servers.
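Module reuse looks something like this - a hedged sketch, since the module path and variable names are assumptions, not Code Enigma's real module API:

```hcl
# Hypothetical usage of an in-house 'ec2' module, called once per server.
# The subnet reference assumes an aws_subnet.private defined elsewhere.
module "web1" {
  source        = "./modules/ec2"
  name          = "web1"
  instance_type = "t3.medium"
  subnet_id     = aws_subnet.private.id
}

module "web2" {
  source        = "./modules/ec2"
  name          = "web2"
  instance_type = "t3.medium"
  subnet_id     = aws_subnet.private.id
}
```

Five servers is five short blocks like these, all sharing one tested implementation.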
We can 'terraform' anything AWS can do - the documentation is huge! We save and securely store that code, so we never lose the configuration of any component of your infrastructure.
A non-exhaustive list of things we typically 'terraform' includes:
Network configurations and VPNs
Console and API users
Database instances (RDS)
Caching services (e.g. AWS Elasticache)
NAS drives and associated mounts
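To give a flavour of one item on that list, here is a minimal sketch of a terraformed database instance (RDS). The engine, sizing and identifiers are assumptions for illustration, not a recommendation:

```hcl
# Illustrative only: a small MariaDB RDS instance.
resource "aws_db_instance" "client" {
  identifier          = "client-db"
  engine              = "mariadb"
  engine_version      = "10.11"
  instance_class      = "db.t3.small"
  allocated_storage   = 20
  username            = "admin"
  password            = var.db_password # injected at run time, never committed to Git
  skip_final_snapshot = false
}
```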
When we're done setting up a new client we've written quite a lot of Terraform code! We economise wherever possible; we keep standard template sets we can base all layouts on, but there's always customisation. If people only needed a very basic set up, they wouldn't need us!
It doesn't stop with capturing stuff in code. We stash it and strictly guard it against unexpected change. This is where version control software comes in. Just as we do for our server manifests and for client application code, we keep our Terraform code in a Git repository. We encourage clients to provide a private repository on a Git platform of their choice, to make off-boarding easier (in the unusual event a client wants to leave). If they don't, we'll make one for them on a utility server. We store all the infrastructure code, with access only available to those who need it, typically only our systems administrators and a client technical contact.
We then orchestrate the ability to build and update infrastructure from that Git repository. We don't want people doing this on their workstations, because that would mean sharing secrets across a team of systems administrators, and there would be no audit trail, so change could still be unexpected and hidden. Instead, we run Terraform itself on a continuous integration server, usually Jenkins, on the client's utility server, which has the necessary API secrets available to it. So when you make changes, you have to trigger a build, which:
Tells us exactly who made the change
Keeps a history
Notifies everyone in real time
We follow a multi-step deployment process, using Git branches, to ensure change is reviewed by peers and automatically tested before it goes live. A typical process for a systems administrator adding a server looks something like this:
1. They 'fork' the production branch of the Terraform code to a new branch of their own, make their changes - in this case, adding a '.tf' file for the new server - and create a merge request (or pull request, if you're used to GitHub) into the development branch, referencing the ticket where the server was requested in the commit message.
2. A colleague reviews the merge request and either accepts it or requests a change.
3. Once accepted, the CI automatically runs against the development code and fires a 'terraform plan' command. This makes Terraform execute a dry run against the code and flag any issues.
4. Assuming the 'plan' goes well, another merge request is made from the feature branch to the production branch and reviewed (including the 'plan' output, to ensure Terraform will not do anything unexpected or destructive).
5. CI runs again, this time manually triggered. Terraform orchestrates the creation of the new server, and a few short minutes later it is ready for configuration.
And all of these things - the repository, the branches, the CI scripts to control the deployments of infrastructure, the setup of an administrative API user Terraform can masquerade as, and so on - take a significant amount of time to get up and running and tested.
Using Terraform we can orchestrate the creation of new servers. One of the nice features of most cloud platforms is something called the 'cloud config' file, which is a set of YAML-based instructions to be executed when a new server is first created on the platform. For AWS, Terraform allows you to specify the 'userdata' attached to an instance in the cloud config format. To that end we have a custom script that:
Upgrades the operating system and all software to the latest version
Ensures we are using the mainline Debian repositories for Aptitude
Sets the fully qualified domain name of the server
Installs Puppet (our config management software)
Installs any other orchestration dependencies
Installs the 'awscli' Python package for AWS API calls
Sets up AWS CloudWatch log shipping
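Wiring a cloud-config file like that into an instance's userdata via Terraform can be sketched as follows - the AMI ID and file path are placeholders:

```hcl
# Sketch: attach a cloud-config file as userdata, so cloud-init runs the
# bootstrap steps above (OS upgrade, hostname, Puppet install, and so on)
# on first boot.
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder base image ID
  instance_type = "t3.medium"

  user_data = file("${path.module}/cloud-config/app.yaml")
}
```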
Once those steps are done and the server is up and running, it is in a ready state for our server management software to take over and fine-tune the server configuration (another blog in itself).
When managing a client's cloud presence, we ensure a good backup strategy. We do this at machine and platform level. We make sure your virtual servers and database instances - anything backed by AWS block storage (EBS) - are backed up with snapshots. It's faster to restore from a snapshot, should we need to, than to restore from our encrypted offsite backups.
We orchestrate snapshot backups via CI again. You can do this with AWS itself, using CloudWatch, but we prefer to group things under our CI; it has advantages around access control, consistency and visibility. It's easy to order a backup via the AWS API from a CI server, which means it's just as easy to take an extra backup on demand, without having to log in to the AWS console, browse, click through a wizard, etc.
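For comparison, the AWS-native scheduled route can itself be captured in Terraform, using Data Lifecycle Manager. A hedged sketch, in which the execution role and the tag used to select volumes are assumptions:

```hcl
# Nightly snapshots of every EBS volume tagged Backup=true, kept for 14 days.
resource "aws_dlm_lifecycle_policy" "nightly" {
  description        = "Nightly EBS snapshots"
  execution_role_arn = aws_iam_role.dlm.arn # assumed to be defined elsewhere
  state              = "ENABLED"

  policy_details {
    resource_types = ["VOLUME"]
    target_tags    = { Backup = "true" }

    schedule {
      name = "nightly"
      create_rule {
        interval      = 24
        interval_unit = "HOURS"
        times         = ["03:00"]
      }
      retain_rule {
        count = 14
      }
    }
  }
}
```

The CI-driven approach does the same job but adds the audit trail and on-demand flexibility described above.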
There are a number of extremely useful features of all the setup and management we do when it comes to disaster recovery, AKA what to do when the poop hits the ventilator! (Aside from generate more poop…)
Having the infrastructure code means we have everything we need, off site, to spin up a new layout in another AWS region. We do some finding and replacing (for example, base image IDs, available availability zones, etc.), but we can more or less Terraform the infrastructure in another AWS region fairly quickly, should we lose the primary region.
Having snapshots of all the data at rest, which are automatically stored in AWS S3 object storage, opens options too. While S3 namespaces are global, S3 buckets themselves are regional. So it's not enough to stick your backups in S3 and assume they'll be available in other regions - if your region goes down, so will your snapshot store. However, as an additional service we can ship your AWS S3 backups to another predefined 'backup region' so you have a copy of your snapshots too. If you lose your region, we have your infrastructure setup (Terraform code in a Git repository) and your data (shipped snapshots) in your backup region, so we can stand it all up again and give you a new IP address for your website in pretty short order.
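Shipping S3-stored backups to a second region could be sketched with S3 replication - a hedged example only, where the bucket names, provider aliases and replication role are all assumptions, and both buckets need versioning enabled:

```hcl
# Replicate everything in the backups bucket to a bucket in the backup region.
resource "aws_s3_bucket_replication_configuration" "backups" {
  bucket = aws_s3_bucket.backups.id       # primary-region bucket, assumed defined
  role   = aws_iam_role.replication.arn   # replication role, assumed defined

  rule {
    id     = "to-backup-region"
    status = "Enabled"

    filter {} # empty filter = replicate the whole bucket

    delete_marker_replication {
      status = "Disabled"
    }

    destination {
      bucket        = aws_s3_bucket.backups_dr.arn # bucket in the backup region
      storage_class = "STANDARD_IA"
    }
  }
}
```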
So, when you see AWS management set-up fees, these are the things we do to ensure your cloud management is professional, secure and efficient.
Next - Managing servers: A look into what goes into proper server management and where your money goes.
If you'd like to talk to us about AWS, you can here.