Cloud Setup

Introduction

This covers how we set up your infrastructure on AWS, Google Cloud, and Azure, the three Cloud Providers we currently support for running Kubernetes. We use the managed Kubernetes service provided by each Cloud Provider. This document covers how the infrastructure is set up within each Cloud, how we create an isolated environment for compliance, and the commonalities between them.

Infrastructure as Code / Terraform

The infrastructure is set up using AuditKube, a Terraform module that creates the entire infrastructure. Terraform lets us define the infrastructure as code, so you don't have to go into the Consoles of the different environments and point and click to build infrastructure.

opsZero sets up the infrastructure using Terraform so that it can be built in a repeatable manner. This gives you a few benefits: it creates an audit trail of changes to your infrastructure so you remain compliant, it lets you test new infrastructure services quickly if you want to add them, and it lets you create identical, isolated environments across different Cloud environments.

Our Terraform module creates the following across different modules: Kubernetes Cluster, Bastion, VPN Machine, SQL (AWS Aurora, AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL), Redis (AWS ElastiCache, Google Cloud Memorystore, Azure Cache for Redis), VPCs, and Security Groups.

We set up a new Virtual Private Cloud (VPC) that isolates access in each environment. This is beneficial in that even if you are using an existing Cloud environment, the VPC in which Kubernetes is deployed is isolated from the other networks unless it is opened up via VPC Peering. Also, by having everything within one VPC we can create and limit network flows to the required services.

Since Terraform is just code, it allows us to check all changes into Git to create an audit trail. This audit trail and all changes to the infrastructure need to be documented to remain compliant with HIPAA / PCI / SOC2.

The Bastion and VPN are two separate machines that have external IPs. To reach the Kubernetes cluster you connect to the VPN and then to the Bastion. We use Foxpass for authentication to the Bastion and VPN. Foxpass allows you to use G Suite and Office 365 to grant access to the machines, giving you a single place to manage access.
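
As a rough sketch of what this access path looks like in practice (the hostnames, user, and VPN client below are placeholders and will vary by setup):

# 1. Connect to the VPN first (client and config depend on your environment).
sudo openvpn --config ops-vpn.ovpn

# 2. Then hop through the Bastion to reach hosts inside the VPC,
#    authenticating with your Foxpass-backed credentials.
ssh -J youruser@bastion.example.com youruser@10.0.1.10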

Terraform only needs to be run when we create the infrastructure and when we want to make changes to that infrastructure. The way Terraform works is that it creates the infrastructure and generates a statefile when you run `terraform apply`. This file is the state of your infrastructure and should be checked in to Git. Additional runs of `terraform apply` compare this statefile to what exists in your infrastructure and create, modify, or delete resources based on what is in your Terraform .tf files and what your statefile shows.
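
A typical run, assuming the Terraform files live in an infrastructure repo checked into Git, looks roughly like this (a sketch, not an exact script):

# Initialize providers and modules the first time (or after adding modules)
terraform init

# Preview what will be created, modified, or deleted
terraform plan

# Apply the changes and update the statefile
terraform apply

# Check the statefile and any .tf changes into Git for the audit trail
git add terraform.tfstate *.tf
git commit -m "Scale up Kubernetes node pool"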

The usual reasons you would run terraform are:

  • Change the number of nodes running in your cluster
  • Change the size of your database
  • Change the size of your Redis instance
  • Add additional services to your infrastructure

Cloud

AWS

The configuration for AWS looks something like the following:

We build a completely independent VPC that is locked down. We lock things down by doing the following:

  • Need to use the bastion for access. It uses Foxpass for access through G Suite, Office 365, or Okta.
  • Need to use the VPN for access to the bastion.
  • Need to use an ELB via Ingress to access Kubernetes Services.
  • Additional logging and security updates on Amazon Linux, including OSSEC.
  • Additional control over log flows.
  • Node-level encryption.

Terraform

Packer

Google Cloud

Terraform

Azure

Terraform

Cloudflare

Foxpass

VPN

Kubernetes

We set up Kubernetes using the managed service on each of the Cloud providers: AWS EKS, Google Cloud GKE, and Azure AKS. This ensures that we don't need to handle running the master nodes, which can create additional operational hurdles; we remove this from the picture as much as possible.

Kubernetes runs with the following:

  • An Ingress controller to reduce the expense of running multiple LoadBalancers.
  • Pod autoscaling to scale the number of pods with load (see the sketch below).
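
A minimal sketch of enabling pod autoscaling for a deployment with the Horizontal Pod Autoscaler (the deployment name and thresholds below are placeholders):

# Scale <name> between 2 and 10 replicas based on CPU usage
kubectl autoscale deployment <name> -n production --min=2 --max=10 --cpu-percent=80

# Check the autoscaler's current state
kubectl get hpa -n production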

Nodes

Nodes can be configured using Terraform. Each of the modules for EKS, GKE, and AKS has configuration options for adding additional nodes. You can also specify the size and type of the nodes in the Terraform script; the node count is controlled by the min_size and max_size variables. The master nodes do not need to be configured and are handled by the managed service providers.

To add additional nodes to the cluster, increase the min_size of the nodes. This will create additional nodes in the cluster. Note that it may take up to 5 minutes to bring up additional nodes, but there is no downtime. You can also scale down by reducing min_size; pods on the removed nodes will be rescheduled onto the remaining nodes. Ensure that your code is idempotent to handle cases where the service may be killed.

To change the node count or size, modify the Terraform script and run terraform apply, which updates the configuration. With Azure and GCP we can enable automatic node updates; with EKS there is a manual process for building, updating, and replacing the nodes, which is described in the AWS section.
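
A sketch of that workflow, assuming the module exposes min_size as an input variable (otherwise edit the value directly in the .tf file and apply):

# Bump the minimum node count and apply
terraform apply -var="min_size=5"

# Watch the new nodes register with the cluster (may take a few minutes)
kubectl get nodes --watch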

Request Cycle

A Pod is a group of containers that run together on the same node. You specify pods through a Deployment, and you expose them to the outside world through a Service and an Ingress. An example of an HTTP request looks like this:

DNS (i.e app.example.com) -> Ingress (Public IP Address/CNAME) -> Kubernetes Service -> Kubernetes Pods
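
To trace this path on a running cluster, you can inspect each hop; a sketch with placeholder names:

# The ingress shows the public address / CNAME that DNS points at
kubectl get ingress -n production

# The service the ingress forwards to, and the pods backing it
kubectl get service <name> -n production
kubectl get pods -n production -l app=<name>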

Monitoring

Monitoring is configured through third-party services such as Datadog, New Relic, etc. These services report issues with the pods and other metrics. They need to be set up separately, but all of them provide a Helm chart to install, so little additional configuration is needed.
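
For example, a sketch of installing the Datadog agent via its Helm chart (the namespace and API key are placeholders; other vendors follow a similar pattern):

helm repo add datadog https://helm.datadoghq.com
helm repo update

# Installs the Datadog agent on every node and starts shipping metrics
helm install datadog datadog/datadog \
  --namespace monitoring --create-namespace \
  --set datadog.apiKey=<your-datadog-api-key>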

Helm

  • When are the Helm templates used in the build process? How does this fire?
  • Are all of the templates run every time?
  • There are a lot of dynamic Helm files in the project that honestly I have no idea what they are doing. Where can we look to see the variables that will be used by these charts?
    • How can we edit this? Why would we edit these?
  • When are Helm charts and templates used on the system?

Ingress

The ingress is in its simplest form a Kubernetes LoadBalancer. Instead of what would traditionally be this:

DNS (i.e app.example.com) -> Kubernetes Service -> Kubernetes Pods

It is the following:

DNS (i.e app.example.com) -> Ingress (Public IP Address/CNAME) -> Kubernetes Service -> Kubernetes Pods

To break down the Ingress request cycle even further it is the following:

DNS (i.e app.example.com) -> Ingress [Kubernetes Service -> Kubernetes Pods (Nginx) -> Kubernetes Service -> Kubernetes Pods]

The ingress is just another pod in the system, such as Nginx, that relays the traffic. The ingress controller is a Helm chart and is installed manually.
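
A minimal sketch of that install, assuming the upstream ingress-nginx chart rather than the exact chart used here:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# Installs the Nginx ingress controller, which creates the cloud LoadBalancer (an ELB on AWS)
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace default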

The ingress routes based on the hostname, so requests need to include a Host header:

curl -k -H "Host: app.example.com" https://a54313f35cb5b11e98bb60231b063008-2077563408.us-west-2.elb.amazonaws.com

By pointing DNS at the load balancer above, requests automatically carry the Host the app is listening on. When using DeployTag, it automatically creates a DNS record on Cloudflare pointing to the correct location.

Ingress is a generic mechanism that lets you route different paths to different services. This should be configured as part of the Helm chart that is included in every application. The documentation for this is located here.

The ingress controller runs in the default namespace and is configured using this chart. One of the features of DeployTag is the ability to set the subdomain of an ingress correctly. Consider the following:

deploytag --cloud aws \
            --cloud-aws-secret-id <cloud-secrets> \
            dns \
            --cloudflare-zone-id <cloudflare-zone-id> \
            --record '{.Branch}-guest-server-frontend-aws' \
            --record '{.Branch}-guest-server-server-aws'

Pods

Scaling the number of pods is as simple as the following:

kubectl scale -n production --replicas=5 deployments/<name>

This increases the number of pod replicas running, which increases the load that can be handled. There should be no downtime for this.
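
To confirm the new replicas came up (a sketch; <name> is a placeholder):

kubectl rollout status -n production deployments/<name>
kubectl get pods -n production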

AWS Secrets Manager

DeployTag uses AWS Secrets Manager to store and retrieve secrets that it populates on deployment. The values in Secrets Manager become environment variables.
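
A sketch of how secrets might be stored so DeployTag can expose them as environment variables (the secret name and keys below are placeholders):

# Store a JSON object whose keys become environment variables on deploy
aws secretsmanager create-secret \
  --name production/guest-server \
  --secret-string '{"DATABASE_URL":"postgres://...","REDIS_URL":"redis://..."}'

# Inspect the stored values
aws secretsmanager get-secret-value --secret-id production/guest-server --query SecretString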