Canary Endpoints:
const (
	BlueService          SelectorName = "blue"
	GreenService         SelectorName = "green"
	InitialCanaryService SelectorName = "blue"
)
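Any selector change request ultimately has to be validated against the two known services. A minimal sketch of that check (the `parseSelector` helper is hypothetical, not taken from the codebase):

```go
package main

import "fmt"

// SelectorName identifies which canary service a request targets.
type SelectorName string

const (
	BlueService          SelectorName = "blue"
	GreenService         SelectorName = "green"
	InitialCanaryService SelectorName = "blue"
)

// parseSelector is a hypothetical helper that rejects anything other
// than the two known services before the ConfigMap is updated.
func parseSelector(s string) (SelectorName, error) {
	switch SelectorName(s) {
	case BlueService, GreenService:
		return SelectorName(s), nil
	default:
		return "", fmt.Errorf("unknown selector %q", s)
	}
}

func main() {
	sel, err := parseSelector("green")
	fmt.Println(sel, err) // prints "green <nil>"
}
```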
During QA we discovered that we had no way to change the active canary service. I have added it:
Before:
iiq-wp-platform git:(dev) kc get configmap -n wp-platform -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    canaryEnv: '{"selector":"blue","image":"961406424767.dkr.ecr.us-west-2.amazonaws.com/rezfusion-cloud:dev"}'
  kind: ConfigMap
...
After request:
POST to /services

{
  "selector": "green"
}
➜ iiq-wp-platform git:(dev) kc get configmap -n wp-platform -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    canaryEnv: '{"selector":"green","image":"961406424767.dkr.ecr.us-west-2.amazonaws.com/rezfusion-cloud:dev"}'
  kind: ConfigMap
Then simply update the image for the active canary service:
POST /services

{
  "imageTag": "216c0204fa1e71f93603c0d5087ef16d6b2ba5bce9084874bf9b2aebcddebc77"
}
➜ iiq-wp-platform git:(dev) kc describe -n wp-platform deployment/green-deployment
Name:                   green-deployment
Namespace:              wp-platform
CreationTimestamp:      Wed, 05 Apr 2023 15:06:04 -0600
Labels:                 app=green
Annotations:            deployment.kubernetes.io/revision: 63
Selector:               app=green
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=green
  Annotations:  kubectl.kubernetes.io/restartedAt: 2023-04-11T05:00:45-06:00
  Containers:
   web:
    Image:  961406424767.dkr.ecr.us-west-2.amazonaws.com/rezfusion-cloud@sha256:1274e8bc4d963536e88265781170e72a2939caae25127cd9d400ab124393f946
Reverting Bad Deployments/Rollbacks
Example of broken deployment:
Once a healthy image hash is identified, we can simply update the canary deployment (or update the image on production, if that one has gone bad) and the pods restart with the corrected code. Because deployments are pinned to image hashes rather than mutable tags, each rollout preserves its historical context.
Once a healthy version of the WP app finishes building, it is ready to be deployed.
The pods, as well as any pipeline update hooks (cache clearing, etc.), will run automatically.
➜ iiq-wp-platform git:(dev) kc describe -n wp-platform deployment/green-deployment
Name:                   green-deployment
Namespace:              wp-platform
CreationTimestamp:      Wed, 05 Apr 2023 15:06:04 -0600
Labels:                 app=green
Annotations:            deployment.kubernetes.io/revision: 64
Selector:               app=green
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=green
  Annotations:  kubectl.kubernetes.io/restartedAt: 2023-04-11T05:00:45-06:00
  Containers:
   web:
    Image:  961406424767.dkr.ecr.us-west-2.amazonaws.com/rezfusion-cloud@sha256:216c0204fa1e71f93603c0d5087ef16d6b2ba5bce9084874bf9b2aebcddebc77
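The hash sent as imageTag surfaces on the deployment as a digest-pinned image reference. A hypothetical helper illustrating that mapping (not part of the service code):

```go
package main

import "fmt"

// pinnedImage shows how the "imageTag" hash from the POST maps onto
// the digest-pinned image reference seen in the deployment output.
// This helper is illustrative only.
func pinnedImage(repo, hash string) string {
	return repo + "@sha256:" + hash
}

func main() {
	// Prints the digest-pinned reference shown in the describe output above.
	fmt.Println(pinnedImage(
		"961406424767.dkr.ecr.us-west-2.amazonaws.com/rezfusion-cloud",
		"216c0204fa1e71f93603c0d5087ef16d6b2ba5bce9084874bf9b2aebcddebc77",
	))
}
```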
Termination
Terminating a site is done by sending a request to /sites/{site-id} with the DELETE action specified.
This deletes the site's AWS Secrets entries, its database, and its S3 objects (the site files), and removes the site's ingress entry. In short, the site and all related data are removed.
Create
Creating a site writes the bare-minimum configuration needed to represent a site:
POST /sites/create

{
  "id": "ggg",
  "hostnames": ["ggg.cloud2-stg.rezfusion.com"],
  "service": "blue",
  "name": "Project Bluelaunch | GGG",
  "canonicalHostname": "ggg.cloud2-stg.rezfusion.com"
}
This will create the AWS DynamoDB entry for the site as well as the secrets needed to access the database and WP CMS.
Provision
After creating a site, it can be provisioned. Provisioning means an actual site instance is created, along with the relevant databases, S3 bucket subdirectories, etc. for the site to use. This endpoint kicks off the installation of a WP site, a job to add an Ingress entry, and a job to activate the desired theme for the site.
POST /sites/{site-id}/provision
Assuming all goes well, a site should be visible at the hostname configured during /sites/create within 5-10 minutes.
Misc. Fixes
Cache-clearing jobs were missing the --allow-root param.