Skip to content

Overview

Introduction

The Kubernetes platform provides a fully-featured Kubernetes container orchestration cluster, with included monitoring, ingress and application dashboards. It can be used for running Kubernetes applications within Azimuth, or custom installations using either Helm Charts or Kustomize to manage Kubernetes manifests.

A kubeconfig file for use with helm or kubectl is provided in platform Details after the platform has been launched.

Launch configuration

Warning

Platforms and their names are visible to all members of the cloud project!

Bug

Azimuth currently incorrectly calculates quota predictions for new clusters. If a cluster doesn't fit within your quota, nodes may fail to create and become stuck in "Provisioning" or "Pending" without warning messages.
Please manually ensure you have enough OpenStack quota for any created clusters, especially Disk Volume and Network Security Groups quota, until this issue is resolved. As an additional issue, Azimuth's Quota tab doesn't include every quota limit. For the time being, check using OpenStack's Web UI (Horizon).

To get started, in the Platforms tab, press the New Platform button, and select Kubernetes.

Or, when prompted to select a Kubernetes cluster when deploying any other platform, one can be quickly made using the green plus button:

Green plus button for deploying a kubernetes cluster

You will then be presented with launch configuration options to fill in:

Kubernetes launch configuration options

Option Explanation
Cluster name A name to identify the Kubernetes cluster platform.
Cluster template The cluster template defines the deployed Kubernetes version. We aim to keep the most recent minor versions. When older ones are removed, the cluster version will be marked Deprecated and can be Upgraded.
Control plane size The size of the cloud instances to run the Kubernetes control plane. The options in this menu are your available flavours in OpenStack, and the number of CPUs and quantity of RAM are displayed for each size.

Node groups

Warning

All Kubernetes clusters must contain a least one node group of Kubernetes worker nodes.
More worker nodes may be needed to deploy more apps, or for some apps to operate especially when they reserve resources. For example, JupyterHub will try to reserve a CPU core for each Jupyter server, hence servers may fail to deploy without enough worker nodes.
To mitigate issues with having too few worker nodes, Autoscaling can be used as described below. If the maximum worker count is reached, this indicates you may need to increase this maximum limit to match load requirements.

Info

If you are creating a Kubernetes cluster to use with applications which require GPU, like HuggingFace LLM, be sure to select a Node Size that contains GPUs.

Option Explanation
Name A name to identify the node group. Names are available inside the Kubernetes cluster as node labels.
Node size The size of the cloud instances that form the node group. The options in this menu are your available flavours in openstack, and the number of CPUs and quantity of RAM are displayed for each size. Cloud sizes may also dictate access to other hardware, such as GPUs or high-speed network interfaces.
Enable autoscaling for this node group? When autoscaling is selected, the amount of cloud instances in the node group will increase when existing resources are not sufficient to run the requested amount of pod resources. As the amount of requested pod resources declines, cloud instances are removed from the node group.
When autoscaling is not selected, the size of the node group remains fixed.
Node Count When autoscaling is selected, the minimum and maximum amount of cloud instances to allow in this node group. When autoscaling is not selected, the fixed amount of cloud instances in this node group.

Cluster addons

Info

By default all cluster addons are enabled, they provide useful information about the state of your Kubernetes cluster and add additional functionality.

Option Explanation
Enable Kubernetes Dashboard? The Kubernetes dashboard will be available from the platforms page.
Enable cluster monitoring? A Grafana instance with pre-configured dashboards for visualizing cluster telemetry will be available from the platforms page.

Advanced Options

Danger

Advanced options are set to reasonable defaults and changing them may result in unexpected behaviour, including Kubernetes clusters failing to deploy. Do not change these unless you know what you are doing!

Option Explanation
Enable auto-healing? If enabled, the cluster will try to remediate unhealthy nodes automatically.
Enable Kubernetes Ingress? Allows the use of Kubernetes Ingress to expose services in the cluster via a load balancer. Requires an external IP for the load balancer to be allocated in OpenStack.
Metrics volume size The size of the openstack volume allocated to store cluster metrics. 10GB is a sensible default. Metrics are retained for 90 days, or until the volume is full, whichever happens first.
Logs volume size As above, but the size of the openstack volume used to store logs. 10GB is a sensible default. Logs are retained for only 72 hours. Unlike metrics, if the volume is full, logs will no longer be recorded and existing volumes may become corrupted.

Info

Your cluster will show as "Reconciling" when it is first deployed, or changes are made. This is Kubernetes terminology, meaning it is being changed in some way to match a specification. When it is fully installed, it will show as "Ready".
When a cluster component shows as "Unhealthy", it means its configuration isn't matching expectations. This could be it has not yet finished installing/changing, or it could be experiencing an error which would need investigating in logs or via kubectl. Running an update/upgrade without changing anything may trigger Kubernetes to automatically reconcile and fix it.

Accessing Deployments

Cluster Deployment

If the Kubernetes Dashboard app was installed, the easiest way to manage a cluster deployment is through the Kubernetes Dashboard link on the Details page for the cluster. Access to this can be managed through the Azimuth Identity Provider.

Alternatively, for advanced users, the Kubeconfig for the cluster can be accessed from the Details page, using the button at the top.

This can be used alongside tools like kubectl or helm for cluster management.

To do so, the Kubeconfig should be downloaded, then the KUBECONFIG environment variable set to point to it. Alternatively, for permanent usage, it can be moved and renamed to a file named config at $HOME/.kube/config. For information on how to use Kubeconfig files, see the kubernetes docs.

Kubernetes App Deployments

Warning

Anyone with access to the project will be able to log into any of the Azimuth created deployments in the project

Kubernetes Applications deployed onto a cluster are exposed by Azimuth's Zenith proxy. They are made available publically under a subdomain.

To access them, navigate to the Platform Details page for the cluster, then click the link under "Cluster services".

If enabled under cluster addons, Monitoring or the Kubernetes Dashboard can also be accessed here.

Patching Deployments

Info

In cases where we require users to patch their Azimuth deployments, we will let users know through the usual STFC Cloud communications channels

Warning

This may cause your deployment to be rebuilt, and may result in data loss. Further testing (and feedback) is required to identify possible causes of data loss.

The STFC Cloud team will periodically push changes to the Azimuth images and deployments. We aim to keep the most recent minor versions of Kubernetes. When older ones are removed, the cluster version will be marked Deprecated. In order to update your deployments, click the orange Upgrade button on your instance details. Machines that need patching, ones that are using deprecated versions, will be outlined in red.

Changes can be made to cluster configurations using the green Update button; for example to make changes to node groups or enable additional addons.

Deleting Deployments

Deployments can be deleted using the red Delete button in Details.

Warning

Kubernetes apps deployed onto it will also be deleted.