Unlock yourself from Cluster AutoScaler Quota Limits

Tomer Shaiman
Mar 25, 2023 · 6 min read


Photo by Growtika on Unsplash

For quite some time, I have been struggling with the following challenge: how can I ensure that my cluster autoscaler's behavior stays within the quota allocated to my subscription, and more importantly, how can I detect a problem in advance rather than during a live incident in production?

This article will shed some light on the subject and provide remediation advice for the Azure Kubernetes Service (AKS) scenario.

What Is the Cluster Autoscaler?

A cluster autoscaler is a supply-demand mechanism. It monitors unschedulable pods in your workload (pending, for example, because of resource constraints) and increases the number of nodes in your cluster accordingly.
In AKS this works because node pools are backed by Virtual Machine Scale Sets (VMSS), which the autoscaler reconfigures to the desired scale.

In short: pods scale out via the HPA (Horizontal Pod Autoscaler) mechanism, while nodes scale out via the Cluster Autoscaler mechanism.

Note: For more information on how the AKS cluster autoscaler can be configured, visit Use the cluster autoscaler in Azure Kubernetes Service (AKS) — Azure Kubernetes Service | Microsoft Learn.
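To make this concrete, here is a minimal Azure CLI sketch for enabling the cluster autoscaler on an existing node pool (the resource group, cluster name, node pool name, and scale range below are placeholders):

# Enable the cluster autoscaler on an existing node pool with a 1-5 node range.
az aks nodepool update \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5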

Problem Definition

Cloud resources are not unlimited: they are bounded by the capacity allocated to our cloud subscription, the region we are running in, and the availability of the cloud provider's global compute expansion. It is only a matter of time until our production workload hits that quota limit.

Our problem then becomes more prominent: how can we detect that we are running out of capacity when predicting workload usage is sometimes not possible (burstable workloads, global resource availability, and so on)?

The next section simulates a case where we deliberately get ourselves into an out-of-compute-capacity situation on the cloud, so we can see what observability means we have to get notified and react to such scenarios.

Simulating Out-Of-Quota Scenario

We will now simulate an out-of-capacity scenario with the following steps:

  1. Add a node pool backed by a compute SKU with very limited remaining quota on our Azure subscription.
  2. Attach Log Analytics to our AKS cluster.
  3. Create a workload on the newly created node pool to ensure supply cannot meet demand.
  4. View and investigate the logs generated in the Log Analytics table and create alerts from them.

Step 1: Add an AKS Node Pool That Has Low Quota on Your Subscription

Head to the Azure “Quotas” tab and look for a low-capacity resource. For example, under my subscription I found that I have only 100 cores left for the DSv4 VM family:

Quotas Tab On Azure
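The same information is available from the CLI. The sketch below lists vCPU usage against quota for a region and filters for the DSv4 family (the location is a placeholder, and the exact family name string may differ in your subscription):

# Show current vCPU usage vs. quota limit, filtered to the DSv4 family.
az vm list-usage --location westeurope \
  --query "[?contains(name.value, 'DSv4')]" \
  --output table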

We can now create the node pool from that family type. We shall create a node pool of size Standard_D48s_v4 with a scale range of 1–3.
Since each node of this size consumes 48 cores and we only have 100 cores left on this SKU family, we have put ourselves on the fast lane to failing to get the needed capacity when the workload increases.

Run the following Azure CLI command to create the node pool:

az aks nodepool add \
  --resource-group $rg \
  --cluster-name $aks \
  --name low \
  --node-vm-size Standard_D48s_v4 \
  --enable-cluster-autoscaler \
  --labels workload=low-cap \
  --min-count 1 \
  --max-count 3

We can also do the same in the Azure portal. This time the portal hints that we are going to run into an out-of-capacity scenario with the given scale definition.

Note: we have set the node label workload=low-cap so we can later select this node pool for our workload.
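Before moving on, we can double-check that the pool was created with the label and autoscaler settings we expect. A small sketch, reusing the $rg and $aks variables from the command above:

# Inspect the new node pool's labels and autoscaler configuration.
az aks nodepool show \
  --resource-group $rg \
  --cluster-name $aks \
  --name low \
  --query "{labels: nodeLabels, autoscaler: enableAutoScaling, min: minCount, max: maxCount}" \
  --output json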

Step 2: Configure Log Analytics Settings for the Cluster

Head to the Monitoring blade of the AKS cluster and click on Diagnostic settings:

From the edit settings, turn on the cluster-autoscaler category and make sure you redirect the output to a Log Analytics workspace.
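The same diagnostic setting can be scripted. Below is a minimal sketch, assuming the $rg and $aks variables from step 1 and a $workspace_id variable holding the resource ID of an existing Log Analytics workspace:

# Route the control plane's cluster-autoscaler logs to Log Analytics.
aks_id=$(az aks show --resource-group $rg --name $aks --query id -o tsv)

az monitor diagnostic-settings create \
  --name autoscaler-logs \
  --resource $aks_id \
  --workspace $workspace_id \
  --logs '[{"category": "cluster-autoscaler", "enabled": true}]'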

Step 3: Deploy Workload

We will now deploy some dummy load on the cluster that selects the low-capacity node pool we created in step 1. As we have only 100 cores of quota left for the designated node pool's VM family, a deployment of 5 replicas, each requesting 32 cores, will do the trick.
Create a file called deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        workload: low-cap
      containers:
        - name: nginx
          image: nginx:latest
          resources:
            requests:
              cpu: 32

Note the nodeSelector definition, which ensures our workload will be scheduled onto the low-capacity node pool.

$ kubectl apply -f deployment.yaml

> deployment.apps/nginx-deployment created
Spinning up a workload that will never see the light of day.

After a few minutes we can verify that only a few of the pods were actually scheduled, as we ran out of capacity pretty quickly here.
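A quick way to confirm this is to list only the replicas that are still stuck in Pending (a small sketch, using the app=nginx label from the deployment above):

# List the replicas that never got a node to run on.
$ kubectl get pods -l app=nginx --field-selector=status.phase=Pending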

It's time to see how the cluster autoscaler logs reflect that.

Step 4: Analyze Auto Scaler Logs

There are several places where we can see what is going on under the hood in terms of autoscaler errors.

  1. On the pending pod's event log:
$ kubectl describe pod nginx-deployment-86989459dc-2jpfp

> Normal NotTriggerScaleUp 79s (x28 over 6m16s) cluster-autoscaler pod didn't trigger scale-up: 6 node(s) didn't match Pod's node affinity/selector, 1 in backoff after failed scale-up
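Instead of describing pods one by one, the same signal can be pulled from the cluster-wide event stream (a small sketch; the reason value matches the event shown above):

# List every NotTriggerScaleUp event emitted by the cluster autoscaler.
$ kubectl get events --all-namespaces --field-selector reason=NotTriggerScaleUp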

2. A dedicated ConfigMap in the kube-system namespace reflects the cluster autoscaler's health and events:

$  kubectl get cm cluster-autoscaler-status -o yaml -n kube-system

> Health: Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=1, maxSize=3))
LastProbeTime: 2023-03-25 11:42:13.776674209 +0000 UTC m=+726.015159386
LastTransitionTime: 2023-03-25 11:30:18.298428343 +0000 UTC m=+10.536913520
ScaleUp: Backoff (ready=1 cloudProviderTarget=1)
LastProbeTime: 2023-03-25 11:42:13.776674209 +0000 UTC m=+726.015159386
LastTransitionTime: 2023-03-25 11:31:31.11550756 +0000 UTC m=+83.353992837

As you can see, the ScaleUp condition is in Backoff status. I would also have expected the Health status to become “unhealthy” in such a case, but I haven't managed to fully understand in which scenarios that status becomes unhealthy. Stay tuned for updates on this post, as I will take it up with the AKS team soon.
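If you only care about the status text, you can pull just that field instead of the full YAML (a small sketch, assuming the report is stored under the data.status key, as in current AKS versions):

# Print the autoscaler status report and keep only the backoff lines.
kubectl get cm cluster-autoscaler-status -n kube-system \
  -o jsonpath='{.data.status}' | grep -i backoff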

3. Using the AzureDiagnostics table in our Log Analytics workspace, we can write the following query:

AzureDiagnostics
| where Category == "cluster-autoscaler"
| where log_s contains "backoff"
| order by TimeGenerated desc

The advantage here is that we can create an alert out of this query and get notified whenever the cluster autoscaler is in an unhealthy or backoff state.
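As an illustration of that last step, here is a rough sketch of a log-based alert rule built on the query above with the az monitor scheduled-query command. It assumes the $rg variable from before and a $workspace_id variable holding the Log Analytics workspace resource ID; the exact condition syntax can vary between versions of the scheduled-query extension, and attaching an action group for the actual notification is left out:

# Fire an alert whenever the autoscaler logged a backoff entry in the last 5 minutes.
az monitor scheduled-query create \
  --resource-group $rg \
  --name autoscaler-backoff-alert \
  --scopes $workspace_id \
  --condition "count 'BackoffQuery' > 0" \
  --condition-query BackoffQuery="AzureDiagnostics | where Category == 'cluster-autoscaler' | where log_s contains 'backoff'" \
  --evaluation-frequency 5m \
  --window-size 5m \
  --description "Cluster autoscaler is in backoff - possible quota exhaustion"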

Summary

I hope this article helped you gain more confidence in diving deeper into the internals of how the cluster autoscaler works, so you can get notified whenever it is malfunctioning.
Understanding the cases where you need to increase your quota requests and/or redirect traffic to available compute resources is a crucial step in making your system stable and your customers happy.
