Karpenter vs Cluster Autoscaler

Inder Singh · Published in Towards Dev · 6 min read · Dec 7, 2021


One of the benefits of Kubernetes is its ability to scale your infrastructure dynamically based on user demand. It provides multiple layers of autoscaling functionality: pod-based scaling with the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler, as well as node-based scaling with the Cluster Autoscaler. This post will cover only the cluster-level autoscalers available for Kubernetes.

What Is Cluster Autoscaling?

  1. Cluster autoscaling is a component that increases or decreases the size of a Kubernetes cluster (by adding or removing worker nodes), based on the presence of pending pods and other metrics.
  2. If you have excess capacity, the cluster autoscaler can also remove worker nodes and save you some money. Depending on your needs you can even bring your cluster’s worker capacity to zero, though at least one VM will always be running to manage the cluster.

The question then arises: “What is the autoscaler for scaling nodes in Kubernetes called?”

That was a trick question! There are two cluster autoscalers available if your Kubernetes workloads are deployed on AWS:

  • Cluster Autoscaler
  • Karpenter

Cluster Autoscaler is an industry-adopted, open-source, vendor-neutral tool that is part of the Kubernetes project, with implementations for most major Kubernetes cloud providers.

Karpenter is a node lifecycle management solution developed at AWS Labs; it is open source and vendor-neutral.

Before we discuss the differences between Cluster Autoscaler and Karpenter, we need to understand a few things about Elastic Kubernetes Service (EKS), Amazon’s implementation of Kubernetes in the AWS cloud.

Kubernetes Architecture

A Kubernetes cluster consists of the components that represent the “control plane” and a set of machines called “nodes”.

The worker nodes host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster.

So when you create a cluster in EKS:

  • a control plane is created
  • an auto-scaling group is created with min/max/desired numbers set, then EC2 instances are added to the auto-scaling group using a pre-built image that joins them to the control plane

Many other resources get created with the cluster, but for simplicity I have mentioned only two things here: the control plane and the “node group”. In EKS, we don’t really “own” the control plane. Instead, you pay a small fee (almost nothing compared to your EC2 nodes) and AWS manages it for you.

In EKS, we can have managed worker node groups or self-managed node groups in our cluster. A “managed” node group means AWS creates the auto-scaling group and manages it, for example deciding which AMI to use and what cloud-init script to put into it, so that when an EC2 instance boots it knows how and where to join the cluster. A self-managed node group, as the name suggests, is managed entirely by yourself, giving you full control over everything. This article isn’t going to focus on self-managed node groups, how they work, or how to create one, so we won’t dive too deep into the details here; just make sure we are on the same page: there are two types of worker node groups in EKS.
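As a concrete sketch, a managed node group with min/max/desired sizes can be declared in an eksctl config file. The cluster name, region, and sizes below are illustrative, not from the article:

```yaml
# Hypothetical eksctl config: an EKS cluster with one managed node group.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster
  region: us-east-1
managedNodeGroups:
  - name: demo-managed-ng
    instanceType: m5.large
    minSize: 1          # the auto-scaling group's lower bound
    maxSize: 5          # the auto-scaling group's upper bound
    desiredCapacity: 2  # nodes running right after creation
```

Running `eksctl create cluster -f` on a file like this creates both the control plane and the auto-scaling group described above.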

Scaling

You might think: if I use the “managed” node group since it’s managed by AWS, I will get auto-scaling automatically, right?

The logic seems intuitive: AWS should know whether my cluster has enough resources to spin up new pods, and scale the cluster for me, right? Unfortunately, in real life this is not the case. Even with managed node groups, the cluster does not scale automatically.

Manual Scaling

Of course, when a situation like this happens, you can always scale the cluster manually, via the AWS console or the CLI, by updating the desired number of nodes in the node group. This isn’t recommended, because:

  • manual means it is prone to mistakes
  • you must find out the auto-scaling group name yourself, meaning another manual task or automated script for that, adding complexity
  • you will have to scale it down when the load decreases, meaning you have to manually administer the cluster: drain the node, then delete it
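For reference, the manual path looks roughly like this (cluster, node group, and node names are hypothetical):

```shell
# Scale the node group up via eksctl...
eksctl scale nodegroup --cluster demo-cluster --name demo-managed-ng --nodes 4

# ...or poke the underlying auto-scaling group directly,
# which requires knowing its generated name first.
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name eks-demo-managed-ng-xyz \
  --desired-capacity 4

# Scaling back down means draining and removing nodes yourself.
kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets
kubectl delete node ip-10-0-1-23.ec2.internal
```

Every one of these steps is a chance for human error, which is exactly what an autoscaler removes.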

Cluster Autoscaler

How It Works

  1. The cluster autoscaler monitors for pods that remain in the pending state.
  2. If the pending state is due to insufficient cluster resources, the autoscaler requests a newly provisioned node.
  3. The underlying cloud infrastructure (e.g. AWS, GCP, …) provisions a new node, which is detected by Kubernetes.
  4. The Kubernetes scheduler can now assign the pending pods to the new node(s).
  5. If the cluster autoscaler still detects pending pods, the process repeats.
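To see step 1 in action, a workload whose resource requests don’t fit on any existing node will leave pods Pending, which is the signal the autoscaler reacts to. A minimal sketch (names, image, and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app          # hypothetical workload
spec:
  replicas: 50
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: app
          image: nginx    # placeholder image
          resources:
            requests:     # the scheduler packs pods by these requests;
              cpu: 500m   # if no node has room, pods stay Pending and
              memory: 1Gi # the autoscaler kicks in
```

The autoscaler never looks at actual CPU usage for this decision, only at requests versus allocatable node capacity.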

Difference between Cluster Autoscaler and Karpenter

Cluster Autoscaler will only scale your managed node groups up or down through Amazon EC2 Auto Scaling Groups. It requires the ability to examine and modify EC2 Auto Scaling Groups, so it watches the node groups. Whenever we add a new node group we have to tell Cluster Autoscaler about it, because it has to maintain the mapping between the Kubernetes-native node group and the AWS-native Auto Scaling Group.
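In practice, that mapping is usually handled with tag-based auto-discovery: the ASGs are tagged and the Cluster Autoscaler is started with a matching flag. A sketch of the relevant container arguments (the cluster name is hypothetical):

```yaml
# Excerpt from the cluster-autoscaler container spec. Any ASG carrying
# both tag keys below is discovered and managed automatically.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/demo-cluster
```

Even with auto-discovery, the node group itself still has to exist before the autoscaler can use it, which is the limitation Karpenter removes.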

Karpenter manages each instance directly, without additional orchestration mechanisms like node groups. Karpenter looks at the workload (i.e. the pods) and launches the right instances for the situation. Instance selection decisions are intent-based and driven by the specification of incoming pods, including resource requests and scheduling constraints.

For example, if I have 50 pods pending for scheduling, Cluster Autoscaler will do the math and tell the ASG to scale out by two or three nodes based on its calculations, but Karpenter can instead ask for a single large EC2 instance and place all the pods on it.
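The arithmetic behind that difference can be sketched in a few lines. This is a simplification (it ignores daemonsets, system overhead, and max-pods limits), and the pod and node sizes are illustrative, not from the article:

```python
import math

def nodes_needed(pod_count, pod_cpu, pod_mem, node_cpu, node_mem):
    """Nodes required when every node has the same fixed capacity.

    Pods-per-node is bounded by whichever resource runs out first.
    """
    per_node = min(node_cpu // pod_cpu, node_mem // pod_mem)
    return math.ceil(pod_count / per_node)

# Hypothetical workload: 50 pending pods, each requesting 0.5 vCPU / 1 GiB.
pods, cpu, mem = 50, 0.5, 1

# Cluster Autoscaler style: a homogeneous node group of m5.4xlarge-sized
# nodes (16 vCPU, 64 GiB) -> it asks the ASG for 2 more nodes.
print(nodes_needed(pods, cpu, mem, node_cpu=16, node_mem=64))   # 2

# Karpenter style: free to pick one instance big enough for everything,
# e.g. something sized like an m5.16xlarge (64 vCPU, 256 GiB) -> 1 node.
print(nodes_needed(pods, cpu, mem, node_cpu=64, node_mem=256))  # 1
```

The point is not the exact numbers but that Cluster Autoscaler is constrained to the node group’s fixed instance type, while Karpenter chooses the instance shape per batch of pending pods.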

Karpenter

How does Karpenter work?

  1. Observes the resource requests of unscheduled pods
  2. Directly provisions just-in-time node capacity (groupless node autoscaling)
  3. Terminates nodes when they are outdated
  4. Reallocates pods across nodes for better resource utilization
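These behaviors are configured through a Provisioner resource. A minimal sketch, using the v1alpha5 API that was current when Karpenter launched (zones, limits, and the instance profile name are illustrative):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type       # allow spot and on-demand
      operator: In
      values: ["spot", "on-demand"]
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1a", "us-east-1b"]
  limits:
    resources:
      cpu: "1000"                           # cap total provisioned CPU
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-demo  # hypothetical
```

Note that no instance type is pinned: within these constraints, Karpenter picks the cheapest instance shape that fits the pending pods.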

Karpenter has two control loops that maximize the availability and efficiency of your cluster.

  1. Allocator: a fast-acting controller that ensures pods are scheduled as quickly as possible
  2. Reallocator: a slow-acting, cost-sensitive controller that ensures excess node capacity is reallocated as pods are evicted

With Cluster Autoscaler, we don’t have direct control over EC2 instances; we control them through the Auto Scaling Group. CA just asks the ASG to increase or decrease the node count. Karpenter, on the other hand, manages nodes directly, which enables it to retry in milliseconds instead of minutes when capacity is unavailable. It also allows Karpenter to leverage diverse instance types, availability zones, and purchase options without creating hundreds of node groups.

Suppose we get a heavy workload that doesn’t fit into the current node group. Then we basically have to create a new node group and tell Cluster Autoscaler about it. Karpenter, in contrast, will look at the workload and ask the EC2 service for a specific instance in a specific zone based on that workload, and it’s done. So Karpenter is more of a native Kubernetes workload scheduler for how we get nodes.

Cluster autoscaler doesn’t bind pods to the nodes it creates. Instead, it relies on the kube-scheduler to make the same scheduling decision after the node has come online. A node that Karpenter launches has its pods bound immediately. The kubelet doesn’t have to wait for the scheduler or for the node to become ready. It can start preparing the container runtime immediately, including pre-pulling the image. This can shave seconds off of node startup latency.

We can also have Karpenter deprovision nodes by setting time-to-live values.
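In the v1alpha5 Provisioner API this is done with two TTL fields (the values below are illustrative):

```yaml
spec:
  ttlSecondsAfterEmpty: 30        # delete a node 30s after its last pod leaves
  ttlSecondsUntilExpired: 604800  # recycle nodes after 7 days, e.g. to roll AMIs
```

Leaving `ttlSecondsAfterEmpty` unset disables empty-node scale-down entirely, so some value is usually wanted for cost savings.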

Conclusion

Karpenter is designed to work alongside existing AWS capacity providers such as EKS managed node groups and EC2 Auto Scaling groups, so customers can use a mixed model of cluster capacity management. Over the long term, it is expected that Karpenter will be leveraged more and more as the premier dynamic cluster node manager.

If you enjoy this article, please give it a like, comment, subscribe, and share it!
