How to use Spot to achieve Cost Savings with Stability on Kubernetes

By Prashant Ghildiyal

Cost saving has always been an important objective for organizations, but it is now more important than ever. Because of uncertainty in the business environment, the earlier motto of growth at all costs has been replaced with responsible growth.

In this post we will focus on how you can leverage AWS spot instances for cost savings in Kubernetes clusters without compromising on stability.

You may be thinking this is trivial because Kubernetes supports it out of the box; hold that thought for a while. I promise you, by the end of this post you will know the best possible way to use spot instances in Kubernetes clusters using the mechanisms Kubernetes provides.

AWS spot instances are usually available at around 10% of the cost of on-demand instances, but their reliability is lower. If the spot price rises above your bid price, AWS terminates the instance with only 2 minutes of notice. It is therefore important to distribute the pods of our microservices judiciously across spot and on-demand instances.

Handling termination notifications and draining nodes gracefully is also important for maintaining the SLA of microservices, but it is beyond the scope of this article; we will cover it separately.

How does auto-scaling work in Kubernetes?

Before we go into the details, let's understand how node auto-scaling works in a Kubernetes cluster.

  • If the kube scheduler is unable to place the pod on any of the nodes, it marks the pod as unschedulable.
  • Cluster autoscaler watches unschedulable pods.
  • If the cluster autoscaler finds an unschedulable pod, it filters and prioritizes node groups to select a node group on which this pod can be scheduled.
  • Cluster autoscaler increases desired instance count in the Autoscaling Group (ASG) of the selected node group.
  • The ASG scales nodes based on its scaling strategy.
  • Kube Scheduler filters and prioritizes nodes and schedules the pod on the node with the highest priority score.
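
If you run the cluster autoscaler with ASG auto-discovery (one common setup, not necessarily the one used in this post), the node group's ASG needs discovery tags so the autoscaler can find and scale it. With kops these tags can be added through cloudLabels; the cluster name below is illustrative.

spec:
  # ASG tags used by cluster autoscaler auto-discovery (cluster name is illustrative)
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: "true"
    k8s.io/cluster-autoscaler/my-cluster.example.com: "owned"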

Without further ado, let's start our journey.

Spoiler alert: the first two attempts are failures, and the third attempt is successful.

Attempt 1

Based on my discussions, this is the second most popular approach to using spot instances with Kubernetes. It goes like this:

If nodes have the right spot-to-on-demand ratio, then pods will automatically have the right ratio.

Kops has supported mixed instance groups for AWS since version 1.14. Mixed instance groups can be used to achieve the right ratio of spot and on-demand instances.

Let’s look at a relevant part of a sample instance group configuration.

spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 30
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot

As per the above configuration, a minimum of 3 on-demand nodes will always be available. Beyond that base, 30% of the additional nodes will be on-demand and the remaining 70% will be spot. For example, if the group scales to 13 nodes, the first 3 are on-demand, and of the remaining 10, roughly 3 (30%) are on-demand and 7 are spot.

For node affinity, the following is the relevant portion of the pod spec:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - Spot

Case 1: Scaling not Required

If the cluster has capacity to schedule the pod, kube-scheduler will use its filter and priority algorithms to schedule the pod on the best possible node.

The kube scheduler's priority algorithm doesn't differentiate between spot and on-demand nodes. Therefore, even though the nodes will be in an approximately 70:30 spot-to-on-demand ratio, the distribution of pods across these nodes may not follow this ratio.

Case 2: Scaling Required

If cluster doesn’t have capacity to schedule the pod then cluster autoscaler will increase the count of desired instances in ASG.

ASG will then provision a new node such that the ratio of 70-is-to-30 ratio for spot-is-to-on demand is maintained.

After provisioning, kube scheduler will schedule the pod to the new node, assuming there were no pod evictions in between. So in case of a scaling event, the pod will be assigned to the right kind of node.

Can we do better?

We can use inter-pod anti-affinity for a better distribution of pods, but it still will not guarantee a 70:30 distribution unless the number of pods equals the number of nodes.
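
For reference, this is the kind of soft inter-pod anti-affinity meant here; it is only a sketch, and the app label and weight are illustrative.

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: sample
          topologyKey: kubernetes.io/hostname

This spreads replicas of the same app across nodes, but it does not control how many of those nodes are spot.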

Outcome: failure

Even though the nodes will have the desired spot-to-on-demand ratio, the pods may or may not be spread in this ratio, resulting in unstable services in case of a spot node outage.

Attempt 2

This is the most often cited approach to using spot instances in Kubernetes clusters. It goes like this:

Use node affinity to control distribution of pods across spot and on demand nodes.

For this to work, at least two node groups are needed: one with spot instances only and the other with on-demand instances only.

Following are the relevant configurations for the two node groups; this can also be done without mixed instance groups.

For spot

spec:
  mixedInstancesPolicy:
    onDemandBase: 0
    onDemandAboveBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot

Similarly, for on demand

spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 100
  nodeLabels:
    lifecycle: OnDemand

Following is the relevant pod spec for node affinity

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 70
        preference:
          matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - Spot
      - weight: 30
        preference:
          matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - OnDemand

What does weight mean?

Weights of 30 and 70 don't mean the scheduler will distribute pods between these two node labels in a 30:70 ratio.

The scheduler combines the weights of the matching preferred terms in the above spec with the scores it computes from its other priority functions, and assigns the pod to the node with the highest total score.
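
To illustrate with made-up numbers: a spot node matching the first term contributes an affinity score derived from the weight 70, and an on-demand node one derived from the weight 30, but both are added to the scores from the other priority functions (resource utilization, image locality, and so on). An almost idle on-demand node can therefore still outscore a heavily loaded spot node. The weights bias placement; they do not enforce a 70:30 ratio.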

Case 1: Scaling not required

If scaling is not required, the scheduler will prefer spot instances since they carry a weight of 70, though the actual placement also depends on the scores from the other priority functions used by the kube scheduler.

Case 2: Scaling required

If node scaling is required to schedule the pod, the cluster autoscaler will filter all node groups and prioritize the eligible ones based on its own selection strategy.

The selection strategy used by the cluster autoscaler (called an expander) is not the same as the kube scheduler's priority functions. By default it uses the random expander, so it will pick one node group at random from the eligible node groups and increase the desired instance count in its ASG.
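
The expander can be changed with the cluster autoscaler's --expander flag (for example least-waste, most-pods, or priority), but none of the built-in expanders aim for a particular spot-to-on-demand ratio. The following is a sketch of the relevant container arguments of a cluster autoscaler deployment; the flag values are illustrative.

# relevant container args of the cluster-autoscaler deployment
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled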

Once the ASG has provisioned the node, the kube scheduler will assign the pod to this new node.

Can we do better?

No, pod anti-affinity will not help, as the spot-to-on-demand node ratio itself is not the desired ratio.

Outcome: failure

Neither the nodes nor the pods will have the desired spot-to-on-demand ratio. This turns out to be worse than attempt 1.

Attempt 3

This is the least mentioned approach. It uses Pod Topology Spread Constraints, which were introduced in Kubernetes 1.16 and graduated to beta in 1.18. We will use pod topology spread constraints to control how pods are spread across the spot and on-demand instances in the cluster.

Following are the relevant configurations for the two node groups; again, this can also be done without mixed instance groups.

For spot

spec:
  mixedInstancesPolicy:
    onDemandBase: 0
    onDemandAboveBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot

Similarly, for on demand

spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 100
  nodeLabels:
    lifecycle: OnDemand

Following is the relevant portion of the pod spec for pod topology spread constraints:

metadata:
  labels:
    app: sample
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: lifecycle
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
         app: sample

How do topologySpreadConstraints work?

Topology spread constraints use node labels to identify the topology domain(s) a node belongs to. topologyKey is the key of the node label to use. The kube scheduler tries to place a balanced number of pods across all unique values of this node label (the topologyKey).

In our example the topologyKey is lifecycle, which has 2 unique values: Spot and OnDemand. The kube scheduler will place pods across nodes with these two values such that the difference in pod count between the two values is never more than maxSkew (1 in this case).

If this label were missing from a node group, the scheduler would not schedule pods of this workload to that node group.

An important point to note is that maxSkew doesn't favor any particular value of the topologyKey. Placement can skew in either direction based on the availability and priority of nodes, but the imbalance between label values will never exceed maxSkew.

With whenUnsatisfiable set to DoNotSchedule, the kube scheduler will not schedule the pod at all if doing so would violate maxSkew.
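
The other accepted value, ScheduleAnyway, turns the constraint into a soft preference: the scheduler still tries to minimize the skew but will place the pod even if maxSkew cannot be satisfied.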

Case 1: Scaling not required

If scaling is not required, the kube scheduler will filter and prioritize nodes that honour the maxSkew, and therefore pods will be scheduled in the desired ratio.

Case 2: Scaling required

When scaling is required, the cluster autoscaler will filter node groups that honour the topology constraints and increment the desired instance count in the related ASG.

After the ASG has provisioned the instance, the kube scheduler will assign the pod to the new node, and therefore pods will be scheduled in the desired ratio.

What’s the catch?

maxSkew is an absolute number, which means that when we use it with the HPA, the spot-to-on-demand ratio of pods will change as the pods scale.

The skew can be on either side; it is possible to have

  1. number of pods on spot = number of pods on ondemand + maxSkew
  2. number of pods on ondemand = number of pods on spot + maxSkew

This means that for a replica count of 5 and a maxSkew of 1, the spot-to-on-demand ratio of pods can be 2:3 or 3:2. This becomes more unpredictable as the value of maxSkew gets higher.

To achieve a more skewed ratio, it is better to create more buckets (more unique values of the topologyKey) than to increase maxSkew, as sketched below.
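
For example, to push roughly two thirds of the pods onto spot, the spot capacity could be split into two node groups that share a dedicated spread label; the label key capacity-bucket and its values are illustrative, not part of the configuration above.

# spot instance group A
spec:
  nodeLabels:
    capacity-bucket: spot-a
    lifecycle: Spot

# spot instance group B
spec:
  nodeLabels:
    capacity-bucket: spot-b
    lifecycle: Spot

# on demand instance group
spec:
  nodeLabels:
    capacity-bucket: ondemand
    lifecycle: OnDemand

With topologyKey: capacity-bucket and maxSkew: 1 in the pod's topologySpreadConstraints, pods spread roughly evenly across the three buckets, i.e. about two thirds of them land on spot.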

Outcome: success

We cannot get an exact spot-to-on-demand ratio, but we do get a predictable one, which is enough to keep services stable during a spot outage.

The complete configuration of the samples used in this blog is available in this git repo.
