1. Spot instances can reduce cloud costs by up to 90% compared to On-Demand instances, but require careful planning.
2. We compare three Kubernetes strategies for Spot instance cost savings—two popular but flawed, and one more reliable.
3. Pod Topology Spread Constraints offer the best balance between cost savings and reliability.
4. Tools like Devtron simplify Spot instance adoption and ensure operational stability.
Cost savings have always been a priority for organizations, but today’s economic climate demands responsible growth over “growth at all costs.” One of the most effective ways to achieve this in cloud-native environments is by leveraging AWS Spot Instances in Kubernetes clusters for substantial cost savings.
Spot instances can slash costs by up to 90% compared to On-Demand. However, their inherent volatility means you need robust scheduling and scaling strategies to avoid service disruptions. In this blog, we'll explore three Kubernetes approaches to using Spot instances for optimal cost savings, compare their trade-offs, and see how Devtron makes the whole process simpler.
Understanding Spot Instance Cost Savings in Kubernetes
Before diving into approaches, let’s clarify what “Spot instance cost savings” really means in Kubernetes:
- Cost differential: AWS Spot instances are priced dynamically and are often 70–90% cheaper than On-Demand.
- Workload suitability: Ideal for fault-tolerant, stateless workloads.
- Scaling behavior: The way Kubernetes schedules and scales nodes directly impacts cost efficiency.
- Risk mitigation: Strategies like Pod Topology Spread Constraints help maintain reliability when Spot capacity is reclaimed.
How Auto-scaling Works in Kubernetes (Quick Recap)
Kubernetes scales at two levels. The Horizontal Pod Autoscaler (HPA) adds or removes pod replicas based on observed metrics such as CPU utilization. When new replicas cannot fit on existing nodes, the Cluster Autoscaler provisions additional nodes (and removes underutilized ones). Which node group the autoscaler expands, and where the scheduler then places pods, is precisely what determines how much of your capacity actually runs on Spot.
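For reference, a minimal HPA manifest looks like this sketch (the target Deployment name sample and the thresholds are illustrative):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample              # illustrative target workload
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU crosses 70%
With that recap in place, the question becomes: when the autoscaler adds nodes, how do we make sure as many of them as possible are Spot?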
Attempt 1 — Mixed Instance Groups with Ratio Targeting
This approach aims for a fixed spot-to-on-demand ratio in a mixed instance group. While this sounds ideal for Spot instance cost savings, actual pod placement can deviate from the ratio.
Sample Configuration — Mixed Instance Group for Spot & On-Demand
spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 30
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot
In this setup, the first 3 nodes are always On-Demand (onDemandBase: 3), and capacity above that base is split 70% Spot / 30% On-Demand (onDemandAboveBase: 30). For example, if the group scales to 13 nodes, the 10 nodes above the base split into roughly 7 Spot and 3 On-Demand. Note, however, that nodeLabels applies to every node in the group, so the lifecycle: Spot label identifies the group rather than a node's actual lifecycle.
Pod Affinity Example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - Spot
Case 1: Scaling not required
The scheduler places pods using its general scoring rules (resource fit, spreading, and so on); it has no notion of cost preference. Result: Spot nodes may be underused, lowering cost savings.
Case 2: Scaling required
The Cluster Autoscaler maintains the Spot/On-Demand ratio at the node level when adding capacity, but pod placement can still drift from it, undercutting savings.
Outcome: ❌ Ratio maintained at node level, but Spot instance cost savings can be inconsistent due to scheduling gaps.
Attempt 2 — Node Affinity for Pod Placement
This method separates node groups for Spot and On-Demand, then uses node affinity to guide pod scheduling.
Spot Node Group Example
spec:
  mixedInstancesPolicy:
    onDemandBase: 0
    onDemandAboveBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot
On-Demand Node Group Example
spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 100
  nodeLabels:
    lifecycle: OnDemand
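A quick way to confirm that nodes from each group carry the expected label (assuming kubectl access to the cluster):
kubectl get nodes -L lifecycle
The -L flag prints the lifecycle label as an extra column, so you can see the Spot/OnDemand split at a glance.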
Preferred Node Affinity Example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 70
        preference:
          matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - Spot
      - weight: 30
        preference:
          matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - OnDemand
Case 1: Scaling not required
Pods tend to land on Spot nodes thanks to the higher weight (70), boosting savings. But preferred affinity is only one scoring input among many, so other scheduler priorities can override it and lower Spot utilization.
Case 2: Scaling required
When new nodes are needed, the Cluster Autoscaler may pick the On-Demand group at random, eroding Spot instance cost savings exactly during spikes.
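This randomness is expected behavior: the Cluster Autoscaler picks which node group to expand using its expander strategy, and the default expander is random. As a sketch (the deployment fragment and image tag are illustrative), the strategy can be changed via a flag on the cluster-autoscaler container:
# Fragment of a cluster-autoscaler Deployment (names and version illustrative)
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --expander=least-waste   # alternatives include random (default), most-pods, priority
Even with a smarter expander, the scheduler's placement decisions remain independent of the autoscaler's group choice, so savings stay unpredictable.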
Outcome: ❌ Even less predictable than Attempt 1 for sustained savings.
Attempt 3 — Pod Topology Spread Constraints (Most Reliable)
This approach ensures pods are evenly spread across Spot and On-Demand nodes, using Pod Topology Spread Constraints. It doesn’t lock an exact ratio, but keeps skew predictable.
Spot Node Group
spec:
  mixedInstancesPolicy:
    onDemandBase: 0
    onDemandAboveBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot
On-Demand Node Group
spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 100
  nodeLabels:
    lifecycle: OnDemand
Pod Spec with Topology Spread Constraints
metadata:
  labels:
    app: sample
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: lifecycle
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: sample
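Note that DoNotSchedule is a hard constraint: if the spread cannot be satisfied, pods stay Pending until capacity appears. If schedulability matters more than strict balance, a softer variant of the same constraint tells the scheduler to prefer balance without ever blocking a pod:
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: lifecycle
  whenUnsatisfiable: ScheduleAnyway   # best-effort spreading instead of a hard rule
  labelSelector:
    matchLabels:
      app: sample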
Case 1: Scaling not required
The scheduler enforces maxSkew: 1, so the pod count between Spot and On-Demand nodes never differs by more than one. This keeps Spot usage high, preserving cost savings while ensuring reliability.
Case 2: Scaling required
When adding nodes, the Cluster Autoscaler simulates scheduling, topology spread constraints included, so it expands whichever node group keeps the spread valid, keeping Spot utilization consistent during scaling.
Outcome: ✅ Most predictable for balancing Spot instance cost savings with workload stability.
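Putting the pieces together, here is a minimal end-to-end Deployment sketch that spreads replicas across the two node groups above (the image and resource requests are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
spec:
  replicas: 6
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: lifecycle          # spread across the Spot and OnDemand domains
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: sample
      containers:
      - name: app
        image: nginx:1.25               # placeholder workload
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
With 6 replicas and maxSkew: 1, the scheduler keeps the split between the two lifecycles at 3/3.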
Best Practices for Maximizing Spot Instance Cost Savings
- Use a capacity-optimized allocation strategy for Spot.
- Apply PodDisruptionBudgets for controlled failover (see the sketch after this list).
- Watch for Spot interruption notices and drain nodes proactively, for example with the AWS Node Termination Handler.
- Pair HPA with topology spread constraints for predictable scaling.
- Let Devtron automate instance choice to maximize Spot usage.
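As referenced above, a minimal PodDisruptionBudget sketch for the sample app (the minAvailable value depends on your availability requirements):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
spec:
  minAvailable: 2        # keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: sample
A Spot reclamation itself is involuntary, but proactive drains (for example, on a two-minute interruption notice) go through the eviction API, so the PDB still paces how quickly pods are evicted.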
Conclusion — Stable Savings at Scale
Spot instances are a powerful tool for reducing Kubernetes compute costs, but only when scheduled and scaled correctly.
- Mixed instance groups: Good in theory, inconsistent in practice.
- Node affinity: More control, but unreliable during scaling.
- Pod Topology Spread Constraints: Best balance between savings and stability.
Devtron simplifies Spot adoption with automation, insights, and best practices built-in.
FAQ
How can I use AWS Spot Instances in Kubernetes for cost savings?
Use Spot Instances by configuring node groups with lifecycle: Spot labels and managing pod scheduling using Kubernetes features like node affinity or topology spread constraints to optimize cost without sacrificing reliability.
What are Pod Topology Spread Constraints in Kubernetes?
Pod Topology Spread Constraints ensure even distribution of pods across node groups using specific node labels. They use the topologyKey, maxSkew, and whenUnsatisfiable fields to limit the difference in pod count between node types such as Spot and OnDemand.
Why do node affinity and mixed instance groups fail to ensure correct Spot instance usage?
Node affinity and mixed instance groups do not guarantee consistent pod distribution across Spot and OnDemand nodes because the Kubernetes scheduler and the Cluster Autoscaler use different scoring and prioritization logic, which leads to unpredictable outcomes during scaling.
What’s the best Kubernetes strategy for balancing Spot and OnDemand instances?
Using Pod Topology Spread Constraints with lifecycle node labels and maxSkew provides a balanced and resilient approach to schedule pods across Spot and OnDemand nodes, offering cost efficiency with predictable distribution, especially during scaling events.