1. Spot instances can reduce cloud costs by up to 90% compared to On-Demand instances, but require careful planning.
2. We compare three Kubernetes strategies for Spot instance cost savings—two popular but flawed, and one more reliable.
3. Pod Topology Spread Constraints offer the best balance between cost savings and reliability.
4. Tools like Devtron simplify Spot instance adoption and ensure operational stability.
Cost savings have always been a priority for organizations, but today’s economic climate demands responsible growth over “growth at all costs.” One of the most effective ways to achieve this in cloud-native environments is by leveraging AWS Spot Instances in Kubernetes clusters for substantial cost savings.
Spot instances can slash costs by up to 90% compared to On-Demand. However, their inherent volatility means you need robust scheduling and scaling strategies to avoid service disruptions. In this blog, we'll explore three Kubernetes approaches to using Spot instances for optimal cost savings, compare their trade-offs, and see how Devtron makes the whole process simpler.
Understanding Spot Instance Cost Savings in Kubernetes
Before diving into approaches, let’s clarify what “Spot instance cost savings” really means in Kubernetes:
- Cost differential: AWS Spot instances are priced dynamically and are often 70–90% cheaper than On-Demand.
- Workload suitability: Ideal for fault-tolerant, stateless workloads.
- Scaling behavior: The way Kubernetes schedules and scales nodes directly impacts cost efficiency.
- Risk mitigation: Strategies like Pod Topology Spread Constraints help maintain reliability when Spot capacity is reclaimed.
How Auto-scaling Works in Kubernetes (Quick Recap)
Kubernetes scales at two levels. The Horizontal Pod Autoscaler (HPA) adds or removes pod replicas based on observed metrics such as CPU utilization. When new replicas cannot fit on existing nodes, the Cluster Autoscaler provisions additional nodes (and removes underutilized ones). Which node group the autoscaler expands, and where the scheduler then places pods, is precisely what determines how much of your capacity actually runs on Spot.
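For reference, a minimal HPA manifest looks like this sketch (the target Deployment name sample and the thresholds are illustrative):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample              # illustrative target workload
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU crosses 70%
With that recap in place, the question becomes: when the autoscaler adds nodes, how do we make sure as many of them as possible are Spot?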
Attempt 1 — Mixed Instance Groups with Ratio Targeting
This approach aims for a fixed spot-to-on-demand ratio in a mixed instance group. While this sounds ideal for Spot instance cost savings, actual pod placement can deviate from the ratio.
Sample Configuration — Mixed Instance Group for Spot & On-Demand
spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 30
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot
In this setup, the first 3 nodes are always On-Demand (onDemandBase: 3), and capacity above that base is split 70% Spot / 30% On-Demand (onDemandAboveBase: 30). For example, if the group scales to 13 nodes, the 10 nodes above the base split into roughly 7 Spot and 3 On-Demand. Note, however, that nodeLabels applies to every node in the group, so the lifecycle: Spot label identifies the group rather than a node's actual lifecycle.
Pod Affinity Example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - Spot
Case 1: Scaling not required
The scheduler places pods using its general scoring rules (resource fit, spreading, and so on); it has no notion of cost preference. Result: Spot nodes may be underused, lowering cost savings.
Case 2: Scaling required
The Cluster Autoscaler maintains the Spot/On-Demand ratio at the node level when adding capacity, but pod placement can still drift from it, undercutting savings.
Outcome: ❌ Ratio maintained at node level, but Spot instance cost savings can be inconsistent due to scheduling gaps.
Attempt 2 — Node Affinity for Pod Placement
This method separates node groups for Spot and On-Demand, then uses node affinity to guide pod scheduling.
Spot Node Group Example
spec:
  mixedInstancesPolicy:
    onDemandBase: 0
    onDemandAboveBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot
On-Demand Node Group Example
spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 100
  nodeLabels:
    lifecycle: OnDemand
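A quick way to confirm that nodes from each group carry the expected label (assuming kubectl access to the cluster):
kubectl get nodes -L lifecycle
The -L flag prints the lifecycle label as an extra column, so you can see the Spot/OnDemand split at a glance.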
Preferred Node Affinity Example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 70
        preference:
          matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - Spot
      - weight: 30
        preference:
          matchExpressions:
          - key: lifecycle
            operator: In
            values:
            - OnDemand
Case 1: Scaling not required
Pods tend to land on Spot nodes thanks to the higher weight (70), boosting savings. But preferred affinity is only one scoring input among many, so other scheduler priorities can override it and lower Spot utilization.
Case 2: Scaling required
When new nodes are needed, the Cluster Autoscaler may pick the On-Demand group at random, eroding Spot instance cost savings exactly during spikes.
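This randomness is expected behavior: the Cluster Autoscaler picks which node group to expand using its expander strategy, and the default expander is random. As a sketch (the deployment fragment and image tag are illustrative), the strategy can be changed via a flag on the cluster-autoscaler container:
# Fragment of a cluster-autoscaler Deployment (names and version illustrative)
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --expander=least-waste   # alternatives include random (default), most-pods, priority
Even with a smarter expander, the scheduler's placement decisions remain independent of the autoscaler's group choice, so savings stay unpredictable.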
Outcome: ❌ Even less predictable than Attempt 1 for sustained savings.
Attempt 3 — Pod Topology Spread Constraints (Most Reliable)
This approach ensures pods are evenly spread across Spot and On-Demand nodes, using Pod Topology Spread Constraints. It doesn’t lock an exact ratio, but keeps skew predictable.
Spot Node Group
spec:
  mixedInstancesPolicy:
    onDemandBase: 0
    onDemandAboveBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    lifecycle: Spot
On-Demand Node Group
spec:
  mixedInstancesPolicy:
    onDemandBase: 3
    onDemandAboveBase: 100
  nodeLabels:
    lifecycle: OnDemand
Pod Spec with Topology Spread Constraints
metadata:
  labels:
    app: sample
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: lifecycle
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: sample
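Note that DoNotSchedule is a hard constraint: if the spread cannot be satisfied, pods stay Pending until capacity appears. If schedulability matters more than strict balance, a softer variant of the same constraint tells the scheduler to prefer balance without ever blocking a pod:
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: lifecycle
  whenUnsatisfiable: ScheduleAnyway   # best-effort spreading instead of a hard rule
  labelSelector:
    matchLabels:
      app: sample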
Case 1: Scaling not required
The scheduler enforces maxSkew: 1, so the pod count between Spot and On-Demand nodes never differs by more than one. This keeps Spot usage high, preserving cost savings while ensuring reliability.
Case 2: Scaling required
When adding nodes, the Cluster Autoscaler simulates scheduling, topology spread constraints included, so it expands whichever node group keeps the spread valid, keeping Spot utilization consistent during scaling.
Outcome: ✅ Most predictable for balancing Spot instance cost savings with workload stability.
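Putting the pieces together, here is a minimal end-to-end Deployment sketch that spreads replicas across the two node groups above (the image and resource requests are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
spec:
  replicas: 6
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: lifecycle          # spread across the Spot and OnDemand domains
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: sample
      containers:
      - name: app
        image: nginx:1.25               # placeholder workload
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
With 6 replicas and maxSkew: 1, the scheduler keeps the split between the two lifecycles at 3/3.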
Best Practices for Maximizing Spot Instance Cost Savings
- Use a capacity-optimized allocation strategy for Spot.
- Apply PodDisruptionBudgets for controlled failover (see the sketch after this list).
- Watch for Spot interruption notices and drain nodes proactively, for example with the AWS Node Termination Handler.
- Pair HPA with topology spread constraints for predictable scaling.
- Let Devtron automate instance choice to maximize Spot usage.
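As referenced above, a minimal PodDisruptionBudget sketch for the sample app (the minAvailable value depends on your availability requirements):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
spec:
  minAvailable: 2        # keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: sample
A Spot reclamation itself is involuntary, but proactive drains (for example, on a two-minute interruption notice) go through the eviction API, so the PDB still paces how quickly pods are evicted.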
Conclusion — Stable Savings at Scale
Spot instances are a powerful tool for reducing Kubernetes compute costs, but only when scheduled and scaled correctly.
- Mixed instance groups: Good in theory, inconsistent in practice.
- Node affinity: More control, but unreliable during scaling.
- Pod Topology Spread Constraints: Best balance between savings and stability.
Devtron simplifies Spot adoption with automation, insights, and best practices built-in.
FAQ
How can I use AWS Spot Instances in Kubernetes for cost savings?
Use Spot Instances by configuring node groups with lifecycle: Spot labels and managing pod scheduling using Kubernetes features like node affinity or topology spread constraints to optimize cost without sacrificing reliability.
What are Pod Topology Spread Constraints in Kubernetes?
Pod Topology Spread Constraints ensure even distribution of pods across node groups using specific node labels. They use the topologyKey, maxSkew, and whenUnsatisfiable fields to limit the difference in pod count between node types such as Spot and OnDemand.
Why do node affinity and mixed instance groups fail to ensure correct Spot instance usage?
Node affinity and mixed instance groups do not guarantee consistent pod distribution across Spot and OnDemand nodes because the Kubernetes scheduler and the Cluster Autoscaler use different scoring and prioritization logic, which leads to unpredictable outcomes during scaling.
What’s the best Kubernetes strategy for balancing Spot and OnDemand instances?
Using Pod Topology Spread Constraints with lifecycle node labels and maxSkew provides a balanced and resilient approach to schedule pods across Spot and OnDemand nodes, offering cost efficiency with predictable distribution, especially during scaling events.