Scaling Applications with HPA on a Kubernetes Cluster

TL;DR: Learn how to configure the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale pods based on CPU, memory, or custom metrics. Improve performance, handle traffic spikes, and optimize costs with best practices and real-world use cases.

Key Takeaways

1. Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts pod counts based on CPU, memory, or custom metrics.

2. Helps maintain application performance during unpredictable traffic spikes.

3. Works with Deployment, ReplicaSet, and StatefulSet workloads.

4. Requires a metrics server to gather real-time resource usage data.

5. Can be combined with VPA and KEDA for advanced autoscaling strategies.

Introduction

One key benefit of running application containers on Kubernetes is the flexibility K8s provides to scale application pods dynamically. In a microservices setup, not every service needs to scale equally. For example, in an e-commerce app:

  • Backend services may require more scaling during peak hours.
  • Frontend services may need fewer resources.

Instead of manually changing replica counts in a Deployment object’s manifest — which is time-consuming and prone to over-/under-provisioning — Kubernetes offers autoscaling mechanisms like HPA to optimize this process.

Types of Kubernetes Autoscaling

Autoscaling in Kubernetes refers to the automatic adjustment of resources based on real-time usage. The main types include:

  1. Horizontal Pod Autoscaler (HPA) - Scales the number of pods for a workload.
  2. Vertical Pod Autoscaler (VPA) - Adjusts CPU/memory allocated to pods.
  3. Cluster Autoscaler - Adds/removes cluster nodes.
  4. Time-based Autoscaling - Schedules scaling at fixed times.

Pro tip: Multiple autoscaling types can be combined for zero downtime and better resource utilization.

What is Kubernetes Horizontal Pod Autoscaler (HPA)?

The Horizontal Pod Autoscaler (HPA) in Kubernetes automatically adjusts the number of pods for Deployments, ReplicaSets, or StatefulSets.

Key Features:

  • Works on CPU, memory, or custom application metrics.
  • Supports scale up (when usage exceeds thresholds) and scale down (when underutilized).
  • Improves application reliability, user experience, and cost savings.

HPA relies on the Kubernetes Metrics Server to track usage. For advanced cases, it can integrate with the Custom Metrics API for non-standard scaling triggers like active users, queue length, or event counts.
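
To illustrate, a custom-metric autoscaler might look like the sketch below. This assumes a metrics adapter (for example, the Prometheus Adapter) already exposes a per-pod metric named queue_length through the Custom Metrics API; the workload name, metric name, and target value are illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_length          # custom metric served by the metrics adapter
        target:
          type: AverageValue
          averageValue: "30"          # target average queue length per pod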

Kubernetes HPA Use Cases

  • Traffic spikes handling - e.g., e-commerce sales events.
  • Batch processing jobs - autoscaling during high data load periods.
  • Custom metric scaling - e.g., scaling game servers by active players.
  • Microservice scaling - independently scaling critical services without affecting others.

Prerequisites for Configuring HPA

Before setting up HPA, ensure:

  • A running Kubernetes cluster with kubectl configured.
  • The Metrics Server deployed so HPA can read resource usage.
  • Optionally, Devtron for easier deployment.

Step-by-Step: Configure HPA on Kubernetes Cluster

Let's take a look at how you can configure the Horizontal Pod Autoscaler on your Kubernetes cluster.

Prerequisites:

  1. Ensure that you have a running Kubernetes cluster and kubectl configured, with Kubernetes version 1.2 or later.
  2. Deploy the Metrics Server in the cluster to provide metrics via the Resource Metrics API; one common install command is shown after this list. You can also install it using Devtron's Helm capabilities.
  3. If you want to make use of custom metrics, your cluster must be able to communicate with the API server that provides the Custom Metrics API.
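
One common way to install the Metrics Server is to apply the official release manifest from the metrics-server project (verify that the release is compatible with your cluster version):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that it is running and serving resource metrics
kubectl get deployment metrics-server -n kube-system
kubectl top nodes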

Below are the steps to deploy an application and configure HPA on a Kubernetes cluster:

1. Deploy an Application using Docker

  • Here, we are using a custom Docker image based on the php-apache image.
  • Create a Dockerfile with the following content:
FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php

Below is the index.php page, which performs calculations to generate an intensive CPU load.

<?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
  }
  echo "OK!";
?>
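
If you build and push the image yourself, the commands would look roughly like the following (the registry name is a placeholder); otherwise, the prebuilt k8s.gcr.io/hpa-example image used in the Deployment below already packages this same workload:

# Build the image and push it to a registry your cluster can pull from
docker build -t <your-registry>/php-apache-hpa:latest .
docker push <your-registry>/php-apache-hpa:latest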

2. Apply Deployment and Service

Then start a Deployment running the image and expose it as a Service using the following YAML configuration. Save it in a file called php-app.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
        - name: php-apache
          image: k8s.gcr.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 200m

                
---

apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
    - port: 80
  selector:
    run: php-apache

Then, run the following command:

kubectl apply -f php-app.yaml
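
To confirm that the Deployment and Service are up before adding the autoscaler, you can run:

kubectl get deployment php-apache
kubectl get service php-apache
kubectl get pods -l run=php-apache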

3. Create the HPA

Now that the server is running, you can create a Horizontal Pod Autoscaler for it using the following kubectl autoscale command:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
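
Equivalently, you can define the same autoscaler declaratively. A minimal manifest, assuming the autoscaling/v2 API is available in your cluster, might look like this (save it as, say, php-apache-hpa.yaml and apply it with kubectl apply -f php-apache-hpa.yaml):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50     # scale when average CPU exceeds 50% of requests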

You can check the current status of the autoscaler using the command:

kubectl get hpa

4. Test Autoscaling Behavior

Let's simulate a scenario where the application faces an increase in load to see how the Horizontal Pod Autoscaler (HPA) that we created will behave.

Ensure that you run the following commands in a separate terminal.

  • First, start a temporary load-generator pod with an interactive shell:
kubectl run -it --rm load-generator --image=busybox --restart=Never -- /bin/sh
  • Then, from inside that shell, send an infinite loop of queries to the php-apache service:
while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
  • You can check the higher CPU load by executing:
kubectl get hpa
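
Within a minute or so, you should see the CPU target climb above the 50% threshold and additional replicas appear. To follow this continuously, you can also watch the autoscaler and the Deployment:

kubectl get hpa php-apache --watch
kubectl get deployment php-apache --watch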

5. Terminate the Load

You can now stop the user load. Switch to the terminal where you created the load-generator pod with the busybox image, and press Ctrl + C.

You can verify that the increased load has been terminated using the command:

kubectl get hpa

The CPU utilization will drop to 0%, and the Kubernetes HPA will scale the number of replicas back down to 1. This may take a few minutes.

Result State:

NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          11m
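
Once you are done experimenting, you can clean up the resources created in this walkthrough:

kubectl delete hpa php-apache
kubectl delete -f php-app.yaml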

Best Practices for HPA on Kubernetes Cluster

  • Set realistic CPU/memory thresholds to avoid unnecessary scaling.
  • Monitor application behavior after scaling to ensure stability.
  • Combine with Cluster Autoscaler for node-level scaling.
  • Use custom metrics for domain-specific triggers.
  • Regularly tune HPA configs based on traffic trends; one example of tunable scaling behavior is sketched after this list.
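
As one example of such tuning, the autoscaling/v2 API exposes an optional behavior field that controls how quickly the autoscaler reacts in each direction. The sketch below extends the php-apache autoscaler from the walkthrough; the window and policy values are illustrative, not recommendations:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react to load spikes immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60            # at most double the replica count per minute
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low usage before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60            # remove at most one pod per minute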

Conclusion

The Kubernetes Horizontal Pod Autoscaler (HPA) is a must-have for workloads with unpredictable demand. It keeps applications responsive and cost-efficient by dynamically adjusting resources.
For even greater automation, combine HPA with VPA or KEDA for hybrid scaling strategies.

🚀 Exciting News: Devtron’s upcoming Agentic AI feature will bring intelligent scaling recommendations and self-optimizing Kubernetes deployments. Stay tuned!

FAQ

What Is the Kubernetes Horizontal Pod Autoscaler (HPA)?

The Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pods in a deployment based on resource utilization, such as CPU or memory. This ensures that applications have the necessary resources during high traffic and reduces resource consumption when demand drops.

What Are the Types of Autoscaling in Kubernetes?

Kubernetes offers four main types of autoscaling: Horizontal Pod Autoscaling (HPA), Vertical Pod Autoscaling (VPA), Cluster Scaling, and Time-based autoscaling. These mechanisms help optimize resource allocation, ensuring applications run efficiently and scale based on real-time needs.

How Do You Configure the Horizontal Pod Autoscaler in Kubernetes?

To configure HPA in Kubernetes, deploy an application, ensure the Metrics Server is running, and use the kubectl autoscale command to set the desired CPU utilization and replica range. HPA will automatically scale pods based on resource usage.

What Are Some Use Cases for Kubernetes HPA?

Kubernetes HPA is ideal for handling unpredictable traffic, such as web applications during sales events, batch processing jobs, and scaling based on custom metrics. It ensures resource efficiency by scaling pods up during high load and down when demand decreases.
