It starts with one request
"We need GPUs. We're training a model. Can you provision it this week?"
You've heard this. Maybe last quarter. Maybe last week. You provision a GPU node pool, hand over access, and close the ticket.
Then another team asks. Then the data science org. Then two LLM initiatives from leadership with an exec sponsor.
Suddenly, you're not running apps on Kubernetes anymore. You're running an AI platform. And the tooling you've built wasn't designed for this.
Three types of people now depend on your cluster
When AI workloads land on enterprise Kubernetes, they bring three very different users, each pulling the same scarce hardware in different directions: data scientists running training jobs, engineers keeping inference endpoints live, and leadership asking where the GPU spend is going.
You sit in the middle. Everyone else's productivity depends on how well you manage the GPU pool. And right now, most platform teams are managing it with YAML, spreadsheets, and a lot of Slack messages.
What actually happens inside most companies
It usually follows the same arc:
- One data science team gets access to GPUs. It works fine.
- A second team joins the cluster. You manually create a namespace and set some ResourceQuotas (sketched below).
- A third team asks for A100s. You realise you have T4s and H100s on separate clusters with no shared quota context.
- A training job preempts a live inference endpoint. You find out when an SLA breaks.
- Leadership asks for a GPU utilisation report. You open a spreadsheet.
This is not a hypothetical. It's the sequence we've seen play out across teams building AI infrastructure on Kubernetes today.
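Step two of that arc, the hand-rolled namespace and quota, usually amounts to a few lines of YAML. Here's a minimal sketch with illustrative names and numbers, assuming GPUs are exposed as the nvidia.com/gpu extended resource:

```yaml
# Illustrative sketch: cap one team's GPU requests in its own namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: team-b                       # placeholder team namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-b-gpu-quota
  namespace: team-b
spec:
  hard:
    requests.nvidia.com/gpu: "4"     # extended resources are quota'd via the requests. prefix
```

It holds up for the second team. It starts to crack at the third.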
The problems that emerge at scale
These aren't edge cases. They're the exact friction points that appear once more than one team shares GPU infrastructure:
No fair scheduling between teams: Vanilla Kubernetes has no native fairness mechanism for GPU workloads. The team that submits first consumes everything. Everyone else waits with no ETA, no visibility, no recourse.
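The usual workaround is to bolt on Kueue and hand-maintain a ClusterQueue per team inside a shared cohort. A minimal sketch, with purely illustrative names and quotas, looks roughly like this:

```yaml
# Illustrative sketch: two teams sharing a GPU pool through a Kueue cohort.
# Each ClusterQueue gets a guaranteed share plus a cap on borrowed capacity.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-gpu                  # placeholder flavor name
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: shared-gpus                # queues in the same cohort can lend idle quota to each other
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-gpu
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8              # team A's guaranteed share
        borrowingLimit: 4            # how far it can dip into the cohort's idle capacity
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-cq
spec:
  cohort: shared-gpus
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-gpu
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        borrowingLimit: 4
```

Multiply that by every team and every GPU type, and the quota files become their own maintenance burden.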
Training jobs silently preempt inference: A long-running PyTorchJob can displace a live inference endpoint if priorities aren't carefully configured. You find out when users report errors, not before.
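Guarding against that means defining priority classes yourself and making sure every workload references the right one. A rough sketch, with illustrative names and values:

```yaml
# Illustrative sketch: keep live inference ahead of batch training in the scheduler.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: inference-critical
value: 1000000                       # higher value: scheduled first, evicted last
description: "Live inference endpoints; must not be displaced by training jobs."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-batch
value: 1000
preemptionPolicy: Never              # training pods queue for capacity instead of evicting others
description: "Long-running training jobs; wait rather than preempt."
```

Every training job and every inference Deployment then has to set the matching priorityClassName, and nothing tells you when one doesn't.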
GPU idle time is invisible: A team reserves 8 A100s for a job that finishes in 2 hours but holds the nodes for 12. That's 80 GPU-hours of expensive capacity sitting idle from a single run, with no visibility into who holds it and no mechanism to reclaim it.
Multi-cluster quotas don't exist out of the box: Your T4 cluster and your H100 cluster have no shared quota context. Teams learn to game the system, submitting to whichever cluster has headroom. You lose cost visibility entirely.
Onboarding a new AI team is a 3-day ops task: Creating namespaces, configuring Kubeflow profiles, setting up Kueue ClusterQueues and LocalQueues, wiring RBAC. None of it is automated. You do it by hand every time.
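"By hand" typically means applying something like the sketch below for every new team, on top of a ClusterQueue like the ones above and whatever Kubeflow profile the team needs. The names, roles, and groups here are placeholders:

```yaml
# Illustrative onboarding sketch for a new team: namespace, queue, and RBAC.
apiVersion: v1
kind: Namespace
metadata:
  name: team-c
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-c
  namespace: team-c
spec:
  clusterQueue: team-c-cq            # points at a ClusterQueue like the ones sketched above
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-c-editors
  namespace: team-c
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                         # built-in edit role, scoped to the team namespace
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: team-c-engineers             # placeholder identity-provider group
```

Add the ResourceQuota and the Kubeflow profile on top, and three days starts to look optimistic.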
Kueue helps with scheduling. Kubeflow handles jobs. KServe manages inference. But nothing ties them together with team context, quota governance, and operator visibility. That gap is what platform teams are stitching together by hand.
What platform teams actually end up doing
In the absence of purpose-built tooling, we've seen platform engineers resort to:
- Fielding GPU allocation requests over Slack and manually editing quota configs
- Tracking GPU usage across teams in a shared spreadsheet
- Writing custom scripts to reconcile Kubeflow job states with Kueue queues
- Building internal dashboards to surface utilisation that their existing tools don't expose
- Creating namespaces and Kubeflow profiles by hand each time a new team onboards
This works until it doesn't. And it stops working faster than you expect.
What we're building at Devtron
We're extending the Devtron platform to cover the full GPU workload lifecycle on Kubernetes, from notebook-based experimentation to production inference, with enterprise-grade controls layered on top.
Built on Kubeflow and Kueue under the hood. Surfaced through the Devtron experience your team already uses. No new stack to adopt.

This is not a rip-and-replace. It extends the Kubernetes, Kubeflow, and Kueue investments you've already made, and adds the team context, observability, and governance layer that's missing today.
Why we want design partners, not beta testers
We're at the prototype stage. The architecture is taking shape, and the questions we're still validating are exactly the ones you live with every day:
- How are teams actually developing on Kubernetes: notebooks first, or direct job submission?
- What does the handoff from training to inference look like in your org?
- How do you define and enforce GPU quotas across teams today: manually, or with tooling?
- What breaks first when a new AI team lands on your platform?
We're onboarding a small, deliberate group of platform and DevOps teams who are dealing with GPU orchestration on Kubernetes today. Not a waitlist. Not a feature release announcement.
Design partners get:
- Early access to the prototype before public release
- Direct input on what gets built: your workflows shape the roadmap
- A dedicated engineering contact for feedback and iteration
In return, we ask for honest feedback from your actual workflows. Not hypotheticals. Not a survey.
GPU infrastructure is becoming a core platform problem
Kubernetes was built for stateless microservices. AI workloads break almost every assumption it was designed around — long job durations, heterogeneous hardware, shared scarce resources, competing SLAs between training and inference.
Platform teams who figure out GPU orchestration now will have a meaningful advantage as AI workloads scale inside their orgs. Those who patch it together with scripts and spreadsheets will spend the next two years firefighting.
The teams who get ahead of this problem won't be the ones with the most GPUs. They'll be the ones who built the right platform around them.
We're onboarding a small group of platform teams managing GPU workloads on Kubernetes today.
Early access. Direct line to the engineering team. Real influence on the roadmap.
Apply for early access → www.devtron.ai/gpu-early-access
Or reach out directly; we want to talk before we build too much in the wrong direction.