Kubernetes Autoscaling Tuning

Tune HPA, VPA, and KEDA for an LLM workload, and practice the cost-vs-latency trade-offs of modern autoscaling.

Kubernetes · KEDA · Prometheus · k6 or Locust

About this project

LLM workloads have unusual scaling characteristics: bursty traffic, expensive replicas, and strict latency requirements. This project teaches the modern autoscaling stack of HPA, VPA, and KEDA, along with custom metrics, predictive scaling, and cost optimization. You'll set up a sample LLM API workload, model realistic traffic, and tune autoscaling to hit a target latency at minimum cost.
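The "custom metrics" part is where KEDA earns its place: instead of scaling on CPU alone, you can scale on a workload-specific signal scraped from Prometheus. A minimal sketch, assuming a Deployment named `llm-api` and a hypothetical `llm_requests_in_flight` gauge exported by the service (names, namespace, and threshold are illustrative, not part of the project brief):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-api-scaler
spec:
  scaleTargetRef:
    name: llm-api                 # hypothetical Deployment to scale
  minReplicaCount: 2              # keep warm capacity for bursts
  maxReplicaCount: 20             # cap spend on expensive replicas
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(llm_requests_in_flight)   # hypothetical gauge
        threshold: "8"            # target in-flight requests per replica
```

Scaling on in-flight requests (or queue depth) tracks LLM load far better than CPU, since a replica can be saturated by a handful of long generations while its CPU looks idle.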

Why build this in 2026?

AI workloads dominate cloud spend; SREs who tune autoscaling for them have outsized leverage.

What you'll ship

  • GitHub repo
  • Before/after cost comparison
  • Latency-vs-cost graph
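The cost and latency numbers in those deliverables fall out of your scaling-policy choices. One illustrative knob is the HPA `behavior` stanza: scale up aggressively to protect latency, scale down slowly to avoid paying for churn. A sketch against the same hypothetical `llm-api` Deployment, with starting values to tune rather than recommendations (if KEDA manages the workload, the equivalent block goes under the ScaledObject's `advanced.horizontalPodAutoscalerConfig` instead of a standalone HPA):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-api                     # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60      # leave headroom for bursts
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # react immediately (latency)
      policies:
        - type: Percent
          value: 100                  # allow doubling each period
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300 # wait out short lulls (cost)
      policies:
        - type: Pods
          value: 1                    # shed at most one pod per minute
          periodSeconds: 60
```

The asymmetry is deliberate: under-provisioning an LLM API shows up as user-visible latency within seconds, while over-provisioning only shows up on the bill.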


Skills you'll practice

kubernetes · monitoring · distributed systems