Kubernetes Autoscaling Tuning
Tune HPA, VPA, and KEDA for an LLM workload, and practice the cost-vs-latency trade-offs of the 2026 autoscaling stack.
Kubernetes, KEDA, Prometheus, k6 or Locust
About this project
LLM workloads have unusual scaling characteristics: they are bursty, expensive per replica, and latency-sensitive. This project teaches the modern autoscaling stack (HPA, VPA, and KEDA), custom metrics, predictive scaling, and the cost implications of each choice. You'll set up a sample LLM API workload, model realistic traffic, and tune autoscaling to hit a target latency at minimum cost.
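As a concrete starting point, here is a minimal sketch of a KEDA ScaledObject that scales on a Prometheus latency metric instead of CPU. Every name in it (the llm-api Deployment, the llm_request_duration_seconds histogram, the Prometheus address, the 2 s threshold) is an illustrative assumption, not part of the brief:

```yaml
# Hypothetical example: scale the llm-api Deployment on p95 request
# latency from Prometheus. All names and numbers are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-api-scaler
spec:
  scaleTargetRef:
    name: llm-api                  # the sample LLM API Deployment
  minReplicaCount: 1               # floor: keeps cold-start latency bounded
  maxReplicaCount: 20              # ceiling: caps spend during bursts
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # damp flapping on bursty traffic
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: |
          histogram_quantile(0.95,
            sum(rate(llm_request_duration_seconds_bucket[2m])) by (le))
        threshold: "2"             # scale out when p95 latency exceeds 2s
```

Most of the cost-vs-latency tuning lives in the floor, ceiling, threshold, and scale-down window: a higher floor buys lower tail latency at a standing cost, while a longer stabilization window trades some savings for fewer replica churns. Latency is only a proxy signal; a queue-depth or in-flight-request metric scales more linearly and is worth comparing during the project.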
Why build this in 2026?
AI workloads dominate cloud spend; SREs who tune autoscaling for them have outsized leverage.
What you'll ship
- GitHub repo
- Before/after cost comparison
- Latency-vs-cost graph
Skills you'll practice
kubernetes, monitoring, distributed systems