# Zero-Downtime Kubernetes Deployments: A Practical Guide
How to configure rolling updates, PodDisruptionBudgets, and readiness probes to achieve true zero-downtime deploys in Kubernetes.
## The problem with "zero-downtime" claims
Almost every Kubernetes tutorial claims zero-downtime deployments are easy — just use a RollingUpdate strategy and you're done. In practice, there are at least five ways you can still drop traffic even with rolling updates configured correctly.
This post covers the full picture: rolling update configuration, pod disruption budgets, readiness probes, preStop hooks, and graceful shutdown handling in your application.
## Rolling update configuration
The starting point is your Deployment's update strategy:
```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take pods offline before new ones are ready
      maxSurge: 1         # allow one extra pod during rollout
```
Setting `maxUnavailable: 0` is the key: Kubernetes won't terminate an old pod until its replacement passes the readiness check.
## Readiness probes that actually work
A readiness probe that returns 200 too early is worse than no probe at all. Your app needs to be genuinely ready to handle traffic before the probe passes:
```yaml
readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  successThreshold: 1
```
The `/healthz/ready` endpoint should check that your app's dependencies (database connections, caches, config) are warm, not just that the HTTP server has started.
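As a sketch, the endpoint can aggregate individual dependency checks. The check callables below (a database ping, a cache read, whatever your app needs) are hypothetical stand-ins, not part of any real framework:

```python
# Minimal sketch of a dependency-aware readiness check. Each check is a
# callable that raises on failure (e.g. a db ping); names are placeholders.

def check_ready(checks):
    """Return (ready, failures): ready only if every dependency check passes."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
    return len(failures) == 0, failures

def ready_status(checks):
    """Map the readiness result to an HTTP status for the ready endpoint."""
    ok, failures = check_ready(checks)
    return (200, "ok") if ok else (503, "; ".join(failures))
```

Wire `ready_status` into whatever HTTP framework the app uses. Returning 503 while a dependency is down keeps the pod out of Service endpoints without restarting it, which is exactly the behavior you want from readiness (as opposed to liveness) probes.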
## PodDisruptionBudgets
Rolling update settings only govern disruptions caused by the rollout itself. Other voluntary disruptions, such as node drains during maintenance or cluster upgrades, need a PodDisruptionBudget:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2   # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
```
Without a PDB, `kubectl drain` will evict every matching pod on the node at once; your rolling update settings don't apply to evictions.
## Graceful shutdown with preStop hooks
When a pod is deleted, Kubernetes sends SIGTERM to its containers and, in parallel, removes the pod from Service endpoints. These two steps are asynchronous, which creates a race: requests routed to the pod before the endpoint removal propagates to kube-proxy and load balancers will fail if the app has already exited.
The fix is a preStop hook that adds a short sleep:
```yaml
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
```
This keeps the pod alive for five seconds after Kubernetes begins removing it from the load balancer, giving the endpoint change time to propagate and existing connections time to drain. Two caveats: the container image must actually contain a `sleep` binary, and `terminationGracePeriodSeconds` (30 by default) must comfortably exceed the sleep plus your app's shutdown time.
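The preStop sleep only delays SIGTERM; the application still has to treat the signal as "finish what you have, then exit" rather than dying immediately. A minimal Python sketch, assuming a single-threaded serve loop where `server.handle_request` and `server.server_close` stand in for your framework's equivalents:

```python
import signal
import threading

# Flag flipped by the SIGTERM handler; the serve loop checks it between requests.
shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    # Stop accepting new work; the current in-flight request still completes.
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def serve(server):
    # Handle one request at a time until asked to stop, then clean up.
    while not shutting_down.is_set():
        server.handle_request()
    server.server_close()
```

Most real frameworks provide this hook directly (for example, a server `shutdown` or `stop` method); the important part is that SIGTERM triggers a drain rather than an immediate exit.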
## Putting it all together
With all four pieces in place (`maxUnavailable: 0`, honest readiness probes, a PDB, and a preStop drain delay), plus an app that handles SIGTERM gracefully, a rolling deploy can complete without dropping a single connection.
The next step is validating this with chaos engineering: run a rolling deploy while continuously sending traffic, and measure error rates. Tools like k6 or hey make this straightforward.
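The measurement itself can be sketched in a few lines: hammer the Service while the rollout runs and report the fraction of failed requests. `send` below is any callable that raises on failure, for instance a `urllib.request.urlopen` against your Service URL; this is a toy stand-in for what k6 or hey do, not a replacement for them:

```python
import time

def measure_error_rate(send, duration_s=30.0, interval_s=0.1):
    """Call `send` repeatedly for duration_s seconds; return the failure rate."""
    ok = errors = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            send()
            ok += 1
        except Exception:
            errors += 1
        time.sleep(interval_s)
    total = ok + errors
    return errors / total if total else 0.0
```

Run it while `kubectl rollout restart deployment/my-app` is in progress; with the configuration above, the measured rate should stay at zero.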