Resilience
Distributed systems fail in partial and unpredictable ways. Istio provides resilience features at the network layer so that every service benefits from consistent failure-handling behaviour without each team re-implementing it.
What it Gives You
- Retries - automatically retry failed requests, with control over the number of attempts and the per-try timeout.
- Timeouts - bound how long a caller waits for a response so a slow dependency does not cascade into a wider outage.
- Circuit breaking - limit concurrent connections and requests to a service to protect it from overload.
- Outlier detection - automatically eject unhealthy endpoints from the load balancing pool until they recover.
How it Works
Resilience policies are enforced by the data plane proxies, so they apply uniformly to all traffic regardless of the language or framework a service is written in.
| Resource | Purpose |
|---|---|
| VirtualService | Configures retries and timeouts for routes |
| DestinationRule | Configures connection pool limits, circuit breaking, and outlier detection |
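As a rough illustration of the DestinationRule side, the sketch below sets connection pool limits (which act as the circuit breaker) and outlier detection for one service. The host name `reviews`, the resource name, and every numeric value are placeholders chosen for the example, not recommendations, and the `apiVersion` may need adjusting to whatever your Istio release serves.

```yaml
# Hypothetical example: the service name and all limits are illustrative only.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-resilience
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100          # cap on concurrent TCP connections
      http:
        http1MaxPendingRequests: 50  # requests allowed to queue for a connection
        http2MaxRequests: 200        # cap on active requests to the destination
    outlierDetection:
      consecutive5xxErrors: 5        # eject an endpoint after 5 consecutive 5xx responses
      interval: 10s                  # how often endpoints are evaluated
      baseEjectionTime: 30s          # minimum time an ejected endpoint stays out
      maxEjectionPercent: 50         # never eject more than half of the pool
```

Requests that exceed the pool limits are rejected by the proxy rather than queued indefinitely, which is what keeps an overloaded service from being pushed further. Sensible limits depend on the service's real capacity, so treat these numbers as a starting point to tune against observed traffic.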
When to Use It
- Protecting a service from a slow or failing downstream dependency
- Smoothing over transient network errors with bounded retries
- Preventing a single overloaded instance from degrading the whole service
- Establishing consistent timeout behaviour across many teams
Warning
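Retries and timeouts interact. Each try can run for up to the per-try timeout, so without an overall limit the worst-case latency grows roughly as the number of tries multiplied by the per-try timeout. Set an overall route timeout to cap the total, and keep per-try timeouts short.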
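A minimal sketch of a VirtualService that pairs a short per-try timeout with an overall route timeout might look like the following. The host `reviews`, the resource name, and the specific durations are assumptions made for illustration; the `retryOn` conditions shown are only one plausible choice.

```yaml
# Hypothetical example: names and durations are illustrative only.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-retries
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
    timeout: 3s              # overall limit for the request, retries included
    retries:
      attempts: 2            # up to 2 retries after the initial try
      perTryTimeout: 1s      # each individual try is cut off after 1s
      retryOn: 5xx,connect-failure
```

With these values a request makes at most three tries of up to 1s each, and the 3s route timeout caps the total time the caller waits, including any backoff between tries. Raising `perTryTimeout` to 5s without raising `timeout` would mean the route timeout fires before the first try can time out and be retried.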