Retry Logic with Exponential Backoff
| Decision | Rule |
|---|---|
| Should you retry? | Only for transient failures: timeout, network drop, 5xx, 429 (with Retry-After) |
| Don't retry | 4xx other than 408/429 — these are deterministic, retrying won't help |
| Idempotency | Safe to retry: GET/HEAD/PUT/DELETE (idempotent by definition). Unsafe to retry: POST without an idempotency key |
| Delay shape | Exponential: base × 2^attempt |
| Jitter | Add a random offset to each delay so 1000 clients don't all retry at the same instant |
| Stop condition | Max attempts OR max elapsed time, whichever comes first |
| Respect server signals | If response has Retry-After, honour it |
Code in action — Dio interceptor
import 'dart:math';
import 'package:dio/dio.dart';

class RetryInterceptor extends Interceptor {
  RetryInterceptor(this.dio, {this.maxRetries = 3, this.base = const Duration(seconds: 1)});

  final Dio dio; // the Dio instance this interceptor is attached to
  final int maxRetries;
  final Duration base;
  final _rng = Random();

  @override
  Future<void> onError(DioException err, ErrorInterceptorHandler h) async {
    if (!_shouldRetry(err)) return h.next(err);
    final attempt = (err.requestOptions.extra['retry'] ?? 0) as int;
    if (attempt >= maxRetries) return h.next(err);

    // Exponential backoff plus up to 20% random jitter
    final factor = pow(2, attempt).toInt();
    final jitterMs = (base.inMilliseconds * factor * 0.2 * _rng.nextDouble()).toInt();
    final delay = base * factor + Duration(milliseconds: jitterMs);
    await Future.delayed(delay);

    err.requestOptions.extra['retry'] = attempt + 1;
    try {
      // Reuse the same Dio (not a fresh Dio()) so its adapter and interceptors
      // still apply; a failed retry re-enters onError with the updated counter.
      final res = await dio.fetch(err.requestOptions);
      h.resolve(res);
    } on DioException catch (e) {
      h.next(e);
    }
  }

  // Only transient failures: timeouts, dropped connections, 5xx, and 429
  bool _shouldRetry(DioException e) =>
      e.type == DioExceptionType.connectionTimeout ||
      e.type == DioExceptionType.receiveTimeout ||
      e.type == DioExceptionType.connectionError ||
      (e.response?.statusCode ?? 0) >= 500 ||
      e.response?.statusCode == 429;
}
Plain-Dart retry helper (no Dio)
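import 'dart:math';

Future<T> retry<T>(
  Future<T> Function() task, {
  int max = 3, // number of retries after the first attempt
  Duration base = const Duration(seconds: 1),
  bool Function(Object error)? retryIf, // whitelist of transient errors; null retries everything
}) async {
  final rng = Random();
  for (var i = 0; ; i++) {
    try {
      return await task();
    } catch (e) {
      // Give up once retries are exhausted or the error is not transient
      if (i >= max || !(retryIf?.call(e) ?? true)) rethrow;
      // Exponential backoff plus up to 20% random jitter
      final factor = pow(2, i).toInt();
      final jitterMs = (base.inMilliseconds * factor * 0.2 * rng.nextDouble()).toInt();
      await Future.delayed(base * factor + Duration(milliseconds: jitterMs));
    }
  }
}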
// Usage
final user = await retry(() => api.fetchUser('123'));
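The decision table asks for a stop on max attempts or max elapsed time, whichever comes first, while the helper above only counts attempts. One possible way to add a wall-clock budget is sketched below. The names `nextDelay`, `retryWithBudget`, `maxRetries`, and `maxElapsed`, and the 30-second default, are illustrative assumptions rather than part of any library.

```dart
import 'dart:math';

/// Returns the delay before the next retry, or null when the caller should
/// give up. Hypothetical helper: names and defaults are illustrative.
Duration? nextDelay({
  required int attempt, // 0 for the first retry
  required Duration elapsed, // time spent since the first attempt started
  int maxRetries = 3,
  Duration maxElapsed = const Duration(seconds: 30),
  Duration base = const Duration(seconds: 1),
  Random? rng,
}) {
  if (attempt >= maxRetries) return null; // retry cap reached
  final factor = pow(2, attempt).toInt();
  final jitterMs =
      (base.inMilliseconds * factor * 0.2 * (rng ?? Random()).nextDouble()).toInt();
  final delay = base * factor + Duration(milliseconds: jitterMs);
  // Stop if sleeping would push us past the wall-clock budget.
  if (elapsed + delay > maxElapsed) return null;
  return delay;
}

/// Retries [task] until it succeeds or either cap is hit.
Future<T> retryWithBudget<T>(
  Future<T> Function() task, {
  int maxRetries = 3,
  Duration maxElapsed = const Duration(seconds: 30),
  Duration base = const Duration(seconds: 1),
}) async {
  final clock = Stopwatch()..start();
  for (var attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (_) {
      final delay = nextDelay(
        attempt: attempt,
        elapsed: clock.elapsed,
        maxRetries: maxRetries,
        maxElapsed: maxElapsed,
        base: base,
      );
      if (delay == null) rethrow; // out of retries or out of time
      await Future.delayed(delay);
    }
  }
}
```

Keeping the stop decision in a pure function makes both caps easy to unit-test without actually sleeping.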
When to retry vs surface the error
| Outcome | Action |
|---|---|
| 5xx, timeout, network drop | Retry (capped) |
| 429 with Retry-After: N | Wait N seconds and retry |
| 401 unauthorized | Refresh token, retry once; don't retry blindly |
| 4xx (400, 404, 422) | Surface to user — retrying won't help |
| Specific error you know is transient (e.g., DB lock) | Retry, but log to spot trends |
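The table above can be read as a small classification function. The sketch below is one way to encode it in plain Dart; `RetryAction` and `classify` are hypothetical names, and the function assumes it is only called for failed requests, with a null status code standing in for "no response arrived".

```dart
/// What the caller should do with a failed HTTP call (illustrative names).
enum RetryAction { retry, waitThenRetry, refreshTokenThenRetryOnce, surface }

/// Maps a failure to an action, following the table above.
/// [statusCode] is null when no response arrived (timeout, network drop).
RetryAction classify(int? statusCode, {bool hasRetryAfter = false}) {
  if (statusCode == null) return RetryAction.retry; // timeout or network drop
  if (hasRetryAfter && (statusCode == 429 || statusCode == 503)) {
    return RetryAction.waitThenRetry; // the server told us how long to wait
  }
  if (statusCode == 401) return RetryAction.refreshTokenThenRetryOnce;
  if (statusCode == 408 || statusCode == 429 || statusCode >= 500) {
    return RetryAction.retry; // transient: capped retry with backoff
  }
  return RetryAction.surface; // 400, 404, 422 and other deterministic failures
}
```

Errors that are transient for application-specific reasons (the DB-lock row) still need their own check at the call site, since a status code alone does not identify them.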
Common mistakes to avoid
// ❌ Retrying everything blindly
for (var i = 0; i < 5; i++) { try { ... } catch (_) {} } // retries 404, etc.
// ✅ Whitelist transient errors
// ❌ No jitter
// 10k devices all retry at second 2 → server gets a 10k-request spike
// ✅ Add randomness to the delay
// ❌ Retrying non-idempotent POSTs (charges, sends) without an idempotency key
// → duplicate orders, duplicate notifications
// ✅ Require an `Idempotency-Key` header; the server dedupes
// ❌ Unbounded retry loop
// Backoff slows each iteration down, but without a cap the loop still runs forever
// ✅ Hard cap on attempts AND wall-clock time
// ❌ Retrying inside business logic AND in an interceptor
// Multiplicative retries — 3 × 3 = 9 attempts when you intended 3
// ✅ Pick one layer
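For the idempotency-key point above, the detail that matters is generating the key once per logical operation and reusing it on every retry. A minimal sketch, assuming the server accepts an `Idempotency-Key` header and dedupes on it; `newIdempotencyKey` and `idempotentHeaders` are hypothetical helper names.

```dart
import 'dart:math';

/// Generates a random 128-bit key rendered as 32 lowercase hex characters.
String newIdempotencyKey([Random? rng]) {
  final r = rng ?? Random.secure();
  final bytes = List<int>.generate(16, (_) => r.nextInt(256));
  return bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join();
}

/// Builds the headers for one logical operation. Create them once, before
/// the first attempt, and send the same map on every retry.
Map<String, String> idempotentHeaders(String key) => {'Idempotency-Key': key};

// Usage (api.createOrder is a hypothetical client method):
//   final headers = idempotentHeaders(newIdempotencyKey());
//   await retry(() => api.createOrder(order, headers: headers));
```

Creating the key inside the retried closure would defeat the purpose, because each attempt would then look like a new operation to the server.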
Interview follow-ups
- Why is jitter important in exponential backoff? Without jitter, every client backs off the same amount and retries at the same instant: a "thundering herd" that hammers the server you're trying to give breathing room. Adding randomness spreads retries out over time.
- When is it dangerous to retry a request? When the request isn't idempotent: POSTs that create resources, charge cards, send messages. If you must retry, require an idempotency key the server uses to dedupe (most modern APIs support this; Stripe popularised it).
- How do you handle the `Retry-After` header? When a 429 (or 503) includes `Retry-After`, honour it instead of your own delay. It's either seconds or an HTTP-date. Treat it as the minimum wait; pile your jitter on top if you want.
- Where does retry logic belong: call site, repository, or interceptor? In one place. Interceptor is cleanest for cross-cutting transient retries (timeouts, 5xx). Application logic (repository) is the right place for retries that need business context (refresh-token-then-retry, "if cart is locked, wait and re-try"). Avoid stacking retries at multiple layers.
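The `Retry-After` handling described above (seconds or an HTTP-date, treated as a minimum wait) could be sketched as follows. `parseRetryAfter` and `effectiveDelay` are hypothetical names; the sketch relies on `HttpDate.parse` from `dart:io`, so it assumes a non-web target.

```dart
import 'dart:io' show HttpDate;

/// Parses a Retry-After header value into a wait duration.
/// Returns null when the header is absent or cannot be parsed.
Duration? parseRetryAfter(String? value, {DateTime? now}) {
  if (value == null) return null;
  final trimmed = value.trim();
  // Form 1: a delay in whole seconds, e.g. "120".
  final seconds = int.tryParse(trimmed);
  if (seconds != null) return seconds >= 0 ? Duration(seconds: seconds) : null;
  // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  try {
    final until = HttpDate.parse(trimmed);
    final wait = until.difference(now ?? DateTime.now());
    return wait.isNegative ? Duration.zero : wait; // date already passed
  } on Exception {
    return null; // malformed date: fall back to the caller's own backoff
  }
}

/// Treats the server hint as a minimum: never wait less than Retry-After.
Duration effectiveDelay(Duration backoff, Duration? retryAfter) =>
    (retryAfter != null && retryAfter > backoff) ? retryAfter : backoff;
```

Jitter can then be added on top of the value returned by `effectiveDelay`, so the server's minimum is still respected.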