Retry Logic with Exponential Backoff

Decision	Rule
Should you retry?	Only for transient failures: timeout, network drop, 5xx, 429 (with Retry-After)
Don't retry	4xx other than 408/429 — these are deterministic, retrying won't help
Idempotency	Safe to retry: GET/HEAD/PUT/DELETE (idempotent). Unsafe to retry: POST without an idempotency key
Delay shape	Exponential: `base × 2^attempt`
Jitter	Add a random offset to each delay so 1000 clients that failed together don't all retry at the same instant
Stop condition	Max attempts OR max elapsed time, whichever comes first
Respect server signals	If response has `Retry-After`, honour it
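
The delay rules in the table (exponential growth plus jitter) can be sketched as one small function. This is a minimal illustration rather than a library API: the name `backoffDelay`, the 30-second `maxDelay` ceiling, and the 0 to 20% jitter range are assumptions chosen for the example.

```dart
import 'dart:math';

/// Delay before retry number [attempt] (0-based): base × 2^attempt,
/// capped at [maxDelay], plus 0 to 20% random jitter on top.
/// The cap and the jitter range are illustrative assumptions.
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(seconds: 1),
  Duration maxDelay = const Duration(seconds: 30),
  Random? rng,
}) {
  final random = rng ?? Random();
  // Clamp the exponent so the shift stays small for very large attempts.
  final factor = 1 << min(attempt, 20);
  final rawMs = min(base.inMilliseconds * factor, maxDelay.inMilliseconds);
  final jitterMs = (rawMs * 0.2 * random.nextDouble()).round();
  return Duration(milliseconds: rawMs + jitterMs);
}
```

With the defaults, attempts 0 to 4 have nominal delays of 1 s, 2 s, 4 s, 8 s and 16 s, each stretched by up to 20%; from attempt 5 onward the nominal delay is held at the 30 s cap.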

Code in action — Dio interceptor

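import 'dart:math';

import 'package:dio/dio.dart';

class RetryInterceptor extends Interceptor {
  RetryInterceptor(this.dio, {this.maxRetries = 3, this.base = const Duration(seconds: 1)});

  // The Dio instance this interceptor is attached to. Re-sending through it
  // keeps base options and other interceptors, and a failed retry comes back
  // through onError, so the attempt counter keeps advancing up to maxRetries.
  final Dio dio;
  final int maxRetries;
  final Duration base;
  final _rng = Random();
  static const _idempotent = {'GET', 'HEAD', 'PUT', 'DELETE'};

  @override
  Future<void> onError(DioException err, ErrorInterceptorHandler h) async {
    if (!_shouldRetry(err)) return h.next(err);

    final attempt = (err.requestOptions.extra['retry'] ?? 0) as int;
    if (attempt >= maxRetries) return h.next(err);

    // Exponential backoff plus 0-20% random jitter
    final factor   = pow(2, attempt).toInt();
    final jitterMs = (base.inMilliseconds * factor * 0.2 * _rng.nextDouble()).toInt();
    final delay    = base * factor + Duration(milliseconds: jitterMs);

    await Future.delayed(delay);
    err.requestOptions.extra['retry'] = attempt + 1;

    try {
      final res = await dio.fetch(err.requestOptions);
      h.resolve(res);
    } on DioException catch (e) {
      h.next(e); // surface the most recent failure, not the first one
    }
  }

  bool _shouldRetry(DioException e) {
    final req = e.requestOptions;
    // Only repeat requests that are safe to send twice (header match is case-sensitive here).
    final repeatable = _idempotent.contains(req.method.toUpperCase()) ||
        req.headers.containsKey('Idempotency-Key');
    if (!repeatable) return false;

    final status = e.response?.statusCode ?? 0;
    return e.type == DioExceptionType.connectionTimeout ||
        e.type == DioExceptionType.sendTimeout ||
        e.type == DioExceptionType.receiveTimeout ||
        e.type == DioExceptionType.connectionError || // network drop
        status >= 500 ||
        status == 408 ||
        status == 429;
  }
}

// Usage
// final dio = Dio();
// dio.interceptors.add(RetryInterceptor(dio));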
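
The interceptor above backs off on its own schedule and does not read `Retry-After`. As a sketch of the "respect server signals" rule, the helper below turns a raw header value into a wait duration. The name `parseRetryAfter` and the `now` parameter (there for testability) are assumptions for this example; `HttpDate.parse` comes from `dart:io`, so this version is not suitable for web builds.

```dart
import 'dart:io' show HttpDate;

/// Parses a Retry-After header value into a wait duration.
/// Returns null when the value is missing or cannot be parsed.
Duration? parseRetryAfter(String? value, {DateTime? now}) {
  if (value == null) return null;
  final trimmed = value.trim();

  // Form 1: a delay in whole seconds, e.g. "120".
  final seconds = int.tryParse(trimmed, radix: 10);
  if (seconds != null) {
    return seconds >= 0 ? Duration(seconds: seconds) : null;
  }

  // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  try {
    final until = HttpDate.parse(trimmed);
    final wait = until.difference(now ?? DateTime.now());
    // A date in the past means "retry now".
    return wait.isNegative ? Duration.zero : wait;
  } on Exception {
    return null;
  }
}
```

In a Dio interceptor the raw value would typically come from `err.response?.headers.value('retry-after')`. A cautious caller might wait for whichever is longer, the parsed value or the computed backoff, and still apply an upper bound so a misbehaving server cannot stall the client indefinitely.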

Plain-Dart retry helper (no Dio)

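import 'dart:math';

// Runs [task]; on failure retries up to [maxRetries] times (at most
// maxRetries + 1 calls), waiting base × 2^i plus 0-20% jitter between calls.
Future<T> retry<T>(
  Future<T> Function() task, {
  int maxRetries = 3,
  Duration base = const Duration(seconds: 1),
  bool Function(Object error)? retryIf, // null = retry on any error
}) async {
  final rng = Random();
  for (var i = 0; ; i++) {
    try {
      return await task();
    } catch (e) {
      // Give up when retries are exhausted or the error is not transient.
      if (i >= maxRetries || !(retryIf?.call(e) ?? true)) rethrow;
      final factor   = pow(2, i).toInt();
      final jitterMs = (base.inMilliseconds * factor * 0.2 * rng.nextDouble()).toInt();
      await Future.delayed(base * factor + Duration(milliseconds: jitterMs));
    }
  }
}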
 
// Usage
final user = await retry(() => api.fetchUser('123'));
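
The rules table asks for a stop condition on max attempts OR max elapsed time, whichever comes first; the helper above only counts attempts. A minimal sketch of adding the wall-clock budget follows. The name `retryWithDeadline` and the 10-second default for `maxElapsed` are assumptions for illustration.

```dart
import 'dart:math';

/// Retries [task] until it succeeds, [maxRetries] is used up, or the next
/// backoff sleep would push total elapsed time past [maxElapsed].
Future<T> retryWithDeadline<T>(
  Future<T> Function() task, {
  int maxRetries = 3,
  Duration base = const Duration(seconds: 1),
  Duration maxElapsed = const Duration(seconds: 10),
}) async {
  final rng = Random();
  final clock = Stopwatch()..start();
  for (var attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (_) {
      if (attempt >= maxRetries) rethrow; // attempt budget spent
      final factor = 1 << attempt;
      final jitterMs =
          (base.inMilliseconds * factor * 0.2 * rng.nextDouble()).toInt();
      final delay = base * factor + Duration(milliseconds: jitterMs);
      // Time budget: give up now rather than sleep past the deadline.
      if (clock.elapsed + delay > maxElapsed) rethrow;
      await Future<void>.delayed(delay);
    }
  }
}
```

Checking the budget before sleeping means the caller gets the error as soon as a further retry is known to be pointless. The budget does not interrupt a `task` that is already running, so per-request timeouts are still needed.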

When to retry vs surface the error

Outcome	Action
5xx, timeout, network drop	Retry (capped)
429 with `Retry-After: N`	Wait N seconds and retry
401 unauthorized	Refresh token, retry once; don't retry blindly
4xx (400, 404, 422)	Surface to user — retrying won't help
Specific error you know is transient (e.g., DB lock)	Retry, but log to spot trends
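
The table above can be expressed as a small classifier that keeps the decision in one place. This is a sketch under assumptions: the `RetryAction` enum and `classify` function are hypothetical names, a `null` status stands for "no response" (timeout or network drop), and a 429 without `Retry-After` is treated as an ordinary backoff retry, which the table does not spell out.

```dart
/// What the caller should do with a failed request.
enum RetryAction { retry, waitThenRetry, refreshAuthThenRetryOnce, surface }

/// Maps a failed request's HTTP status to an action.
/// [status] is null when no response arrived (timeout, network drop).
/// Intended for failures only; it is not meant to be called on 2xx/3xx.
RetryAction classify(int? status, {bool hasRetryAfter = false}) {
  if (status == null) return RetryAction.retry;
  if (status == 429) {
    // Assumption: without Retry-After, fall back to normal backoff.
    return hasRetryAfter ? RetryAction.waitThenRetry : RetryAction.retry;
  }
  if (status == 401) return RetryAction.refreshAuthThenRetryOnce;
  if (status == 408 || status >= 500) return RetryAction.retry;
  return RetryAction.surface; // 400, 404, 422, and other 4xx
}
```

Application-specific transient errors (the DB-lock row) do not map to a status code in general, so they would need their own check alongside this function.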

Common mistakes to avoid

// ❌ Retrying everything blindly
for (var i = 0; i < 5; i++) { try { ... } catch (_) {} }    // retries 404, etc.
// ✅ Whitelist transient errors
 
// ❌ No jitter
// 10k devices all retry at second 2 → server gets a 10k-request spike
// ✅ Add randomness to the delay
 
// ❌ Retrying non-idempotent POSTs (charges, sends) without an idempotency key
// → duplicate orders, duplicate notifications
// ✅ Require an `Idempotency-Key` header; the server dedupes
 
// ❌ Unbounded retry loop
// Backoff only slows the loop down; without a cap it still runs forever
// ✅ Hard cap on attempts AND wall-clock time
 
// ❌ Retrying inside business logic AND in an interceptor
// Multiplicative retries — 3 × 3 = 9 attempts when you intended 3
// ✅ Pick one layer
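
To illustrate the idempotency-key point, the sketch below generates one key per logical operation and reuses it on every attempt, so a server that dedupes on `Idempotency-Key` sees the retries as the same request. The function names and the `send` callback are hypothetical; a real caller would also filter for transient errors and add jitter, both omitted here to keep the example short.

```dart
import 'dart:math';

/// Random 128-bit token as 32 hex characters, used as an idempotency key.
String newIdempotencyKey() {
  final rng = Random.secure();
  return List.generate(
    16,
    (_) => rng.nextInt(256).toRadixString(16).padLeft(2, '0'),
  ).join();
}

/// Sends a non-idempotent request, reusing ONE key across all attempts.
/// [send] is a hypothetical transport callback that attaches the headers.
Future<T> postWithIdempotencyKey<T>(
  Future<T> Function(Map<String, String> headers) send, {
  int maxRetries = 3,
  Duration base = const Duration(seconds: 1),
}) async {
  // Created once, outside the loop: a fresh key per attempt would make
  // every retry look like a new operation and defeat deduplication.
  final headers = {'Idempotency-Key': newIdempotencyKey()};
  for (var attempt = 0; ; attempt++) {
    try {
      return await send(headers);
    } catch (_) {
      if (attempt >= maxRetries) rethrow;
      await Future<void>.delayed(base * (1 << attempt));
    }
  }
}
```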

Interview follow-ups

Why is jitter important in exponential backoff? Without jitter, every client backs off the same amount and retries at the same instant — a "thundering herd" that hammers the server you're trying to give breathing room. Adding randomness spreads retries out over time.
When is it dangerous to retry a request? When the request isn't idempotent: POSTs that create resources, charge cards, or send messages. If you must retry, require an idempotency key that the server uses to dedupe (many modern APIs support this; Stripe popularised it).
Where does retry logic belong: call site, repository, or interceptor? In one place. An interceptor is cleanest for cross-cutting transient retries (timeouts, 5xx). Application logic (the repository) is the right place for retries that need business context (refresh-token-then-retry, "if the cart is locked, wait and retry"). Avoid stacking retries at multiple layers.
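
The thundering-herd answer above can be made concrete with a small simulation: many clients fail at the same moment, and we count how many retries land in the same 100 ms window with and without jitter. The function names, the 100 ms bucket size, and the 0 to 20% jitter range are assumptions for this sketch.

```dart
import 'dart:math';

/// Retry times (ms after a shared failure) for [clients] devices
/// on retry number [attempt] (0-based), with base × 2^attempt backoff.
List<int> retryTimes(
  int clients,
  int attempt, {
  required bool jitter,
  int baseMs = 1000,
  Random? rng,
}) {
  final random = rng ?? Random();
  final nominal = baseMs * (1 << attempt);
  return List.generate(
    clients,
    (_) => jitter
        ? nominal + (nominal * 0.2 * random.nextDouble()).toInt()
        : nominal,
  );
}

/// Largest number of retries that fall into the same time bucket.
int peakPerBucket(List<int> timesMs, {int bucketMs = 100}) {
  final counts = <int, int>{};
  for (final t in timesMs) {
    counts.update(t ~/ bucketMs, (c) => c + 1, ifAbsent: () => 1);
  }
  return counts.values.reduce(max);
}
```

For 1000 clients on attempt 1 (nominal delay 2 s), the no-jitter case puts all 1000 retries in a single bucket, while jitter spreads them across a window of about 400 ms, so the peak per bucket drops to a fraction of that.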
