Retry Logic with Exponential Backoff

Medium Priority. Asked in ~45% of mid-level interviews.


| Decision | Rule |
| --- | --- |
| Should you retry? | Only for transient failures: timeout, network drop, 5xx, 408, 429 (honouring Retry-After when present) |
| Don't retry | Other 4xx (anything except 408/429): these are deterministic, so retrying won't help |
| Idempotency | Safe to retry: GET/HEAD/PUT/DELETE. Unsafe: POST without an idempotency key |
| Delay shape | Exponential: base × 2^attempt |
| Jitter | Add a random offset to each delay so 1,000 clients that failed together don't all retry at the same instant |
| Stop condition | Max attempts OR max elapsed time, whichever comes first |
| Respect server signals | If the response has Retry-After, honour it |
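
The delay-shape and jitter rules above can be sketched as one pure function. This is a minimal sketch, not a canonical implementation: `backoffDelay` is a hypothetical helper name, and the `cap` parameter (an upper bound on the un-jittered delay) is an assumption added here so the delay cannot grow without limit.

```dart
import 'dart:math';

/// Hypothetical helper: the delay to wait before retry number [attempt] (0-based).
/// Computes base × 2^attempt, clamps it to [cap], then adds 0-20% random jitter.
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(seconds: 1),
  Duration cap = const Duration(seconds: 30), // assumption: not part of the rules above
  Random? rng,
}) {
  final random = rng ?? Random();
  // Clamp the exponent so 2^attempt cannot overflow for very large attempt counts.
  final exponent = min(attempt, 30);
  final exponentialMs = base.inMilliseconds * pow(2, exponent).toInt();
  final cappedMs = min(exponentialMs, cap.inMilliseconds);
  // nextDouble() is in [0, 1), so jitter is always below 20% of the capped delay.
  final jitterMs = (cappedMs * 0.2 * random.nextDouble()).toInt();
  return Duration(milliseconds: cappedMs + jitterMs);
}
```

With the defaults, attempt 0 waits between 1.0 and 1.2 seconds, attempt 3 between 8.0 and 9.6 seconds, and every attempt from 5 onwards is held at 30 to 36 seconds by the cap.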

Code in action — Dio interceptor

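import 'dart:math';

import 'package:dio/dio.dart';

class RetryInterceptor extends Interceptor {
  RetryInterceptor(
    this.dio, {
    this.maxRetries = 3,
    this.base = const Duration(seconds: 1),
  });

  // Reuse the app's Dio so retries keep its base options and interceptors.
  // A fresh Dio() would bypass this interceptor and stop after one retry.
  final Dio dio;
  final int maxRetries;
  final Duration base;
  final _rng = Random();

  @override
  Future<void> onError(DioException err, ErrorInterceptorHandler h) async {
    if (!_shouldRetry(err)) return h.next(err);

    final attempt = (err.requestOptions.extra['retry'] ?? 0) as int;
    if (attempt >= maxRetries) return h.next(err);

    // Exponential delay plus 0-20% random jitter
    final factor   = pow(2, attempt).toInt();
    final jitterMs = (base.inMilliseconds * factor * 0.2 * _rng.nextDouble()).toInt();
    final delay    = base * factor + Duration(milliseconds: jitterMs);

    await Future.delayed(delay);
    err.requestOptions.extra['retry'] = attempt + 1;

    try {
      // Re-enters this interceptor on failure, so retries continue up to maxRetries
      final res = await dio.fetch(err.requestOptions);
      h.resolve(res);
    } on DioException catch (e) {
      h.next(e); // surface the most recent failure, not the first one
    }
  }

  // Note: this does not check the HTTP method; gate non-idempotent POSTs separately.
  bool _shouldRetry(DioException e) {
    final status = e.response?.statusCode ?? 0;
    return e.type == DioExceptionType.connectionTimeout ||
        e.type == DioExceptionType.receiveTimeout ||
        e.type == DioExceptionType.connectionError || // network drop
        status >= 500 ||
        status == 408 ||
        status == 429;
  }
}

// Usage: dio.interceptors.add(RetryInterceptor(dio));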

Plain-Dart retry helper (no Dio)

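import 'dart:math';

Future<T> retry<T>(
  Future<T> Function() task, {
  int maxRetries = 3, // renamed from `max`, which shadowed dart:math's max()
  Duration base = const Duration(seconds: 1),
  bool Function(Exception error)? retryIf, // optional whitelist of transient errors
}) async {
  final rng = Random();
  for (var i = 0; ; i++) {
    try {
      return await task();
    } on Exception catch (e) {
      // Errors (programming bugs) are not caught here and propagate immediately.
      // Give up when out of retries or when the caller says the failure is not transient.
      if (i >= maxRetries || !(retryIf?.call(e) ?? true)) rethrow;
      final factor   = pow(2, i).toInt();
      final jitterMs = (base.inMilliseconds * factor * 0.2 * rng.nextDouble()).toInt();
      await Future.delayed(base * factor + Duration(milliseconds: jitterMs));
    }
  }
}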

// Usage
final user = await retry(() => api.fetchUser('123'));

When to retry vs surface the error

| Outcome | Action |
| --- | --- |
| 5xx, timeout, network drop | Retry (capped) |
| 429 with Retry-After: N | Wait N seconds and retry |
| 401 unauthorized | Refresh token, retry once; don't retry blindly |
| 4xx (400, 404, 422) | Surface to user; retrying won't help |
| Specific error you know is transient (e.g., DB lock) | Retry, but log to spot trends |
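
The table above can be read as a single decision function over the HTTP status code. The sketch below is illustrative only: `RetryAction`, `classify`, and the `hasRetryAfter` flag are hypothetical names, and application-specific transient errors (the last row) are left out because they need business context.

```dart
/// What the caller should do with a failed request (hypothetical names).
enum RetryAction { retry, waitThenRetry, refreshAuthThenRetryOnce, surface }

/// Maps a failed request to an action, following the table above.
/// Pass null for [statusCode] when no response arrived (timeout, network drop).
RetryAction classify(int? statusCode, {bool hasRetryAfter = false}) {
  if (statusCode == null) return RetryAction.retry; // timeout or network drop

  // The server told us how long to wait: honour it instead of our own delay.
  if (hasRetryAfter && (statusCode == 429 || statusCode == 503)) {
    return RetryAction.waitThenRetry;
  }

  // Transient failures: retry with capped backoff.
  if (statusCode >= 500 || statusCode == 429 || statusCode == 408) {
    return RetryAction.retry;
  }

  // Auth expired: refresh the token and retry exactly once.
  if (statusCode == 401) return RetryAction.refreshAuthThenRetryOnce;

  // Remaining 4xx (400, 404, 422, ...) are deterministic: show the error.
  return RetryAction.surface;
}
```

Keeping the decision in one pure function makes the whitelist easy to unit-test and keeps the retry loop itself free of status-code checks.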

Common mistakes to avoid

// ❌ Retrying everything blindly
for (var i = 0; i < 5; i++) { try { ... } catch (_) {} }    // retries 404, etc.
// ✅ Whitelist transient errors

// ❌ No jitter
// 10k devices all retry at second 2 → server gets a 10k-request spike
// ✅ Add randomness to the delay

// ❌ Retrying non-idempotent POSTs (charges, sends) without an idempotency key
// → duplicate orders, duplicate notifications
// ✅ Require an `Idempotency-Key` header; the server dedupes

// ❌ Unbounded retry loop
// Backoff only spaces the attempts out; without a cap the loop still runs forever
// ✅ Hard cap on attempts AND wall-clock time

// ❌ Retrying inside business logic AND in an interceptor
// Multiplicative retries — 3 × 3 = 9 attempts when you intended 3
// ✅ Pick one layer
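
The "hard cap on attempts AND wall-clock time" fix can be sketched as a guard that checks both limits before each wait. This is a sketch under assumptions: `canRetry` and `retryBounded` are hypothetical names, the 30-second budget is an arbitrary default, and jitter is omitted to keep the example short.

```dart
/// Hypothetical guard: true only if another retry fits within both limits.
bool canRetry({
  required int attempt, // retries already made
  required Duration elapsed, // wall-clock time spent so far
  required Duration nextDelay, // the backoff we are about to wait
  int maxRetries = 3,
  Duration maxElapsed = const Duration(seconds: 30),
}) =>
    attempt < maxRetries && elapsed + nextDelay < maxElapsed;

/// Retries [task] until it succeeds or either limit is reached.
Future<T> retryBounded<T>(
  Future<T> Function() task, {
  int maxRetries = 3,
  Duration maxElapsed = const Duration(seconds: 30),
  Duration base = const Duration(seconds: 1),
}) async {
  final clock = Stopwatch()..start();
  for (var attempt = 0; ; attempt++) {
    try {
      return await task();
    } on Exception catch (_) {
      final delay = base * (1 << attempt); // base × 2^attempt, no jitter here
      final allowed = canRetry(
        attempt: attempt,
        elapsed: clock.elapsed,
        nextDelay: delay,
        maxRetries: maxRetries,
        maxElapsed: maxElapsed,
      );
      if (!allowed) rethrow; // out of attempts or out of time
      await Future<void>.delayed(delay);
    }
  }
}
```

Checking `elapsed + nextDelay` rather than `elapsed` alone means the function gives up before starting a wait that would overrun the time budget.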

Interview follow-ups

  1. Why is jitter important in exponential backoff? Without jitter, every client backs off the same amount and retries at the same instant — a "thundering herd" that hammers the server you're trying to give breathing room. Adding randomness spreads retries out over time.

  2. When is it dangerous to retry a request? When the request isn't idempotent: POSTs that create resources, charge cards, or send messages. If you must retry, require an idempotency key the server uses to dedupe (many APIs support this; Stripe popularised it).

  3. How do you handle the Retry-After header? When a 429 (or 503) includes Retry-After, honour it instead of your own delay. It's either seconds or an HTTP-date. Treat it as the minimum wait — pile your jitter on top if you want.

  4. Where does retry logic belong: call site, repository, or interceptor? In one place. An interceptor is cleanest for cross-cutting transient retries (timeouts, 5xx). Application logic (the repository) is the right place for retries that need business context (refresh-token-then-retry, "if the cart is locked, wait and retry"). Avoid stacking retries at multiple layers.
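
The two Retry-After forms mentioned in follow-up 3 (a number of seconds or an HTTP-date) can be handled with a small parser. This is a minimal sketch: `parseRetryAfter` is a hypothetical helper, the `now` parameter exists only to make it testable, and it assumes `dart:io` is available (so it would not run on Flutter web).

```dart
import 'dart:io' show HttpDate;

/// Hypothetical helper: turns a Retry-After header value into a wait time.
/// Returns null when the header is missing or cannot be parsed.
Duration? parseRetryAfter(String? value, {DateTime? now}) {
  if (value == null) return null;
  final trimmed = value.trim();

  // Form 1: delay in whole seconds, e.g. "120"
  if (RegExp(r'^\d+$').hasMatch(trimmed)) {
    final seconds = int.tryParse(trimmed); // null if the number is too large
    return seconds == null ? null : Duration(seconds: seconds);
  }

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  try {
    final until = HttpDate.parse(trimmed);
    final wait = until.difference(now ?? DateTime.now());
    // A date in the past means there is nothing left to wait for.
    return wait.isNegative ? Duration.zero : wait;
  } catch (_) {
    return null; // not a valid HTTP-date either
  }
}
```

A caller would treat the result as the minimum wait and fall back to its own backoff delay when the function returns null.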

