Feature Flags & A/B Testing

Medium Priority · Asked in ~50% of senior interviews


| Type | Purpose | Example |
| --- | --- | --- |
| Release flag | Hide work-in-progress until ready | "new checkout flow" |
| Kill switch | Disable a feature in production if broken | "disable image picker on Android 14" |
| A/B variant | Compare two implementations | "control vs new onboarding" |
| Config flag | Tweak parameter values remotely | max_upload_size_mb: 10 |

A good flag system supports all four with the same API.
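
As a rough illustration of "the same API", here is a minimal pure-Dart sketch. FlagSource, InMemoryFlagSource, describeFlags and the kill_image_picker key are hypothetical names invented for this example, not part of any SDK.

```dart
/// Hypothetical interface: three typed getters cover all four flag kinds.
abstract class FlagSource {
  bool getBool(String key);
  int getInt(String key);
  String getString(String key);
}

/// In-memory implementation, e.g. for unit tests or local overrides.
/// Unknown keys and mismatched types fall back to a zero value
/// (false, 0, '') instead of throwing.
class InMemoryFlagSource implements FlagSource {
  InMemoryFlagSource(this._values);

  final Map<String, Object> _values;

  @override
  bool getBool(String key) {
    final value = _values[key];
    return value is bool ? value : false;
  }

  @override
  int getInt(String key) {
    final value = _values[key];
    return value is int ? value : 0;
  }

  @override
  String getString(String key) {
    final value = _values[key];
    return value is String ? value : '';
  }
}

/// Reads one flag of each kind through the same interface.
String describeFlags(FlagSource flags) {
  final releaseOn = flags.getBool('new_checkout_flow'); // release flag
  final pickerKilled = flags.getBool('kill_image_picker'); // kill switch (assumed key)
  final variant = flags.getString('onboarding_variant'); // A/B variant
  final maxMb = flags.getInt('max_upload_size_mb'); // config flag
  return 'checkout=$releaseOn pickerKilled=$pickerKilled '
      'variant=$variant maxMb=$maxMb';
}
```

The four kinds differ in lifecycle and ownership, not in how they are read, which is why one small interface is usually enough.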


Code in action — Firebase Remote Config + Riverpod

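import 'package:firebase_remote_config/firebase_remote_config.dart';
import 'package:flutter/widgets.dart';
import 'package:flutter_hooks/flutter_hooks.dart';
import 'package:hooks_riverpod/hooks_riverpod.dart';

class FeatureFlagService {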
  FeatureFlagService(this._rc);
  final FirebaseRemoteConfig _rc;

  Future<void> init() async {
    await _rc.setDefaults(const {
      'new_checkout_flow': false,
      'max_upload_size_mb': 10,
      'onboarding_variant': 'control',
    });
    await _rc.setConfigSettings(RemoteConfigSettings(
      fetchTimeout: const Duration(seconds: 10),
      minimumFetchInterval: const Duration(hours: 1),
    ));
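    try {
      await _rc.fetchAndActivate();
    } on Exception catch (_) {
      // Fetch failed (offline, throttled): keep the cached values or the
      // defaults set above instead of failing app startup.
    }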
  }

  // Unknown keys return Remote Config's static defaults (false, 0, '')
  // rather than throwing.
  bool isEnabled(String key) => _rc.getBool(key);
  int getInt(String key) => _rc.getInt(key);
  String getString(String key) => _rc.getString(key);
}

// DI
final featureFlagsProvider = Provider<FeatureFlagService>((ref) =>
    FeatureFlagService(FirebaseRemoteConfig.instance));

// Release flag: branch on it
class CheckoutScreen extends ConsumerWidget {
  @override
  Widget build(BuildContext context, WidgetRef ref) {
    final flags = ref.watch(featureFlagsProvider);
    return flags.isEnabled('new_checkout_flow')
        ? const NewCheckoutFlow()
        : const LegacyCheckoutFlow();
  }
}

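// A/B variant: assign + track
// HookConsumerWidget (from hooks_riverpod) is needed because build() calls useEffect.
class OnboardingScreen extends HookConsumerWidget {
  @override
  Widget build(BuildContext context, WidgetRef ref) {
    final variant = ref.watch(featureFlagsProvider).getString('onboarding_variant');

    // Track exposure once per mount (and again only if the variant changes),
    // not on every rebuild.
    useEffect(() {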
      Observability.track('experiment_viewed', props: {
        'experiment': 'onboarding',
        'variant': variant,
      });
      return null;
    }, [variant]);

    return switch (variant) {
      'variant_a' => const OnboardingVariantA(),
      'variant_b' => const OnboardingVariantB(),
      _           => const OnboardingControl(),
    };
  }
}

Best practices

| Practice | Why |
| --- | --- |
| Set hard-coded defaults BEFORE fetch | First launch / offline still works |
| Cache last known values on disk | Survives app restarts before the remote fetch completes |
| Track variant exposure in analytics | Needed for valid A/B analysis |
| Fetch on app start + periodically | New users get fresh values; existing users pick up updates |
| Kill switches should fail closed | If config can't load, default to "feature off" |
| Provider abstraction (FeatureFlagService) | Swap Firebase ↔ LaunchDarkly ↔ in-memory for tests |
| Document each flag's purpose, owner, removal date | Otherwise you accrue zombie flags |
| Clean up flags after rollouts | Dead flag code = bug surface |
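
The exposure-tracking practice can be sketched as a small de-duplicating wrapper. This is an illustration under assumptions: ExposureTracker and TrackFn are hypothetical names, and the "seen" set lives only in memory, so a real app would need to persist it for the guarantee to survive restarts.

```dart
/// Signature of whatever analytics call the app already has (assumed shape).
typedef TrackFn = void Function(String event, Map<String, String> props);

/// Hypothetical helper: sends `experiment_viewed` at most once per
/// (user, experiment) pair for the lifetime of this object.
class ExposureTracker {
  ExposureTracker(this._track);

  final TrackFn _track;

  // In-memory only. Persisting this set to disk is left out of the sketch.
  final Set<String> _seen = {};

  /// Returns true if an event was sent, false if this exposure was
  /// already recorded.
  bool recordExposure({
    required String userId,
    required String experiment,
    required String variant,
  }) {
    final key = '$userId::$experiment';
    // Set.add returns false when the element was already present.
    if (!_seen.add(key)) return false;
    _track('experiment_viewed', {
      'experiment': experiment,
      'variant': variant,
    });
    return true;
  }
}
```

Calling recordExposure from the widget layer keeps "who saw what" tied to the assigned variant string rather than to whatever the UI happened to render.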

A/B testing pitfalls

| Mistake | Effect |
| --- | --- |
| Re-sampling the assignment on every screen view | A user's variant flips between views → results meaningless |
| Tracking variant by inferring it from the UI | Drift between assignment and tracking |
| Comparing groups on the primary metric without a statistical test | False-positive winners |
| Running experiments without a stop date | Decisions drift; resources wasted |
| Not pre-registering the hypothesis | Post-hoc "p-hacking" |
| Testing too many things at once | Hard to attribute lift |

The platform (Firebase A/B Testing, GrowthBook, Statsig) handles assignment + statistics if you let it.
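
If you do have to assign variants yourself, the first pitfall is usually avoided by making assignment a pure function of the user and the experiment. The sketch below is one common approach (hash-based bucketing), not what any of the named platforms necessarily does internally; fnv1a32 and assignVariant are names chosen for this example, and the masking assumes 64-bit ints as on the Dart VM and native Flutter targets.

```dart
import 'dart:convert';

/// 32-bit FNV-1a hash over the UTF-8 bytes of [input].
/// Used instead of String.hashCode, which is not documented as stable
/// across runs or platforms.
int fnv1a32(String input) {
  var hash = 0x811c9dc5; // FNV offset basis
  for (final byte in utf8.encode(input)) {
    hash ^= byte;
    hash = (hash * 0x01000193) & 0xFFFFFFFF; // FNV prime, kept to 32 bits
  }
  return hash;
}

/// Deterministically maps a user to one of [variants].
/// The same (experiment, userId) pair always yields the same variant;
/// including the experiment name keeps buckets independent across experiments.
String assignVariant(
  String userId,
  String experiment,
  List<String> variants,
) {
  if (variants.isEmpty) {
    throw ArgumentError('variants must not be empty');
  }
  final bucket = fnv1a32('$experiment:$userId') % variants.length;
  return variants[bucket];
}
```

Because nothing is sampled at view time, a user sees the same variant on every screen view and in every session, as long as the user ID and the variant list stay unchanged.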


Common mistakes to avoid

❌ Reading flags from network on every screen build
   Slows the UI and makes every build depend on the network being available.
   ✅ Read from a local FlagService that's already hydrated.

❌ No defaults
   First launch / no network → flags return zero values → app broken.
   ✅ Call setDefaults with hard-coded values before the first fetch.

❌ Treating flags as permanent
   Six months later you have 50 flags, half are zombies.
   ✅ Assign an owner + removal date to every flag.

❌ Branching deeply on flags throughout the codebase
   Three boolean flags = 2 × 2 × 2 = 8 paths to test.
   ✅ Branch at one well-defined seam (the screen, the service).

❌ Forgetting to track variant in analytics
   You can't measure the A/B test without recording who saw what.

❌ A/B testing without sufficient sample size
   N=100 is rarely enough to detect a realistic effect. Estimate the required sample size up-front.

❌ Mixing release flags with kill switches
   A kill switch should be obvious, not "well it's also a release flag."
   ✅ Distinguish in naming and tooling.
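
For the sample-size point above, a back-of-the-envelope estimate can be sketched as follows. It assumes the metric is a conversion rate compared with a two-sided two-proportion z-test, and uses the conventional defaults of 5% significance (z = 1.96) and 80% power (z = 0.8416); sampleSizePerVariant is a name made up for this example, and an experimentation platform's own calculator should take precedence if you use one.

```dart
/// Approximate number of users needed per variant to detect a change in
/// conversion rate from [baseline] to [expected].
///
/// Formula (normal approximation for two proportions):
///   n = (zAlpha + zBeta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
int sampleSizePerVariant(
  double baseline,
  double expected, {
  double zAlpha = 1.96, // two-sided 5% significance (assumed default)
  double zBeta = 0.8416, // 80% power (assumed default)
}) {
  final delta = expected - baseline;
  if (delta == 0) {
    throw ArgumentError('baseline and expected must differ');
  }
  final variance = baseline * (1 - baseline) + expected * (1 - expected);
  final z = zAlpha + zBeta;
  return (z * z * variance / (delta * delta)).ceil();
}
```

Under these assumptions, detecting a lift from 10% to 12% conversion needs a few thousand users per variant, and halving the expected lift roughly quadruples that number.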

Interview follow-ups

  1. What's the difference between a feature flag, an A/B test, and a kill switch?
     They're all variants of remote config. A feature flag hides incomplete work until it is ready (boolean). An A/B test randomly assigns variants and measures impact (group label). A kill switch disables a feature remotely without a release (boolean, but with urgency). The implementation is similar; the lifecycle and ownership differ.

  2. How do you ensure A/B test results are attributable?
     Fire an experiment_viewed event with the experiment name and variant exactly once per user, when they first see the variant. Tools like Firebase A/B Testing handle this for you when you read flags via their SDK. Without exposure tracking you can't analyse outcomes: you only know that the flag exists.

  3. What happens if Remote Config fetch fails?
     Firebase falls back to the last cached values; if there are none, to the values you passed to setDefaults. Reading a flag should never throw: isEnabled('unknown_key') returns false. The fetch call itself can fail when the device is offline, so catch that error rather than letting it abort startup. That's why hard-coded defaults are non-negotiable.

  4. How do you clean up dead flags?
     Track ownership and the intended removal date in code or in a flag registry. Set CI rules that warn on flags older than N days. Periodically audit: "is anyone reading this flag?" If not, delete the flag and the dead branch. Skipping cleanup tends to cost more than the cleanup itself, because every dead branch still has to be read, tested, and maintained.
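
The "warn on flags past their removal date" rule from the last answer could look roughly like this. FlagInfo and overdueFlags are hypothetical names for a hand-rolled registry; a CI step would call overdueFlags with the current date and fail or warn if the result is non-empty.

```dart
/// Hypothetical registry entry mirroring "purpose, owner, removal date".
class FlagInfo {
  FlagInfo({
    required this.key,
    required this.owner,
    required this.purpose,
    required this.removeBy,
  });

  final String key;
  final String owner;
  final String purpose;
  final DateTime removeBy;
}

/// Returns the flags whose removal date is before [today],
/// most overdue first.
List<FlagInfo> overdueFlags(Iterable<FlagInfo> registry, DateTime today) {
  final overdue = registry
      .where((flag) => flag.removeBy.isBefore(today))
      .toList()
    ..sort((a, b) => a.removeBy.compareTo(b.removeBy));
  return overdue;
}

/// Formats one warning line per overdue flag, suitable for CI output.
List<String> overdueWarnings(Iterable<FlagInfo> registry, DateTime today) {
  return [
    for (final flag in overdueFlags(registry, today))
      'Flag "${flag.key}" (owner: ${flag.owner}) is past its removal date',
  ];
}
```

Keeping the registry in code means a flag cannot be added without someone writing down an owner and a date.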

