Cheat Sheet — A/B Testing & Feature Flags

Flags vs A/B experiments

	Feature flag	A/B experiment
Goal	Safe deploy / kill switch	Measure metric impact
Assignment	Targeted (% rollout)	Random + sticky (user ID hash)
Needs stats?	No	Yes — pre-commit metric + sample size
Holdback?	No	Yes — 5–10% for long-term signal

Variant assignment options

Where	Pro	Con
Client-side	Fast to ship	Flicker (CLS), no SEO
Server-side (SSR)	No flicker, SEO-safe	Needs cookie context, CDN harder
Edge	No flicker, cacheable, fast	Limited runtime env

Default: server-side or edge. Client-side only with SSR-embedded flag payload (window.__FLAGS__) to avoid SDK round-trip flicker.

Caching variants

Bad Vary: Cookie — CDNs ignore or bypass cache entirely

Good URL-based routing: /search/control vs /search/variant — each gets its own cache entry

Best Edge middleware rewrites URL per assignment → variant HTML independently cached → high hit rate

Safe rollout + holdback

1% → 10% → 50% → 90% → [5% holdback]

Check errors + p75 CWV at each step. Holdback = permanent 5–10% on control to detect novelty effects and long-term degradation.

Holdback cost: deliberately withholding a good experience from 5–10% of users. Worth it for large uncertain changes, not small clear wins.

Statistical traps (know these)

Trap	Fix
Peeking — stop early when p<0.05	Pre-commit run duration + sample size. Use sequential testing methods for continuous monitoring.
SRM — actual split ≠ configured (e.g. 54/46 vs 50/50)	Experiment is invalid. Debug assignment logic. Never use SRM results for decisions.
Novelty effect — lift decays after week 1	Run ≥ 2 weeks. Holdback reveals long-term.
Multiple metrics — conversion up, revenue down	Pre-commit one primary metric. Rest are informational.

Flag lifecycle governance

Dead flags = tech debt + rollback risk
90-day TTL policy — auto-alert for expired flags
Instrument RUM segmented by experiment group — so a bad test doesn't corrupt global CWV baseline
Sticky assignment via hashed user ID — same variant across sessions/devices

Don't fail the interview

Hide-then-show anti-flicker: setting visibility:hidden on body until SDK loads delays FCP for ALL users, not just experiment participants. Degrades Core Web Vitals fleet-wide. Fix: SSR-embed the flag assignment so SDK reads synchronously from window.__FLAGS__.

Peeking trap: p=0.03 after 2 days does NOT mean the variant won. Pre-commit sample size and run duration before launching. Checking every day and stopping early inflates false positives — you'll declare a winner on noise.