Load balancing, caching layers, scaling strategies, microservices, and containers: the system design vocabulary an FE Lead needs to own the technical interview.
Every user action is a request that travels through layers, and each layer is a scaling knob. The running example here is a hotel search request that travels from browser to database and back.
Important: not every request travels all the way down. Static assets (JS/CSS/images) are served by the CDN and never reach your app servers. Write operations (POST/PUT/DELETE) are never served from Redis: they go straight through to the database, which is the source of truth. Redis sits only on the cacheable read hot path. The WAF screens all inbound traffic before it touches your infrastructure.
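The routing rules above can be sketched as a small function that lists which layers a request touches. This is a rough illustration, not a real gateway: the layer names, the extension-based static-asset check, and the assumption that dynamic requests pass through the CDN uncached are all hypothetical.

```typescript
// Hypothetical layer names for this sketch; a real stack may differ.
type Layer = "WAF" | "CDN" | "LoadBalancer" | "AppServer" | "Redis" | "Database";

interface IncomingRequest {
  method: "GET" | "HEAD" | "POST" | "PUT" | "DELETE";
  path: string;
  redisHit?: boolean; // true if a cacheable read is already in Redis
}

// Assumption: static assets are recognised by file extension.
function isStaticAsset(path: string): boolean {
  return /\.(js|css|png|jpg|svg|woff2)$/.test(path);
}

function layersTouched(req: IncomingRequest): Layer[] {
  const layers: Layer[] = ["WAF", "CDN"]; // the WAF screens everything first
  if (isStaticAsset(req.path)) {
    return layers; // served at the edge; app servers are never involved
  }
  layers.push("LoadBalancer", "AppServer");
  const isRead = req.method === "GET" || req.method === "HEAD";
  if (!isRead) {
    layers.push("Database"); // writes go to the source of truth, not the cache
    return layers;
  }
  layers.push("Redis");
  if (!req.redisHit) {
    layers.push("Database"); // a cache miss falls through to the DB
  }
  return layers;
}
```

Calling `layersTouched({ method: "POST", path: "/api/bookings" })` returns a list without Redis, which is the point of the paragraph above.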
Scalability isn't one thing — it's a series of layers. I start at the edge (CDN) and work inward, adding capacity at whichever layer the bottleneck lives.
A load balancer distributes incoming requests across a pool of servers. Layer 4 (transport, TCP/UDP) LBs route by IP address and port; Layer 7 LBs (the type FE leads care about) understand HTTP, so they can route by URL path, header, cookie, or body content.
IP hash: hash(client_ip) % server_count, so the same client always hits the same server. This is needed for stateful sessions stored in memory (e.g. WebSocket rooms). It kills even distribution if one subnet floods the pool, and it breaks when a server is added or removed: server_count changes, so most clients are remapped to a different server.
Health checks: the LB periodically sends a probe (HTTP GET /health or a TCP connection check) to each server. If a server fails N consecutive checks, it is removed from rotation. Key config knobs: Interval (how often to probe), Threshold (consecutive failures before marking unhealthy), Timeout (how long to wait). A well-designed /health endpoint checks DB connectivity and cache reachability, not just HTTP 200. A server that can't reach its DB should return 503 so it is pulled from rotation as soon as the failure threshold is reached.
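A minimal sketch of the health-check bookkeeping described above, assuming a consecutive-failure threshold for removal. The `healthyThreshold` for re-adding a server and all names here are illustrative assumptions, not a specific load balancer's API.

```typescript
interface HealthCheckConfig {
  unhealthyThreshold: number; // consecutive failures before removal
  healthyThreshold: number;   // consecutive successes before re-adding (assumption)
}

class ServerHealth {
  private config: HealthCheckConfig;
  private consecutiveFailures = 0;
  private consecutiveSuccesses = 0;
  private inRotation = true;

  constructor(config: HealthCheckConfig) {
    this.config = config;
  }

  // Record one probe result; a probe that times out counts as a failure.
  recordProbe(ok: boolean): void {
    if (ok) {
      this.consecutiveSuccesses += 1;
      this.consecutiveFailures = 0;
      if (!this.inRotation && this.consecutiveSuccesses >= this.config.healthyThreshold) {
        this.inRotation = true;
      }
    } else {
      this.consecutiveFailures += 1;
      this.consecutiveSuccesses = 0;
      if (this.consecutiveFailures >= this.config.unhealthyThreshold) {
        this.inRotation = false;
      }
    }
  }

  isInRotation(): boolean {
    return this.inRotation;
  }
}

// Status code a /health endpoint could return: it reports on dependencies,
// not just on the process being alive.
function healthStatus(dbReachable: boolean, cacheReachable: boolean): number {
  return dbReachable && cacheReachable ? 200 : 503;
}
```

A single failed probe does not remove a server; only a run of failures does, which keeps one slow response from shrinking the pool.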
Horizontal auto-scaling (AWS Auto Scaling Groups, GCP MIGs, Kubernetes HPA) adds or removes instances based on a signal — typically CPU %, request queue depth, or custom metrics. The LB's server pool updates automatically. For platform traffic spikes (flash sales, promotions), set a minimum fleet for baseline, a maximum ceiling for cost control, and scale-out triggers so the fleet grows before the spike hits, not after.
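To make the scaling signal concrete, here is a sketch of the proportional rule the Kubernetes HPA documents: desired = ceil(current × currentMetric / targetMetric), clamped to the minimum fleet and maximum ceiling. The function name and parameters are illustrative, and real autoscalers add tolerances and cooldown windows that are left out here.

```typescript
// Proportional scaling rule, clamped to a [min, max] fleet size.
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number, // e.g. average CPU % across the fleet
  targetMetric: number,  // e.g. the CPU % you want to hold
  minReplicas: number,
  maxReplicas: number
): number {
  if (targetMetric <= 0) {
    throw new Error("targetMetric must be positive");
  }
  const raw = Math.ceil((currentReplicas * currentMetric) / targetMetric);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

With 10 servers at 90% CPU and a 60% target, the rule asks for 15 servers; the maximum ceiling caps that number for cost control, and the minimum keeps the baseline fleet in place when traffic drops.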
I default to round robin + a Redis session store. That keeps every server stateless and lets the LB freely balance without sticky sessions.
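A small sketch contrasting the two selection strategies. The hash is a simple non-cryptographic one chosen for illustration, and the class and function names are hypothetical.

```typescript
// Simple non-cryptographic string hash (djb2-style); illustration only.
function hashString(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// IP hash: sticky, but the mapping depends on servers.length.
function pickByIpHash(clientIp: string, servers: string[]): string {
  return servers[hashString(clientIp) % servers.length];
}

// Round robin: no stickiness, so sessions must live outside the server.
class RoundRobin {
  private servers: string[];
  private next = 0;

  constructor(servers: string[]) {
    this.servers = servers;
  }

  pick(): string {
    const server = this.servers[this.next % this.servers.length];
    this.next += 1;
    return server;
  }
}
```

Round robin only works for logged-in flows when any server can load the session, which is why it is paired with an external session store.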
Caching is the highest-leverage performance and scale tool available. Each layer has a different hit rate, latency, and invalidation mechanism.
Cache-Control, ETag, Last-Modified. Hashed assets → max-age=31536000, immutable. HTML → no-cache so the browser always revalidates.
| Provider | Best for | Key differentiator |
|---|---|---|
| Cloudflare | Global sites, DDoS protection, edge compute | Largest PoP network (~300+), free tier, Workers for edge logic |
| AWS CloudFront | AWS-native apps | Deep AWS integration (S3, ALB, Lambda@Edge) |
| Akamai | Enterprise, media streaming, large e-commerce | Oldest, widest enterprise features, strong SLA guarantees |
The CDN evaluates three signals in order to decide whether to cache a response:
| # | Signal | Example | Notes |
|---|---|---|---|
| 1 | Cache-Control header from origin | max-age=31536000, immutable → cache 1 yr; no-cache / no-store → bypass | Most authoritative. Your S3 bucket or app server sets this on every response. |
| 2 | CDN rules / Cache Behaviors | Cloudflare Page Rule: "Cache everything at /assets/*"; CloudFront Cache Behavior: bypass cache for /api/* | Override or supplement headers. Lets you cache responses even if the origin doesn't send Cache-Control. |
| 3 | HTTP method | GET / HEAD → cacheable; POST / PUT / DELETE → always bypassed | CDNs never cache write operations: they change server state (they are not safe methods). |
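The three signals can be sketched as one decision function, alongside a helper that picks the origin's Cache-Control value. This is a simplification under stated assumptions: the rule shape, the hashed-filename pattern, and the evaluation order (method first as a hard gate, then CDN rules, then the header) are choices made for this sketch, not a specific CDN's behaviour.

```typescript
type CdnDecision = "cache" | "bypass";

interface CdnRule {
  pathPrefix: string; // hypothetical rule shape: match by path prefix
  action: CdnDecision;
}

// Origin side: long-lived caching for hashed assets, revalidation for the rest.
// Assumption: hashed build output looks like name.<hex>.js or name.<hex>.css.
function cacheControlFor(path: string): string {
  const isHashedAsset = /\.[0-9a-f]{6,}\.(js|css)$/.test(path);
  return isHashedAsset ? "max-age=31536000, immutable" : "no-cache";
}

// CDN side: decide whether a response may be stored at the edge.
function cdnDecision(
  method: string,
  path: string,
  cacheControl: string | null,
  rules: CdnRule[]
): CdnDecision {
  // Write methods are never cached.
  if (method !== "GET" && method !== "HEAD") return "bypass";
  // An explicit CDN rule overrides or supplements the origin header.
  for (let i = 0; i < rules.length; i++) {
    if (path.indexOf(rules[i].pathPrefix) === 0) return rules[i].action;
  }
  // Otherwise fall back to the origin's Cache-Control header.
  if (cacheControl === null) return "bypass";
  if (/no-store|no-cache/.test(cacheControl)) return "bypass";
  const m = /max-age=(\d+)/.exec(cacheControl);
  return m !== null && parseInt(m[1], 10) > 0 ? "cache" : "bypass";
}
```

Treating no-cache as a bypass follows the table above; strictly, no-cache allows storing the response as long as it is revalidated before reuse.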
Static asset on CDN miss → origin object store, not the DB. When a CDN misses on app.3f7a1c.js, it fetches from its configured origin — an S3 bucket (or GCS, R2, nginx /public) where your CI/CD pipeline uploaded the build output. The Load Balancer, App Servers, Redis, and database are never involved. The DB doesn't even know static assets exist; they are pre-compiled files that live in object storage.
| | Redis | Memcached |
|---|---|---|
| Data types | Strings, lists, sets, sorted sets, hashes, streams | Strings only |
| Persistence | RDB snapshots + AOF log | None (volatile) |
| Pub/Sub | Yes | No |
| Clustering | Redis Cluster (native sharding) | Client-side sharding |
| Use when | You need rich data types, persistence, or pub/sub. Default choice for new projects. | Pure key-value cache with very simple needs and maximum simplicity. |
Caching is a contract: you trade consistency for speed. I always define the staleness budget before choosing a cache strategy — "5 seconds stale is fine" changes the design completely.
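The staleness budget maps directly onto a TTL. Below is a minimal in-memory cache-aside sketch with an injectable clock so the expiry behaviour is easy to reason about. It stands in for Redis only conceptually: a real Redis client is asynchronous and networked, and the class and method names here are hypothetical.

```typescript
interface Entry<T> {
  value: T;
  expiresAt: number; // timestamp in ms after which the entry is stale
}

class TtlCache<T> {
  private entries: Record<string, Entry<T>> = Object.create(null);
  private ttlMs: number;
  private now: () => number;

  // ttlMs is the staleness budget; now() is injected so tests control time.
  constructor(ttlMs: number, now: () => number) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Cache-aside read: return a fresh cached value, else load and store it.
  getOrLoad(key: string, load: () => T): T {
    const hit = this.entries[key];
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load(); // the expensive call, e.g. aggregating services
    this.entries[key] = { value: value, expiresAt: this.now() + this.ttlMs };
    return value;
  }
}
```

With a 30-second budget, `new TtlCache<number>(30000, () => Date.now())` calls the loader at most once per key per 30 seconds, however many requests arrive in between.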
Vertical scaling (scale up): make one machine bigger with more CPU, more RAM, or faster disk. Simple: no code changes, no coordination. Good for monoliths and databases that are hard to shard.
Ceiling: Physical/cost limit. Single point of failure. Hard to scale down.
Horizontal scaling (scale out): add more machines behind a load balancer. Each node is smaller and cheaper. Fault-tolerant by default: if one node fails, the others absorb its traffic.
Requirement: App must be stateless. Shared state must live in an external store (Redis).
Modern cloud architecture defaults to horizontal scaling for app servers and uses vertical scaling selectively for databases (until the DB itself needs to be sharded or replicated).
Horizontal scaling is the default for app servers. The constraint is always "make the app stateless first" — once state is in Redis, you can add servers freely.
A microservice architecture splits a monolith into small, independently deployable services — each owning its data, its deploy cycle, and its scaling policy.
Docker packages an app and all its dependencies into a container image — a portable, isolated unit that runs identically on any machine with a Docker runtime.
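A minimal Node.js Dockerfile:

    # Minimal Node.js Dockerfile
    FROM node:20-alpine
    WORKDIR /app
    # Copy the manifests first so the dependency layer is cached between builds
    COPY package*.json ./
    # Install production dependencies only, exactly as pinned in the lockfile
    RUN npm ci --omit=dev
    COPY . .
    EXPOSE 3000
    CMD ["node", "server.js"]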
Kubernetes is a container orchestrator. Given a fleet of machines (nodes), it decides which containers run where, restarts crashed containers, scales replicas up/down, rolls out new versions, and routes traffic. You describe desired state in YAML; Kubernetes makes it real.
| Concept | What it is | Why it matters for an FE Lead |
|---|---|---|
| Pod | Smallest deployable unit — one or more containers sharing a network namespace | Your Next.js SSR server runs as a pod |
| Deployment | Declares desired replica count and image version; manages rolling updates | Zero-downtime deploys: new pods start before old ones stop |
| Service | Stable virtual IP that routes to healthy pods | Decouples pod IPs (ephemeral) from the address other services call |
| Ingress | HTTP router at the cluster edge — routes by hostname/path to Services | Routes /api/* to backend service, / to SSR service |
| HPA | Horizontal Pod Autoscaler — scales replica count based on CPU or custom metrics | Handles flash-sale spikes automatically |
| ConfigMap / Secret | Inject env vars and secrets without baking them into the image | Keeps images portable; rotate secrets without rebuilding |
When you push a new Docker image, Kubernetes performs a rolling update by default: it gradually replaces old pods with new ones. Combined with readiness probes, this gives you zero-downtime deploys.
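The interaction between rolling updates and readiness probes can be simulated in a few lines. This sketch assumes one surge pod at a time and no tolerated unavailability, which is stricter than the Kubernetes defaults; the types and function names are hypothetical and do not correspond to a Kubernetes API.

```typescript
interface Pod {
  version: string;
  ready: boolean;
}

interface RolloutResult {
  pods: Pod[];
  completed: boolean;
  minReady: number; // lowest number of ready pods seen during the rollout
}

function firstOldIndex(pods: Pod[], newVersion: string): number {
  for (let i = 0; i < pods.length; i++) {
    if (pods[i].version !== newVersion) return i;
  }
  return -1;
}

function readyCount(pods: Pod[]): number {
  let n = 0;
  for (let i = 0; i < pods.length; i++) {
    if (pods[i].ready) n += 1;
  }
  return n;
}

// Replace old pods one at a time: start a new pod, wait for readiness,
// and only then terminate one old pod.
function rollingUpdate(
  pods: Pod[],
  newVersion: string,
  passesReadiness: (version: string) => boolean
): RolloutResult {
  const current = pods.slice();
  let minReady = readyCount(current);
  let oldIndex = firstOldIndex(current, newVersion);
  while (oldIndex !== -1) {
    if (!passesReadiness(newVersion)) {
      // Rollout stalls: the old pods keep serving and nothing was removed.
      return { pods: current, completed: false, minReady: minReady };
    }
    current.push({ version: newVersion, ready: true });
    current.splice(oldIndex, 1);
    minReady = Math.min(minReady, readyCount(current));
    oldIndex = firstOldIndex(current, newVersion);
  }
  return { pods: current, completed: true, minReady: minReady };
}
```

If the new image never passes its readiness probe, the function returns with every old pod still in place, which is the property that makes a bad image a stalled rollout rather than an outage.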
Interviewers love this one because juniors treat Terraform, Docker, and Kubernetes as interchangeable buzzwords. They aren't competitors: they sit at three different layers and answer three different questions. Get the layering right and you sound like someone who has actually shipped.
Terraform: "Build me the infrastructure." Declarative Infrastructure-as-Code: VPC, subnets, the EKS/GKE cluster itself, node pools, RDS, load balancer, DNS, IAM. terraform apply reconciles real cloud state to your .tf files.
Changes rarely (weeks or months).
Docker: "Package my app so it runs identically anywhere." An immutable image: app + runtime + deps. CI builds it, tags it by commit SHA, pushes to a registry (ECR / Artifact Registry).
Changes with every code change.
Kubernetes: "Run, scale, heal, and route these packages across machines." Takes the Docker image and the nodes Terraform created, schedules pods, restarts crashed ones, autoscales (HPA), and does rolling deploys.
Changes with every deploy.
Terraform, Docker, and Kubernetes are the what. CI/CD pipelines and GitOps tools are the how — the automation that moves a code change through each layer without a human running commands.
CI pipeline (GitLab CI / GitHub Actions). Runs on the code side: lint → test → docker build → tag image with commit SHA → push to registry (ECR / GCR / Artifact Registry). The pipeline's job ends when a verified, immutable image exists in the registry.
GitLab CI example: .gitlab-ci.yml defines stages; each stage runs in its own Docker container on a GitLab Runner.
GitOps sync (ArgoCD / Flux). Runs on the cluster side: watches a Git repo of K8s manifests (Helm charts or raw YAML). When a manifest is updated (e.g. image tag bumped), ArgoCD syncs the cluster to match, so no human runs kubectl apply. Git becomes the single source of truth for cluster state.
ArgoCD: cluster pulls from Git (pull model) — more secure than CI pushing into the cluster.
CI runs docker build -t registry/ssr:$SHA . and pushes the image. The pipeline then bumps the image tag in the K8s manifest repo (a PR or a direct commit to a deploy branch).

    # 1. Terraform: provision the cluster (runs rarely)
    resource "aws_eks_cluster" "main" {
      name = "prod-cluster"
      # role_arn and vpc_config are required but omitted here for brevity
    }
    # → terraform apply   (VPC + EKS + RDS + LB)

    # 2. GitLab CI: build + push on every commit
    # docker build -t ecr.../ssr:$CI_COMMIT_SHA .
    # docker push ecr.../ssr:$CI_COMMIT_SHA
    # sed -i "s|image:.*|image: ecr.../ssr:$CI_COMMIT_SHA|" k8s/deployment.yaml
    # git commit -am "deploy: bump ssr to $CI_COMMIT_SHA" && git push

    # 3. ArgoCD: detects manifest change, syncs cluster
    # (no human runs kubectl; ArgoCD pulls from Git and applies)

    # 4. Kubernetes: rolls out new pods, heals, autoscales
    # kubectl rollout status deployment/ssr   ← CI can poll this for status
| Tool | Question it answers | Layer | Cadence |
|---|---|---|---|
| Terraform | "Build me the infra to run on" | Provisioning (cloud) | Rarely |
| GitLab CI / GitHub Actions | "Test, build, and package my code" | CI — build pipeline | Every commit |
| Docker | "Package my app to run anywhere" | Packaging (artifact) | Every commit |
| ArgoCD / Flux | "Keep the cluster in sync with Git" | CD — GitOps sync | Every manifest change |
| Kubernetes | "Run/scale/heal these packages" | Orchestration (runtime) | Every deploy |
Traditional CD pushes changes into the cluster (kubectl apply from CI). GitOps flips it: ArgoCD/Flux pulls from Git and applies. Pull is more secure (CI never holds cluster credentials) and gives you drift detection: ArgoCD reports the application as out of sync if someone manually edits the cluster without updating Git. The tradeoff: GitOps adds a second repo and sync loop to maintain.
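Drift detection is conceptually a diff between the state declared in Git and the state observed in the cluster. The sketch below compares only the image of each deployment; the types and function name are hypothetical and do not reflect ArgoCD's or Flux's actual data model.

```typescript
interface DeploymentSpec {
  image: string;
}

type ClusterState = Record<string, DeploymentSpec>;

interface Drift {
  name: string;
  reason: string;
}

function has(state: ClusterState, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(state, name);
}

// Compare what Git declares with what the cluster is actually running.
function detectDrift(desired: ClusterState, live: ClusterState): Drift[] {
  const drifts: Drift[] = [];
  const desiredNames = Object.keys(desired);
  for (let i = 0; i < desiredNames.length; i++) {
    const name = desiredNames[i];
    if (!has(live, name)) {
      drifts.push({ name: name, reason: "declared in Git but missing in cluster" });
    } else if (live[name].image !== desired[name].image) {
      drifts.push({ name: name, reason: "image differs from Git" });
    }
  }
  const liveNames = Object.keys(live);
  for (let i = 0; i < liveNames.length; i++) {
    if (!has(desired, liveNames[i])) {
      drifts.push({ name: liveNames[i], reason: "running in cluster but not in Git" });
    }
  }
  return drifts;
}
```

A sync loop would run this comparison periodically and apply the Git side whenever the returned list is non-empty.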
Terraform builds the world. CI packages the code. GitOps syncs the cluster. Kubernetes runs it. Four tools, four questions, one pipeline — no human touches production manually.
How you ship a new version to users is a first-class engineering decision — the wrong choice turns a small bug into a site-wide outage. There are four strategies. The first three (Rolling, Blue/Green, Canary) are infrastructure-level: they control how traffic is routed between old and new binaries. The fourth (Feature Flags) is application-level: the code ships dark and a runtime toggle controls who sees the behaviour.
Rolling: K8s replaces old pods with new ones incrementally. New pods must pass readiness probes before old ones are terminated, so capacity is maintained throughout.
Rollback: kubectl rollout undo — ~2–5 min. During rollout both versions serve traffic simultaneously.
Use when: low-risk changes, internal tools, fixes. Zero extra infra cost.
Blue/Green: run two identical environments, blue (live) and green (new). Deploy to green, run full smoke tests, then flip the load balancer or DNS to send all traffic to green in one atomic switch.
Rollback: Flip back to blue — instant, no re-deploy. Cost: 2× infra during the switchover window.
Use when: database migrations or API contracts that can't tolerate mixed versions coexisting.
Canary: route a small slice of real traffic (e.g. 5%) to the new version; 95% stays on the old. Watch error rates, latency, and business metrics (CTR, booking conversion) on live dashboards. If the canary is healthy, ramp: 5% → 25% → 50% → 100%. If anything looks wrong, set the canary weight to 0%; most users never see the problem.
Implemented at: the load balancer (nginx split_clients), Kubernetes (two Deployments behind one Service, weighted by replica ratio or by an ingress controller or service mesh), or CDN traffic-splitting rules (Cloudflare, CloudFront weighted origins).
Use when: ranking algorithms, pricing logic, new payment flows — any change where a production regression would have direct business impact.
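One common way to split traffic is to hash a stable identifier into 100 buckets and send the lowest buckets to the canary. The sketch below assumes a user ID is available at the routing layer and uses a simple illustrative hash; none of the names correspond to a specific load balancer feature.

```typescript
// Simple non-cryptographic hash (djb2-style); illustration only.
function hashUserId(userId: string): number {
  let h = 5381;
  for (let i = 0; i < userId.length; i++) {
    h = ((h * 33) ^ userId.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Each user lands in a fixed bucket from 0 to 99.
function bucketOf(userId: string): number {
  return hashUserId(userId) % 100;
}

// Users whose bucket falls below the canary weight get the new version.
function routeVersion(userId: string, canaryPercent: number): "canary" | "stable" {
  return bucketOf(userId) < canaryPercent ? "canary" : "stable";
}
```

Because the bucket is fixed per user, a user who sees the canary at 5% keeps seeing it at 25% and 50%, and setting the weight to 0 sends everyone back to the stable version.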
Feature flags: ship the new code dark. It exists in the binary but is gated behind a runtime toggle. The flag is evaluated at request time (LaunchDarkly, Unleash, Growthbook, or a simple DB-backed config). When you're ready to release, flip the flag: no redeploy, no infra change, instant effect.
Rollback: Flip the flag off — instant, zero downtime, no K8s involved.
Targeting: flags can be scoped to % of users, specific user IDs, regions, or device types — finer-grained than infra-level canaries.
Use when: new UI features, A/B experiments, kill switches for risky behaviour. Often combined with canary: the canary controls who gets the new binary; the flag controls which behaviour they see inside it.
Key distinction: A canary is an infrastructure routing decision. A feature flag is an application code decision. They are complementary, not interchangeable.
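To show what request-time evaluation looks like in application code, here is a minimal flag evaluator with a kill switch, a user allow-list, region targeting, and a percentage rollout. The rule shape and function names are hypothetical and do not mirror the SDK of LaunchDarkly, Unleash, or Growthbook.

```typescript
interface FlagRule {
  enabled: boolean;       // kill switch: false turns the flag off for everyone
  rolloutPercent: number; // 0 to 100
  allowUserIds: string[]; // always on for these users (e.g. internal testers)
  regions: string[];      // empty array means all regions
}

interface UserContext {
  userId: string;
  region: string;
}

// Simple non-cryptographic hash (djb2-style); illustration only.
function hashKey(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  }
  return h;
}

function isEnabled(flagKey: string, rule: FlagRule, user: UserContext): boolean {
  if (!rule.enabled) return false;
  if (rule.allowUserIds.indexOf(user.userId) !== -1) return true;
  if (rule.regions.length > 0 && rule.regions.indexOf(user.region) === -1) {
    return false;
  }
  // Hash flag key and user together so each flag gets an independent slice.
  return hashKey(flagKey + ":" + user.userId) % 100 < rule.rolloutPercent;
}
```

The check runs inside the application on every request, so turning the flag off takes effect without touching pods, images, or routing.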
| Strategy | Rollback | Who sees the bug | Extra cost | Layer |
|---|---|---|---|---|
| Rolling | kubectl rollout undo ~2–5 min | All users during rollout | None | Infrastructure |
| Blue/Green | Flip LB/DNS — instant | All users at the switch moment | 2× infra briefly | Infrastructure |
| Canary | Set weight → 0% | Canary slice only (e.g. 5%) | Slight (extra Deployment) | Infrastructure |
| Feature flag | Flip flag off — instant | Only flagged users | None (app code) | Application |
Deploy is not release. Rolling ships the binary; a feature flag releases the behaviour. Using both together means you can ship every day and release when the product is ready.
When the interviewer asks "how would you design the frontend platform to handle 10× our current traffic?", walk through the layers:
kubectl); K8s runs rolling deploys and HPA auto-scaling. Four tools, four cadences, one automated pipeline.
Start at the edge and work inward. The bottleneck is rarely where you think: profile first, then add the right layer of capacity.
Concept: Docker packages the app, Kubernetes orchestrates containers at scale (health checks, scaling, zero-downtime deploys), and deployment strategy controls how each version reaches users: rolling by default, canary for risky changes, blue/green for instant rollback, feature flags to decouple deploy from release.
Trade-off: Kubernetes brings enormous operational complexity. Managed services (EKS, GKE) reduce the ops burden but not the abstraction complexity, and canary + flags only stay safe with real monitoring (without dashboards they give false confidence).
Anchor: "We containerised the SSR app and BFF as separate images; GitLab CI pushed each on merge and ArgoCD synced the cluster from a manifests repo. No engineer had kubectl access to prod, the BFF scaled independently under search-heavy traffic, and we shipped a new ranking algorithm via a 5% canary watched on Grafana for 30 minutes before ramping to 100%."
Impact: zero-downtime rolling deploys eliminated maintenance windows, and canary caught a pricing regression at 3% exposure instead of 100%, saving a significant number of bookings.
Invite: "Are you on managed K8s today, and how does your team handle the deploy-vs-release split: feature flags, canary, or both?"
1. The hotel search app stores session data in memory on each app server. During a promotion, the load balancer sends the same user to three different servers, and the cart empties randomly. What is the right fix?
2. You need to cache hotel availability data that changes every 30 seconds. Which cache strategy fits best?
Availability is expensive to compute (it aggregates 12 microservices). Showing 30-second-stale data is acceptable to the product team.
3. Your SSR Next.js app is under heavy load during a flash sale. CPU on all 10 servers hits 90%. What is the fastest path to restoring capacity?
4. A new pricing microservice sometimes returns 503 during peak load. How should the FE platform respond?
5. You push a new Docker image to Kubernetes with a critical bug. Users are seeing 500 errors. What does K8s offer to recover quickly?
6. Your team is shipping a new hotel search ranking algorithm. The PM wants to catch any revenue regression before the change reaches all users. The algorithm is baked into the server binary and you have real-time CTR and conversion dashboards. Which deployment strategy fits best?
Rolling replaces all pods. Blue/Green switches everyone at once. Canary splits live traffic by percentage. Feature flags gate behaviour inside the binary.
7. Your platform uses Terraform, Docker, and Kubernetes. A developer wants to ship a new version of the SSR app — just an app code change, no infra change. Which tool drives that release, and why?
The EKS cluster, VPC, and database already exist. Only the application code changed.
Explain in 90 seconds: "How would you design the infrastructure for a hotel search page that serves 1 million concurrent users across Asia?" Cover: CDN, load balancer algorithm, caching layer, scaling approach, and one graceful-degradation decision.
Good follow-up topics: