Scalable Web Architecture

Load balancing, caching layers, scaling strategies, microservices, and containers: the system design vocabulary an FE Lead needs to own the technical interview.

1 The big picture: request path from browser to DB

Every user action is a request that travels through layers, and each layer is a scaling knob. The path below follows a hotel search request from browser to database and back.

Important: not every request travels all the way down. Static assets (JS/CSS/images) are served by the CDN and never reach your app servers. Write operations (POST/PUT/DELETE) are never answered from Redis: they go to the primary DB and at most invalidate or refresh the cached keys they affect. Redis sits only on the cacheable read hot path. The WAF screens all inbound traffic before it touches your infrastructure.

Request path (full cache miss):

Browser (Chrome · Safari · Firefox): sends an HTTP/2 request
→ WAF (AWS WAF · Cloudflare Rules · Imperva): passes clean traffic only
→ CDN Edge (Cloudflare · CloudFront · Akamai)
   on a static miss → Origin Store (S3 · GCS · R2 · nginx /public), reached on cache miss only
→ Load Balancer (L7 · nginx · AWS ALB): routes to a healthy instance
→ App 1 / App 2 / App 3 (Node / Next.js)
→ Redis Cache (sub-ms · TTL · pub/sub): hot-path reads
→ Primary DB (+ read replicas): reached on a Redis cache miss

One-liner

Scalability isn't one thing — it's a series of layers. I start at the edge (CDN) and work inward, adding capacity at whichever layer the bottleneck lives.

2 Load balancing

A load balancer distributes incoming requests across a pool of servers. Layer 4 (transport, TCP/UDP) LBs route by IP and port without inspecting the payload; Layer 7 LBs (the type FE leads care about) understand HTTP, so they can route by host, URL path, header, cookie, or, less commonly, body content.

Algorithm selection

Round Robin

Sends each request to the next server in sequence. Default choice when all servers are identical and requests are short-lived (e.g. REST APIs). Simple, even distribution, zero state. Breaks if servers have different capacity.

Least Connection

Routes to the server with the fewest active connections. Better for long-lived or variable-cost requests (file uploads, SSR pages, WebSockets). The LB must track connection state. Slight overhead; still ignores response time.

IP Hashing

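hash(client_ip) % server_count: the same client always hits the same server. Useful when session state lives in a server's memory (e.g. WebSocket rooms). Distribution turns uneven if one subnet floods the pool or many clients share one IP. It also breaks when a server is added or removed: server_count changes, so most clients are remapped to a different server (consistent hashing limits that reshuffle).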

Weighted Round Robin

Like round robin but servers get proportionally more traffic based on their capacity. Use when you have heterogeneous instance sizes in the pool.
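
The four algorithms above are small enough to sketch directly. This is a minimal illustration, not how any particular load balancer implements them: the server names are made up, and the weighted variant simply repeats each server by its weight, which sends traffic in bursts (production balancers typically interleave the picks more smoothly).

```python
import hashlib
from itertools import cycle


def round_robin(servers):
    """Yield servers in a fixed repeating order."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to their integer weights (naive expansion)."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def least_connection(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server: hash(client_ip) % server_count."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["s1", "s2", "s3"]  # hypothetical pool

rr = round_robin(servers)
print([next(rr) for _ in range(4)])  # ['s1', 's2', 's3', 's1']

wrr = weighted_round_robin({"big": 2, "small": 1})
print([next(wrr) for _ in range(6)])  # "big" appears twice as often as "small"

print(least_connection({"s1": 4, "s2": 1, "s3": 7}))  # s2

# The same client lands on the same server while the pool is unchanged.
print(ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers))  # True
```

Note that `ip_hash` depends on `len(servers)`, which is exactly why removing one server reshuffles most clients.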

Health checks

The LB periodically sends a probe (HTTP GET /health or a TCP connect) to each server. If a server fails N consecutive checks, it is removed from rotation. Key config knobs: Interval (how often to probe), Threshold (consecutive failures before marking unhealthy), Timeout (how long to wait for a reply). A well-designed /health endpoint checks what the server needs to do useful work, such as DB connectivity and cache reachability, not just that the process can return HTTP 200. A server that can't reach its DB should return 503 so it is pulled from rotation as soon as the failure threshold is reached.
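
The threshold logic can be sketched as a small state machine kept per server. This is an illustration under stated assumptions: `HealthTracker` and its field names are hypothetical, and the `healthy_threshold` for re-admitting a server is an addition that the text above does not mention (many load balancers expose a similar knob).

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks consecutive probe results for one server (hypothetical helper)."""
    unhealthy_threshold: int = 3   # consecutive failures before removal
    healthy_threshold: int = 2     # assumed: consecutive successes before re-adding
    healthy: bool = True
    _fails: int = 0
    _passes: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health state."""
        if probe_ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._passes = 0
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


tracker = HealthTracker()
for result in (False, False, False, True, True):
    print(result, "->", tracker.record(result))
```

With the defaults shown, the server leaves rotation on the third failed probe and returns after two successful ones, so the worst-case detection time is roughly interval × threshold.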

Auto-scaling

Horizontal auto-scaling (AWS Auto Scaling Groups, GCP MIGs, Kubernetes HPA) adds or removes instances based on a signal, typically CPU %, request queue depth, or custom metrics. The LB's server pool updates automatically. For platform traffic spikes (flash sales, promotions), set a minimum fleet for baseline and a maximum ceiling for cost control, and scale out ahead of known events (scheduled scaling or pre-warming): metric-based triggers only react after load has already risen, so on their own they grow the fleet after the spike hits, not before.
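
A proportional scaling rule, the same shape as the one the Kubernetes HPA documents, can be sketched in a few lines. The function and parameter names here are illustrative, and the sketch leaves out the tolerance band and stabilization windows that real autoscalers apply.

```python
import math


def desired_replicas(current, current_metric, target_metric, min_replicas, max_replicas):
    """Scale in proportion to metric / target, clamped to the fleet's floor and ceiling."""
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 10 servers at 90% CPU with a 60% target: grow to 15.
print(desired_replicas(10, 90, 60, min_replicas=4, max_replicas=40))   # 15
# Quiet period: the minimum fleet holds the floor.
print(desired_replicas(10, 20, 60, min_replicas=4, max_replicas=40))   # 4
# Extreme spike: the ceiling caps cost.
print(desired_replicas(10, 300, 60, min_replicas=4, max_replicas=40))  # 40
```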

Round robin trap: If sessions are stored server-side in memory, round robin sends the same user to different servers and the session appears lost. Fix: use a shared session store (Redis) or switch to IP hashing. Better yet: make sessions stateless (e.g. a signed JWT) so any server can handle any request.

One-liner

I default to round robin + a Redis session store. That keeps every server stateless and lets the LB freely balance without sticky sessions.

3 Caching layers

Caching is the highest-leverage performance and scale tool available. Each layer has a different hit rate, latency, and invalidation mechanism.

Browser cache

Zero network cost. Controlled by Cache-Control, ETag, Last-Modified. Hashed assets → max-age=31536000, immutable. HTML → no-cache so the browser always revalidates.

CDN edge

Serves from a PoP near the user. Cloudflare, CloudFront, Akamai. Caches static assets and full HTML pages. Dramatically reduces latency for global users. Purge via API on deploy.

Application cache (Redis / Memcached)

In-memory store shared across all app servers. Cache DB query results, computed aggregates, API responses. Sub-millisecond reads. Redis also supports TTL, pub/sub, sorted sets, and persistence.

DB read replicas

Read replicas offload SELECT traffic from the primary. The primary handles writes; replicas serve reads. Slight replication lag is acceptable for read-heavy workloads.
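
The browser-cache rule above (hashed assets cached for a year, HTML always revalidated) usually ends up as a small path-based policy at the origin. A minimal sketch follows; the hash pattern, the extension list, and the five-minute default for everything else are assumptions for illustration, not values from this lesson.

```python
import re

# Assumed naming convention: a content hash of 6+ hex chars before the extension.
HASHED_ASSET = re.compile(r"\.[0-9a-f]{6,}\.(js|css|png|jpg|svg|woff2)$")


def cache_control_for(path: str) -> str:
    """Choose a Cache-Control value from the request path."""
    if HASHED_ASSET.search(path):
        # The URL changes whenever the content changes, so it can be cached "forever".
        return "public, max-age=31536000, immutable"
    if path.endswith(".html") or path.endswith("/"):
        # Stored, but revalidated with the origin before every reuse.
        return "no-cache"
    return "public, max-age=300"  # assumed default for unhashed assets


for p in ("/assets/app.3f7a1c.js", "/index.html", "/logo.png"):
    print(p, "->", cache_control_for(p))
```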

CDN providers at a glance

Provider	Best for	Key differentiator
Cloudflare	Global sites, DDoS protection, edge compute	Very large PoP network (300+ locations), free tier, Workers for edge logic
AWS CloudFront	AWS-native apps	Deep AWS integration (S3, ALB, Lambda@Edge)
Akamai	Enterprise, media streaming, large e-commerce	Oldest, widest enterprise features, strong SLA guarantees

How the CDN decides: static vs dynamic

The CDN weighs three signals when deciding whether to cache a response:

#	Signal	Example	Notes
1	Cache-Control header from origin	`max-age=31536000, immutable` → cache for 1 year; `no-store` → never stored; `no-cache` → stored, but revalidated with the origin before reuse	The baseline. Your S3 bucket or app server sets this on every response.
2	CDN rules / Cache Behaviors	Cloudflare Page Rule: "Cache everything at `/assets/`"; CloudFront Cache Behavior: bypass cache for `/api/`	Override or supplement headers. Lets you cache responses even if the origin doesn't send Cache-Control.
3	HTTP method	GET / HEAD → cacheable; POST / PUT / DELETE → bypassed	CDNs do not cache write operations: these methods are not safe (they change server state), so a stored response cannot stand in for the real request.

Static asset on CDN miss → origin object store, not the DB. When a CDN misses on app.3f7a1c.js, it fetches from its configured origin — an S3 bucket (or GCS, R2, nginx /public) where your CI/CD pipeline uploaded the build output. The Load Balancer, App Servers, Redis, and database are never involved. The DB doesn't even know static assets exist; they are pre-compiled files that live in object storage.
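
The three signals in the table can be folded into one decision function. This is a simplified sketch, not any vendor's actual logic: the `rule` argument is a hypothetical per-path override, a missing Cache-Control header is assumed to mean "do not cache", and `no-cache` is treated as storable because it only requires revalidation before reuse.

```python
def cdn_should_cache(method, cache_control=None, rule=None):
    """Return True if the edge may store this response.

    rule is a hypothetical per-path override: "cache", "bypass", or None.
    """
    if method.upper() not in ("GET", "HEAD"):
        return False                      # unsafe methods are never cached
    if rule == "bypass":
        return False
    if rule == "cache":
        return True                       # a rule can override origin headers
    if cache_control is None:
        return False                      # assumed default: no header, no caching
    directives = {d.strip().split("=")[0].lower() for d in cache_control.split(",")}
    if directives & {"no-store", "private"}:
        return False                      # shared caches must not store these
    return True


print(cdn_should_cache("GET", "max-age=31536000, immutable"))  # True
print(cdn_should_cache("POST", "max-age=60"))                  # False
print(cdn_should_cache("GET", "no-store"))                     # False
print(cdn_should_cache("GET", None, rule="cache"))             # True
```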

Redis vs Memcached

Feature	Redis	Memcached
Data types	Strings, lists, sets, sorted sets, hashes, streams	Strings only
Persistence	RDB snapshots + AOF log	None (volatile)
Pub/Sub	Yes	No
Clustering	Redis Cluster (native sharding)	Client-side sharding
Use when	You need rich data types, persistence, or pub/sub. Default choice for new projects.	You need a pure key-value cache and want the simplest possible setup.

Cache invalidation is the hard part. TTL-based expiry is safe but stale. Event-driven invalidation (delete on write) is fresh but complex. Write-through is consistent but adds write latency. Define your staleness tolerance first, then pick the strategy.
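
Two of these strategies, TTL expiry and delete-on-write, fit in a short cache-aside sketch. `TTLCache` below is an in-process stand-in for Redis written for this example (a real deployment would call a Redis client instead), and the helper names are hypothetical.

```python
import time


class TTLCache:
    """In-process stand-in for Redis: key -> (value, expiry timestamp)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]          # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._store.pop(key, None)


def get_or_load(cache, key, load, ttl_seconds):
    """Cache-aside read: return the cached value, or load it and cache it."""
    value = cache.get(key)
    if value is None:
        value = load()                    # the expensive call (DB, downstream APIs)
        cache.set(key, value, ttl_seconds)
    return value


def write_and_invalidate(cache, key, save):
    """Delete-on-write: update the source of truth, then drop the cached copy."""
    save()
    cache.delete(key)


cache = TTLCache()
print(get_or_load(cache, "hotel:42", lambda: {"rooms": 3}, ttl_seconds=30))
```

The TTL is the staleness budget made explicit: with `ttl_seconds=30`, a reader can see data that is at most 30 seconds old unless a write invalidates the key sooner.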

One-liner

Caching is a contract: you trade consistency for speed. I always define the staleness budget before choosing a cache strategy — "5 seconds stale is fine" changes the design completely.

4 Horizontal vs vertical scaling

Vertical scaling (scale up)

When to use

Make one machine bigger — more CPU, RAM, faster disk. Simple: no code changes, no coordination. Good for monoliths and databases that are hard to shard.

Ceiling: Physical/cost limit. Single point of failure. Hard to scale down.

Horizontal scaling (scale out)

When to use

Add more machines behind a load balancer. Each node is smaller and cheaper. Fault-tolerant by design: if one node fails, the others absorb its traffic.

Requirement: App must be stateless. Shared state must live in an external store (e.g. Redis).

Modern cloud architecture defaults to horizontal scaling for app servers and uses vertical scaling selectively for databases (until the DB itself needs to be sharded or replicated).

One-liner

Horizontal scaling is the default for app servers. The constraint is always "make the app stateless first" — once state is in Redis, you can add servers freely.

5 Microservices

A microservice architecture splits a monolith into small, independently deployable services — each owning its data, its deploy cycle, and its scaling policy.

Why teams adopt them

Benefits

Independent deploy — teams don't step on each other
Independent scaling — scale only the hot service
Fault isolation — one service crash doesn't take everything down
Tech heterogeneity — right language per service

Real costs

Complexity you inherit

Network calls replace function calls — latency, timeouts, retries
Distributed tracing required (no single log stream)
Data consistency — no ACID across services
Service discovery and API versioning overhead
Much harder to test end-to-end

Microservices and the FE Lead

BFF (Backend for Frontend): A dedicated API gateway that aggregates multiple microservices into a single response shaped for your UI. Eliminates waterfall fetches and over-fetching on the client. → Full deep-dive in Lesson 25.
API versioning contracts: You negotiate the contract with each service team. Breaking changes require a version bump, not a surprise deploy.
Resilience: Your UI must handle partial failures gracefully — show the hotel card without pricing if the price service is down, rather than a blank page.
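
The BFF and resilience points combine naturally: the aggregator calls the services in parallel and degrades when a non-critical one fails. The sketch below assumes two made-up services (`fetch_hotel`, `fetch_price`), with the price service hard-coded to fail, and an arbitrary half-second timeout.

```python
import asyncio


async def fetch_hotel(hotel_id):
    """Stand-in for the hotel service (hypothetical)."""
    return {"id": hotel_id, "name": "Harbour View"}


async def fetch_price(hotel_id):
    """Stand-in for a price service that is currently failing (hypothetical)."""
    raise ConnectionError("price service returned 503")


async def hotel_card(hotel_id, timeout_seconds=0.5):
    """Aggregate both services; fall back to a card without pricing on failure."""
    hotel, price = await asyncio.gather(
        fetch_hotel(hotel_id),
        asyncio.wait_for(fetch_price(hotel_id), timeout_seconds),
        return_exceptions=True,           # failures come back as values, not raises
    )
    if isinstance(hotel, Exception):
        raise hotel                       # no core data: nothing useful to render
    card = dict(hotel)
    card["price"] = None if isinstance(price, Exception) else price
    return card


print(asyncio.run(hotel_card("h1")))
# {'id': 'h1', 'name': 'Harbour View', 'price': None}
```

The UI then renders the card and shows a "price unavailable" state when `price` is `None`, instead of failing the whole page.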

Microservices are not the starting point. Start with a monolith, identify the seams that need independent scaling or team ownership, then extract. Premature microservices multiply operational burden before you have the team or traffic to justify it. Martin Fowler calls this extra cost the "microservice premium."

6 Docker and Kubernetes

Docker — what it is

Docker packages an app and all its dependencies into a container image — a portable, isolated unit that runs identically on any machine with a Docker runtime.

Image: Immutable snapshot of your app + runtime + dependencies (defined by a Dockerfile).
Container: A running instance of an image. Isolated file system, network, process space. Starts in seconds.
Registry: Where images are stored and versioned — DockerHub, AWS ECR, GCP Artifact Registry. A deploy is "pull image tag X and run it."

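# Minimal Node.js Dockerfile
FROM node:20-alpine
WORKDIR /app
# Copy the manifests first so the dependency layer stays cached until they change
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the application source (pair with a .dockerignore to skip node_modules)
COPY . .
# Documents the listening port; it does not publish it
EXPOSE 3000
CMD ["node", "server.js"]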

Kubernetes (K8s) — what it is

Kubernetes is a container orchestrator. Given a fleet of machines (nodes), it decides which containers run where, restarts crashed containers, scales replicas up/down, rolls out new versions, and routes traffic. You describe desired state in YAML; Kubernetes makes it real.

Key K8s concepts

Concept	What it is	Why it matters for FE Lead
Pod	Smallest deployable unit — one or more containers sharing a network namespace	Your Next.js SSR server runs as a pod
Deployment	Declares desired replica count and image version; manages rolling updates	Zero-downtime deploys: new pods start before old ones stop
Service	Stable virtual IP that routes to healthy pods	Decouples pod IPs (ephemeral) from the address other services call
Ingress	HTTP router at the cluster edge — routes by hostname/path to Services	Routes `/api/*` to backend service, `/` to SSR service
HPA	Horizontal Pod Autoscaler — scales replica count based on CPU or custom metrics	Handles flash-sale spikes automatically
ConfigMap / Secret	Inject env vars and secrets without baking them into the image	Keeps images portable; rotate secrets without rebuilding

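When a Deployment's pod template changes (for example, its image tag is bumped to a newly pushed image), Kubernetes performs a rolling update by default: it gradually replaces old pods with new ones. Combined with readiness probes, this gives you zero-downtime deploys. Pushing an image to the registry does not trigger a rollout by itself.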
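
To see why capacity holds during a rolling update, here is a toy simulation. It assumes maxUnavailable=0 and a fixed surge of whole pods, which is a stricter setting than the Kubernetes defaults (both default to 25%), and it treats "ready" as instantaneous.

```python
def rolling_update(replicas, max_surge=1):
    """Simulate a rolling update with maxUnavailable=0 (max_surge must be >= 1).

    Returns (old_ready, new_ready) after each step.
    """
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0:
        surge = min(max_surge, replicas - new)
        new += surge                      # start new pods, wait until they are ready
        history.append((old, new))
        old -= surge                      # only then terminate as many old pods
        history.append((old, new))
    return history


for old, new in rolling_update(3):
    print(f"old={old} new={new} serving={old + new}")
```

At every step the serving count stays between `replicas` and `replicas + max_surge`, and both versions serve traffic until the last old pod is gone.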

K8s complexity is real. Use a managed service (AWS EKS, GCP GKE) rather than running your own cluster. As a Lead, you need to understand the concepts, not operate the cluster.

7 How it all fits: Terraform, Docker, Kubernetes

Interviewers love this one because juniors treat the three as interchangeable buzzwords. They aren't competitors — they sit at three different layers and answer three different questions. Get the layering right and you sound like someone who has actually shipped.

Terraform

Provisioning — builds the world to run on

"Build me the infrastructure." Declarative Infrastructure-as-Code: VPC, subnets, the EKS/GKE cluster itself, node pools, RDS, load balancer, DNS, IAM. terraform apply reconciles real cloud state to your .tf files.

Changes rarely — weeks/months.

Docker

Packaging — the standardized box

"Package my app so it runs identically anywhere." An immutable image: app + runtime + deps. CI builds it, tags it by commit SHA, pushes to a registry (ECR / Artifact Registry).

Changes with every code change.

Kubernetes

Orchestration — runs the boxes

"Run, scale, heal, and route these packages across machines." Takes the Docker image and the nodes Terraform created, schedules pods, restarts crashed ones, autoscales (HPA), and does rolling deploys. Changes with every deploy.

CI/CD and GitOps — the glue layer

Terraform, Docker, and Kubernetes are the what. CI/CD pipelines and GitOps tools are the how — the automation that moves a code change through each layer without a human running commands.

CI pipeline (GitLab CI / GitHub Actions / Jenkins)

Triggered on every git push

Runs on the code side: lint → test → docker build → tag image with commit SHA → push to registry (ECR / GCR / Artifact Registry). The pipeline's job ends when a verified, immutable image exists in the registry.

GitLab CI example: .gitlab-ci.yml defines stages; each stage runs in its own Docker container on a GitLab Runner.

CD / GitOps (ArgoCD / Flux)

Triggered on manifest change in Git

Runs on the cluster side: watches a Git repo of K8s manifests (Helm charts or raw YAML). When the manifest is updated (e.g. image tag bumped), ArgoCD syncs the cluster to match — no human runs kubectl apply. Git becomes the single source of truth for cluster state.

ArgoCD: cluster pulls from Git (pull model) — more secure than CI pushing into the cluster.

The full lifecycle (this is what makes it click)

1. Terraform provisions the world: VPC, subnets, EKS/GKE cluster, node pools, RDS, LB, DNS, IAM. Runs rarely (weeks/months). Managed via its own CI pipeline.
2. Developer pushes code → CI pipeline fires: GitLab CI / GitHub Actions runs tests, then docker build -t registry/ssr:$SHA . and pushes the image. The pipeline then bumps the image tag in the K8s manifest repo (a PR or a direct commit to a deploy branch).
3. ArgoCD / Flux detects the manifest change and syncs the cluster: it tells K8s to run the new image tag. K8s performs a rolling update, replacing old pods with new ones that pass readiness probes.
4. Kubernetes runs and heals: HPA scales replicas based on CPU/metrics; K8s restarts crashed pods; Services route traffic only to healthy pods.

# 1. Terraform — provision the cluster (runs rarely)
# (role_arn and vpc_config are also required; omitted here for brevity)
resource "aws_eks_cluster" "main" { name = "prod-cluster" }
# → terraform apply  VPC + EKS + RDS + LB

# 2. GitLab CI — build + push on every commit
# docker build -t ecr.../ssr:$CI_COMMIT_SHA .
# docker push ecr.../ssr:$CI_COMMIT_SHA
# sed -i "s|image:.*|image: ecr.../ssr:$CI_COMMIT_SHA|" k8s/deployment.yaml
# git commit -am "deploy: bump ssr to $CI_COMMIT_SHA" && git push

# 3. ArgoCD — detects manifest change, syncs cluster
# (no human runs kubectl — ArgoCD pulls from Git and applies)

# 4. Kubernetes — rolls out new pods, heals, autoscales
# kubectl rollout status deployment/ssr  ← reports rollout progress (needs read access to the cluster)

Tool	Question it answers	Layer	Cadence
Terraform	"Build me the infra to run on"	Provisioning (cloud)	Rarely
GitLab CI / GitHub Actions	"Test, build, and package my code"	CI — build pipeline	Every commit
Docker	"Package my app to run anywhere"	Packaging (artifact)	Every commit
ArgoCD / Flux	"Keep the cluster in sync with Git"	CD — GitOps sync	Every manifest change
Kubernetes	"Run/scale/heal these packages"	Orchestration (runtime)	Every deploy

Push-based CD vs GitOps pull model. Traditional CD pipelines push into the cluster (kubectl apply from CI). GitOps flips it: ArgoCD/Flux pulls from Git and applies. Pull is more secure (CI never holds cluster credentials) and gives you drift detection: ArgoCD reports the application as out of sync if someone manually edits the cluster without updating Git. The tradeoff: GitOps adds a second repo and sync loop to maintain.

One-liner

Terraform builds the world. CI packages the code. GitOps syncs the cluster. Kubernetes runs it. Four tools, four questions, one pipeline — no human touches production manually.

8 Deployment strategies

How you ship a new version to users is a first-class engineering decision — the wrong choice turns a small bug into a site-wide outage. There are four strategies. The first three (Rolling, Blue/Green, Canary) are infrastructure-level: they control how traffic is routed between old and new binaries. The fourth (Feature Flags) is application-level: the code ships dark and a runtime toggle controls who sees the behaviour.

Infrastructure-level strategies

1. Rolling update

Default — most deploys

K8s replaces old pods with new ones incrementally. New pods must pass readiness probes before old ones are terminated, so capacity is maintained throughout.

Rollback: kubectl rollout undo — ~2–5 min. During rollout both versions serve traffic simultaneously.

Use when: low-risk changes, internal tools, fixes. Zero extra infra cost.

2. Blue/Green

When instant rollback is non-negotiable

Run two identical environments: blue (live) and green (new). Deploy to green, run full smoke tests, then flip the load balancer or DNS to send all traffic to green in one switch. An LB flip is effectively atomic; a DNS flip only takes effect as cached records expire (TTL).

Rollback: Flip back to blue — instant, no re-deploy. Cost: 2× infra during the switchover window.

Use when: database migrations or API contracts that can't tolerate mixed versions coexisting.

3. Canary

When you need to validate real-user impact before full rollout

Route a small slice of real traffic (e.g. 5%) to the new version; 95% stays on the old. Watch error rates, latency, and business metrics (CTR, booking conversion) on live dashboards. If the canary is healthy, ramp: 5% → 25% → 50% → 100%. If anything looks wrong, set the canary weight to 0% — most users never saw the problem.

Implemented at: the load balancer (nginx split_clients), Kubernetes (two Deployments behind one Service split by replica ratio, or weighted routing in an ingress controller or service mesh), or CDN traffic-splitting rules (Cloudflare, CloudFront).

Use when: ranking algorithms, pricing logic, new payment flows — any change where a production regression would have direct business impact.
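
The percentage split itself is usually a stable hash of some request key, similar in spirit to what nginx split_clients does. A minimal sketch, assuming the key is a user ID and that buckets are whole percentages:

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send a stable slice of users to the canary; everyone else stays on stable."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


users = [f"user-{i}" for i in range(1000)]  # hypothetical user IDs
for percent in (5, 25, 50, 100):
    share = sum(route(u, percent) == "canary" for u in users) / len(users)
    print(f"weight {percent}% -> observed canary share {share:.1%}")
```

Because the bucket is derived from the user ID, a user who was in the 5% slice stays in the canary as the weight ramps up, and setting the weight to 0 returns everyone to stable.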

Application-level strategy

4. Feature flags

When you want to decouple deploy from release

Ship the new code dark — it exists in the binary but is gated behind a runtime toggle. The flag is evaluated at request time (LaunchDarkly, Unleash, Growthbook, or a simple DB-backed config). When you're ready to release, flip the flag: no redeploy, no infra change, instant effect.

Rollback: Flip the flag off — instant, zero downtime, no K8s involved.

Targeting: flags can be scoped to % of users, specific user IDs, regions, or device types — finer-grained than infra-level canaries.

Use when: new UI features, A/B experiments, kill switches for risky behaviour. Often combined with canary: the canary controls who gets the new binary; the flag controls which behaviour they see inside it.

Key distinction: A canary is an infrastructure routing decision. A feature flag is an application code decision. They are complementary, not interchangeable.
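
For contrast with the routing decision, a request-time flag check looks like this. The `Flag` record and `is_on` function are hypothetical and stand in for a DB-backed config or a vendor SDK; the targeting order (kill switch, allow-list, region, percentage) is one reasonable choice, not a standard.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical flag record, e.g. one row of a DB-backed config."""
    name: str
    enabled: bool = False                 # global kill switch
    percent: int = 0                      # share of users who see the behaviour
    allow_user_ids: set = field(default_factory=set)
    regions: set = field(default_factory=set)  # empty set means all regions


def is_on(flag: Flag, user_id: str, region: str) -> bool:
    """Evaluate the flag at request time for one user."""
    if not flag.enabled:
        return False
    if user_id in flag.allow_user_ids:
        return True                       # explicit allow-list wins over targeting
    if flag.regions and region not in flag.regions:
        return False
    digest = hashlib.sha256(f"{flag.name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100 < flag.percent


flag = Flag("new-ranking", enabled=True, percent=0, allow_user_ids={"qa-1"})
print(is_on(flag, "qa-1", "sg"))    # True: allow-listed tester
print(is_on(flag, "user-2", "sg"))  # False: rollout is still at 0%
flag.enabled = False                # kill switch: no redeploy needed
print(is_on(flag, "qa-1", "sg"))    # False
```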

Comparison at a glance

Strategy	Rollback	Who sees the bug	Extra cost	Layer
Rolling	`kubectl rollout undo` ~2–5 min	A growing share of users as pods are replaced	None	Infrastructure
Blue/Green	Flip LB/DNS — instant	All users at the switch moment	2× infra briefly	Infrastructure
Canary	Set weight → 0%	Canary slice only (e.g. 5%)	Slight (extra Deployment)	Infrastructure
Feature flag	Flip flag off — instant	Only flagged users	None (app code)	Application

Canary without monitoring is dangerous. If you can't observe error rates and conversion metrics in real time, a canary gives false confidence: a bad build keeps serving real users longer than it should. Similarly, feature flags that are never cleaned up become permanent dead code branches that are hard to reason about. Both tools require discipline to use safely.

One-liner

Deploy is not release. Rolling ships the binary; a feature flag releases the behaviour. Using both together means you can ship every day and release when the product is ready.

9 Interview framing: the system design answer

When the interviewer asks "how would you design the frontend platform to handle 10× our current traffic?", walk through the layers:

Edge: Aggressive CDN caching for all static assets (immutable URLs), edge-cached API responses where staleness is acceptable.
Load balancing: L7 LB (ALB/nginx) with round-robin, health checks, and an auto-scaling group. Stateless app servers.
Application caching: Redis for session state, computed page data, and hot API responses. Define TTLs per data type.
Horizontal scaling: App server pool grows automatically. Database read replicas absorb SELECT traffic.
Microservices / BFF: A BFF aggregates downstream APIs so the UI makes one request, not five.
Infra & containers: Terraform provisions the cluster/VPC/DB (rarely); GitLab CI builds and pushes Docker images on every commit; ArgoCD syncs K8s manifests from Git (no manual kubectl); K8s runs rolling deploys and HPA auto-scaling. Four tools, four cadences, one automated pipeline.
Deployment strategy: Rolling for most deploys; canary for high-risk changes (ranking, pricing); blue/green when instant rollback is non-negotiable; feature flags to decouple deploy from release — ship dark, flip when ready.

One-liner

Start at the edge and work inward. The bottleneck is rarely where you think — profile first, then add the right layer of capacity.

Full loop

Concept: Docker packages the app, Kubernetes orchestrates containers at scale (health checks, scaling, zero-downtime deploys), and deployment strategy controls how each version reaches users: rolling by default, canary for risky changes, blue/green for instant rollback, feature flags to decouple deploy from release.

Trade-off: Kubernetes brings enormous operational complexity. Managed services (EKS, GKE) reduce the ops burden but not the abstraction complexity, and canary + flags only stay safe with real monitoring (without dashboards they give false confidence).

Anchor: "We containerised the SSR app and BFF as separate images; GitLab CI pushed each on merge and ArgoCD synced the cluster from a manifests repo. No engineer had kubectl access to prod, the BFF scaled independently under search-heavy traffic, and we shipped a new ranking algorithm via a 5% canary watched on Grafana for 30 minutes before ramping to 100%."

Impact: zero-downtime rolling deploys eliminated maintenance windows, and canary caught a pricing regression at 3% exposure instead of 100%, saving a significant number of bookings.

Invite: "Are you on managed K8s today, and how does your team handle the deploy-vs-release split: feature flags, canary, or both?"

10 Check yourself: scenario quiz

1. The hotel search app stores session data in memory on each app server. During a promotion, the load balancer sends the same user to three different servers, and the cart empties randomly. What is the right fix?

2. You need to cache hotel availability data that changes every 30 seconds. Which cache strategy fits best?

Availability is expensive to compute (it aggregates 12 microservices). Showing 30-second-stale data is acceptable to the product team.

3. Your SSR Next.js app is under heavy load during a flash sale. CPU on all 10 servers hits 90%. What is the fastest path to restoring capacity?

4. A new pricing microservice sometimes returns 503 during peak load. How should the FE platform respond?

5. You push a new Docker image to Kubernetes with a critical bug. Users are seeing 500 errors. What does K8s offer to recover quickly?

6. Your team is shipping a new hotel search ranking algorithm. The PM wants to catch any revenue regression before the change reaches all users. The algorithm is baked into the server binary and you have real-time CTR and conversion dashboards. Which deployment strategy fits best?

Rolling replaces all pods. Blue/Green switches everyone at once. Canary splits live traffic by percentage. Feature flags gate behaviour inside the binary.

7. Your platform uses Terraform, Docker, and Kubernetes. A developer wants to ship a new version of the SSR app — just an app code change, no infra change. Which tool drives that release, and why?

The EKS cluster, VPC, and database already exist. Only the application code changed.

Out-loud drill — do this before your interview

Explain in 90 seconds: "How would you design the infrastructure for a hotel search page that serves 1 million concurrent users across Asia?" Cover: CDN, load balancer algorithm, caching layer, scaling approach, and one graceful-degradation decision.

Good follow-up topics:

How does Cloudflare Workers differ from Lambda@Edge?
What is a circuit breaker and when do I add one?
Walk me through a Redis eviction policy decision.
How does K8s handle zero-downtime deploys exactly?
When does a monolith beat microservices?
What is the BFF pattern and when do I need one?
How do I size an auto-scaling group for flash sales?
How does canary work in Kubernetes: two Deployments or one?
When would I choose blue/green over canary?
What's the difference between a canary and a feature flag?
How does ArgoCD differ from just running kubectl in CI?
When would you use GitLab CI vs GitHub Actions vs Jenkins?
How do feature flags get cleaned up, and what's the governance?
Where's the line between Terraform and Kubernetes?
What is GitOps (ArgoCD/Flux) and where does it fit?