Multiplexing, head-of-line blocking, QUIC over UDP, compression, and flattening request waterfalls
— the three-generation story of what changed and why it matters on the platform's mobile traffic.
1. The one model: one bottleneck fixed per generation
Every HTTP version runs the same bargain — client sends requests, server returns responses — over a transport
layer. What changed across three generations is which bottleneck each version fixed, and at which
layer.
| Version | Year | Transport | Key fix | Remaining problem |
| --- | --- | --- | --- | --- |
| HTTP/1.1 | 1997 | TCP | Persistent connections, pipelining (broken in practice) | One request in flight per connection → 6 connections per origin as workaround |
| HTTP/2 | 2015 | TCP | Binary framing + multiplexing → many streams on one connection | TCP head-of-line blocking: one dropped packet freezes ALL streams |
| HTTP/3 | 2022 | QUIC/UDP | Per-stream reliability → lost packet only affects that stream | QUIC CPU cost; UDP firewalls; observability harder |
One-liner
"HTTP/1 had head-of-line blocking at the HTTP layer, HTTP/2 fixed that but not TCP's, HTTP/3 fixed TCP's by
replacing it with QUIC. Each version solves a different layer of the same problem."
2. HTTP/1.1: one request at a time & the workarounds
HTTP/1.1 is a text protocol over TCP. The core constraint: each connection handles one request
at a time. HTTP pipelining (send multiple requests without waiting for responses) exists in the spec but is broken
in practice — responses must arrive in order, so one slow response blocks everything behind it.
How browsers worked around it
Browsers open ~6 TCP connections per origin. More connections = more parallel slots. This
spawned the anti-pattern of domain sharding: splitting assets across
static1.cdn.com, static2.cdn.com to get 12–18 parallel connection slots. It works, but
costs one DNS lookup + TCP handshake + TLS handshake per shard domain.
HTTP/1.1 workarounds that become anti-patterns under HTTP/2:
- Domain sharding: actively harmful under H2. It fragments one H2 connection into many TCP connections and defeats multiplexing.
- JS/CSS concatenation: bundling everything into one file to save round trips. Under H2, many small files with long cache TTLs beat one big file you must fully invalidate.
- Image sprites / data URIs: same logic as concatenation; H2 parallelism makes them unnecessary.
3. HTTP/2: binary, multiplexed, still on TCP
HTTP/2 replaced the text protocol with a binary framing layer. Requests and responses are split
into frames that interleave on a single TCP connection via independent
streams. This is multiplexing: one connection, many parallel in-flight requests, no HTTP-layer
head-of-line blocking.
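To make "binary framing" concrete, here is a minimal sketch of the fixed 9-byte header that precedes every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The interface and function names are illustrative assumptions, not taken from any library. The stream identifier is what lets frames from different requests interleave on one connection.

```typescript
// HTTP/2 frame header layout (9 bytes):
//   bytes 0-2: payload length (24 bits)
//   byte  3  : frame type (0x0 = DATA, 0x1 = HEADERS, ...)
//   byte  4  : flags
//   bytes 5-8: 1 reserved bit + 31-bit stream identifier
// FrameHeader, encodeFrameHeader and decodeFrameHeader are hypothetical names.
interface FrameHeader {
  length: number;
  type: number;
  flags: number;
  streamId: number;
}

function encodeFrameHeader(h: FrameHeader): Uint8Array {
  const out = new Uint8Array(9);
  out[0] = (h.length >>> 16) & 0xff;
  out[1] = (h.length >>> 8) & 0xff;
  out[2] = h.length & 0xff;
  out[3] = h.type & 0xff;
  out[4] = h.flags & 0xff;
  const id = h.streamId & 0x7fffffff; // the top (reserved) bit is always sent as 0
  out[5] = (id >>> 24) & 0xff;
  out[6] = (id >>> 16) & 0xff;
  out[7] = (id >>> 8) & 0xff;
  out[8] = id & 0xff;
  return out;
}

function decodeFrameHeader(buf: Uint8Array): FrameHeader {
  if (buf.length < 9) throw new Error("a frame header needs 9 bytes");
  const length = (buf[0] << 16) | (buf[1] << 8) | buf[2];
  // Mask off the reserved bit before assembling the stream identifier.
  const streamId = ((buf[5] & 0x7f) << 24) | (buf[6] << 16) | (buf[7] << 8) | buf[8];
  return { length, type: buf[3], flags: buf[4], streamId };
}
```

A receiver reads nine bytes, learns which stream the next `length` bytes belong to, and hands them to that stream, so a DATA frame for stream 3 can sit between two frames for stream 1.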
What HTTP/2 delivers
- Multiplexing (core win): Many request/response pairs share one TCP connection, interleaved as frames. Eliminates HTTP-layer HoL blocking. Domain sharding becomes actively harmful.
- HPACK header compression (bandwidth): Static table of 61 common headers + dynamic table built during the connection. Repeated headers (Cookie, User-Agent) are sent as 1–4 bytes instead of hundreds. Critical for HTTP APIs with large cookies.
- Stream priorities (control): The client can signal the relative priority of streams, so critical CSS/JS can be weighted higher. In practice, browser resource hints (L01) are more reliable than H2 priority negotiation.
- Server Push (deprecated 2022; gone): The server sent resources before the client asked for them. Fatal flaw: the server could not know what the browser already had cached, so pushing content that was already in the disk cache wasted bandwidth. Replaced by 103 Early Hints.
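The dynamic-table idea behind HPACK can be illustrated with a deliberately simplified toy. This sketch is not the HPACK wire format: it has no static table, no Huffman coding, no eviction, and the byte counts are rough estimates. The class and function names are hypothetical. It only shows why a header repeated on every request collapses to a small index after the first time it is sent.

```typescript
// Toy model of a per-connection header table (NOT real HPACK encoding).
type EncodedHeader =
  | { kind: "indexed"; index: number }
  | { kind: "literal"; name: string; value: string };

class ToyHeaderTable {
  private entries: string[] = [];

  // First occurrence: send the header literally and remember it.
  // Later occurrences: send only the table index.
  encode(name: string, value: string): EncodedHeader {
    const key = `${name}: ${value}`;
    const index = this.entries.indexOf(key);
    if (index >= 0) return { kind: "indexed", index };
    this.entries.push(key);
    return { kind: "literal", name, value };
  }
}

// Rough size estimate, assumed for illustration: one byte for an index,
// name + value + a few bytes of framing for a literal.
function approxHeaderBytes(e: EncodedHeader): number {
  return e.kind === "indexed" ? 1 : e.name.length + e.value.length + 3;
}
```

With a 300-character cookie, the first request pays roughly 300 bytes for that header and every later request on the same connection pays about one.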
The problem HTTP/2 didn't solve: TCP head-of-line blocking
TCP guarantees ordered, reliable byte delivery. If one packet is lost, TCP waits for
retransmission before delivering any subsequent bytes, even bytes that belong to completely unrelated streams. Because
HTTP/2 puts all streams on one TCP connection, a single dropped packet blocks every stream
simultaneously. On a lossy mobile network (1% packet loss is common across Southeast Asia, SEA), this can make H2 worse
than H1.1's parallel connections.
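The effect can be sketched with a small model. The assumptions are loud ones: one packet arrives per tick, each packet carries data for exactly one stream, and a lost packet arrives `retransmitDelay` ticks late. The function name and parameters are hypothetical. With ordering enforced across the whole connection (TCP), every packet behind the lost one waits; with ordering enforced per stream (the QUIC approach described in the next section), only the affected stream waits.

```typescript
type StreamId = "A" | "B" | "C";

// Returns, per stream, the tick at which its last packet is delivered to the application.
function completionTimes(
  packets: StreamId[],
  lostIndex: number,
  retransmitDelay: number,
  perStreamOrdering: boolean,
): Record<string, number> {
  // Delivery time of the previous packet in the same ordering domain:
  // the whole connection ("*") for TCP, the individual stream for QUIC.
  const lastDelivered = new Map<string, number>();
  const done: Record<string, number> = {};
  packets.forEach((stream, i) => {
    const arrival = i === lostIndex ? i + retransmitDelay : i;
    const domain = perStreamOrdering ? stream : "*";
    const previous = lastDelivered.get(domain) ?? -1;
    const delivered = Math.max(arrival, previous); // cannot overtake earlier data in the same domain
    lastDelivered.set(domain, delivered);
    done[stream] = delivered;
  });
  return done;
}

const sentPackets: StreamId[] = ["A", "B", "C", "A", "B", "C"];
// Lose the first packet of stream B; the retransmission lands 10 ticks late.
const overTcp = completionTimes(sentPackets, 1, 10, false);  // { A: 11, B: 11, C: 11 }
const overQuic = completionTimes(sentPackets, 1, 10, true);  // { A: 3, B: 11, C: 5 }
```

In this model the CSS (A) and image (C) streams finish at ticks 3 and 5 under per-stream ordering, but are dragged to tick 11 when they share one ordered byte stream with B.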
[Interactive simulation omitted: head-of-line blocking under packet loss, comparing H1.1, H2, and H3 across Stream A (CSS), Stream B (JS), and Stream C (image), with blocked/retransmitting segments highlighted.]
4. HTTP/3 & QUIC: transport layer redesign
HTTP/3 (RFC 9114, 2022) solves TCP head-of-line blocking by replacing TCP entirely with QUIC — a
multiplexed, encrypted transport protocol built on UDP. Each stream is tracked independently: a
lost packet triggers retransmission only for that stream, while other streams keep flowing.
- Per-stream reliability (core fix): QUIC implements reliability and ordering per-stream in userspace, not globally. Drop a packet on stream B → only B waits. A and C flow uninterrupted. This is what kills TCP HoL blocking.
- 0-RTT connection resumption (latency): Returning visitors send data in the first packet (0-RTT). Before the first request can be sent, TCP + TLS 1.2 needs 3 round trips, TCP + TLS 1.3 needs 2 (the TLS 1.3 handshake itself is 1 RTT), a new QUIC connection needs 1, and a resumed QUIC connection needs 0. Significant on high-latency SEA links.
- Connection migration (mobile): Connections are identified by a Connection ID, not IP:port. Wi-Fi → 4G switch = same Connection ID, seamless handoff. No reconnect, no dropped booking form. Big win for the platform's mobile users.
- Built-in TLS 1.3 (security): QUIC mandates TLS 1.3. The QUIC handshake and TLS are unified, with no separate TLS negotiation. Encrypted by default, including most of the packet header (makes DPI harder → QUIC-aware tooling needed).
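The handshake arithmetic can be written down as a tiny calculator. It assumes the usual round-trip counts before a request can be sent (TCP handshake 1 RTT, full TLS 1.2 handshake 2 RTT, TLS 1.3 handshake 1 RTT, QUIC combining transport and TLS 1.3 in 1 RTT) and ignores server processing and transfer time. The type and function names are illustrative.

```typescript
type ConnectionSetup = "tcp+tls1.2" | "tcp+tls1.3" | "quic-new" | "quic-0rtt";

// Round trips spent on handshakes before the first request can leave the client.
const HANDSHAKE_ROUND_TRIPS: Record<ConnectionSetup, number> = {
  "tcp+tls1.2": 3, // 1 (TCP) + 2 (TLS 1.2)
  "tcp+tls1.3": 2, // 1 (TCP) + 1 (TLS 1.3)
  "quic-new": 1,   // transport + TLS 1.3 combined
  "quic-0rtt": 0,  // resumed connection, request rides in the first flight
};

// Handshake round trips plus one round trip for the request/response itself.
function timeToFirstResponseMs(setup: ConnectionSetup, rttMs: number): number {
  return (HANDSHAKE_ROUND_TRIPS[setup] + 1) * rttMs;
}
```

On a 100 ms RTT link this gives 400 ms, 300 ms, 200 ms, and 100 ms respectively, which is why the saving grows with latency.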
QPACK — header compression for out-of-order delivery
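HTTP/2's HPACK assumes header blocks are decoded in the order they were encoded, which TCP guarantees but independent QUIC streams do not; carried over unchanged, it would reintroduce HoL blocking at the header layer. HTTP/3
uses QPACK: the same static + dynamic table approach, but dynamic-table updates travel on a dedicated encoder stream, and the encoder can avoid or limit references to
entries the decoder has not yet acknowledged, so a header block on one stream rarely has to wait for data on another.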
HTTP/3 trade-offs (the honest answer under push-back):
- UDP firewalls: some corporate/ISP infrastructure blocks UDP port 443. Browsers discover H3 through the Alt-Svc header (Alt-Svc: h3=":443"; ma=86400) on a response served over TCP; if the QUIC attempt fails, they fall back to H2 over TCP automatically. ~30% of connections still fall back.
- QUIC CPU overhead: reliability + congestion control + encryption in userspace is heavier than kernel TCP. At origin-server scale, measure before adopting; CDN-terminated QUIC sidesteps this.
- 0-RTT replay risk: 0-RTT early data is replayable by attackers. Safe for idempotent GETs only; never for booking POSTs or mutations.
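One way to enforce the replay rule at the origin is sketched below. It assumes the edge or proxy marks requests that arrived in early data with an `Early-Data: 1` header and that the origin may answer `425 Too Early` to ask the client to retry after the handshake completes. The request shape, the lowercase header keys, and the function name are assumptions for illustration, and the exact behaviour depends on the CDN in use.

```typescript
// Methods treated as safe to accept from replayable early data.
// A GET is only truly safe if its handler has no side effects.
const SAFE_EARLY_DATA_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

// Hypothetical minimal request shape; header names are assumed to be lowercased.
interface IncomingRequest {
  method: string;
  headers: Record<string, string>;
}

// Returns 425 when the request must be retried outside early data, otherwise null.
function earlyDataRejection(req: IncomingRequest): 425 | null {
  const arrivedInEarlyData = req.headers["early-data"] === "1";
  if (arrivedInEarlyData && !SAFE_EARLY_DATA_METHODS.has(req.method.toUpperCase())) {
    return 425; // Too Early: the client should resend once the handshake is complete
  }
  return null;
}
```

A booking POST flagged as early data is bounced with 425 and resent on the fully established connection, while a hotel-list GET proceeds immediately.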
5. HTTP compression: gzip and Brotli
HTTP compression is negotiated via Accept-Encoding (client lists what it accepts) and
Content-Encoding (server declares what it applied). Two algorithms dominate in 2024:
| Algorithm | Size vs gzip | Compression speed | Decompression speed | Requires |
| --- | --- | --- | --- | --- |
| gzip (level 6) | Baseline | Fast; viable for dynamic | Fast | Nothing; universal support |
| Brotli (level 11) | ~20–26% smaller for text | ~100× slower than gzip | Similar to gzip | HTTPS; all modern browsers |
| Brotli (level 4–5) | ~10–15% smaller for text | ~2–3× slower than gzip | Fast | HTTPS; viable for dynamic |
The strategy
Static assets (JS, CSS, HTML): pre-compress with Brotli level 11 at build/deploy. Store the
.br file alongside the original on the CDN and serve it with Content-Encoding: br to browsers
that send Accept-Encoding: br.
Dynamic SSR responses: gzip level 6 (good ratio, low enough CPU cost); offer Brotli level 4–5 only if you have CPU headroom.
Already-compressed formats (JPEG, WebP, AVIF, woff2, zip): never recompress them; they gain almost nothing and can even grow slightly.
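The strategy can be expressed as a small negotiation function. This is a sketch under stated assumptions: static assets have a pre-built `.br` variant available, dynamic responses use gzip only, and the function names and the `isStatic` flag are hypothetical. It parses `Accept-Encoding` q-values but is not a complete implementation of HTTP content negotiation.

```typescript
// Formats from the list above that are already compressed.
const ALREADY_COMPRESSED = /^\s*(image\/(jpeg|webp|avif)|font\/woff2|application\/zip)\b/i;

// Parses "gzip, br;q=0.8" into a map of encoding -> q-value (default 1).
function parseAcceptEncoding(header: string): Map<string, number> {
  const prefs = new Map<string, number>();
  for (const part of header.split(",")) {
    const [name, ...params] = part.trim().split(";");
    if (!name) continue;
    let q = 1;
    for (const p of params) {
      const m = /^\s*q=([0-9.]+)\s*$/.exec(p);
      if (m) q = Number(m[1]);
    }
    prefs.set(name.trim().toLowerCase(), q);
  }
  return prefs;
}

function chooseEncoding(
  acceptEncoding: string,
  contentType: string,
  isStatic: boolean,
): "br" | "gzip" | "identity" {
  if (ALREADY_COMPRESSED.test(contentType)) return "identity";
  const prefs = parseAcceptEncoding(acceptEncoding);
  const accepts = (enc: string): boolean => (prefs.get(enc) ?? prefs.get("*") ?? 0) > 0;
  if (isStatic && accepts("br")) return "br"; // serve the pre-compressed .br file
  if (accepts("gzip")) return "gzip";         // level 6 on the fly for dynamic responses
  return "identity";
}
```

Keeping Brotli to the static path reflects the cost model in the table: level 11 is paid once at build time, while every dynamic response would pay it on the request path.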
6. Request waterfalls and how to flatten them
A waterfall is a cascade of requests where each step must complete before the next begins. The
classic LCP waterfall on a client-rendered app:
// ❌ Classic useEffect waterfall — each step adds 1+ round trips
HTML arrives
→ browser discovers JS bundle → downloads + executes
→ useEffect fires → fetch("/api/hotels")
→ response arrives → React renders hotel cards
→ hero image discovered → starts downloading
→ LCP fires ← 800–1200ms later on a slow link
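Four of these steps are network fetches (HTML, JS bundle, API call, hero image), and each costs at least one round trip. On a 100ms RTT link (Bangkok → Singapore), that's 400ms of purely
sequential waiting before any transfer or execution time is counted. The fix is not faster HTTP; it's making the steps parallel or eliminating them.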
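The round-trip arithmetic can be checked with a small critical-path calculation over a dependency graph. The step names and the one-round-trip-per-fetch cost are assumptions chosen to mirror the waterfall above; download and execution time are ignored, and steps must be listed after their dependencies.

```typescript
interface FetchStep {
  name: string;
  dependsOn: string[];
  roundTrips: number;
}

// Longest dependency chain, in milliseconds, for a given RTT.
function criticalPathMs(steps: FetchStep[], rttMs: number): number {
  const finish = new Map<string, number>();
  for (const step of steps) {
    const start = Math.max(
      0,
      ...step.dependsOn.map((dep) => {
        const t = finish.get(dep);
        if (t === undefined) throw new Error(`unknown dependency: ${dep}`);
        return t;
      }),
    );
    finish.set(step.name, start + step.roundTrips * rttMs);
  }
  return Math.max(0, ...Array.from(finish.values()));
}

// Client-rendered: every fetch is discovered only after the previous one finishes.
const clientRendered: FetchStep[] = [
  { name: "html", dependsOn: [], roundTrips: 1 },
  { name: "js", dependsOn: ["html"], roundTrips: 1 },
  { name: "api", dependsOn: ["js"], roundTrips: 1 },
  { name: "heroImage", dependsOn: ["api"], roundTrips: 1 },
];

// Flattened: data is inlined in the HTML and the hero image is preloaded from it.
const flattened: FetchStep[] = [
  { name: "html", dependsOn: [], roundTrips: 1 },
  { name: "js", dependsOn: ["html"], roundTrips: 1 },
  { name: "heroImage", dependsOn: ["html"], roundTrips: 1 },
];

const clientRenderedMs = criticalPathMs(clientRendered, 100); // 400
const flattenedMs = criticalPathMs(flattened, 100);           // 200
```

The protocol is the same in both cases; the saving comes entirely from shortening the dependency chain.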
- Colocate data with SSR (biggest win): Fetch hotel data on the server during rendering. Data arrives inside the initial HTML response, so there is zero client waterfall for critical content. React Server Components (RSC) and Next.js server-side data fetching make this ergonomic.
- 103 Early Hints (SSR + CDN): The server sends preload hints before the full 200 response is ready. The browser starts fetching CSS/fonts while SSR runs the hotel query. Cuts SSR processing time off the critical path for subresources.
- Preconnect / preload (resource hints): <link rel=preconnect> to the API origin, so DNS + TCP + TLS are done before JS fires the fetch. rel=preload for the LCP hero image so it starts downloading immediately from HTML. (L01.)
- Parallel queries (data fetching): Promise.all([fetchHotelInfo(), fetchAvailability()]) instead of sequential awaits. RSC parallel routes and sibling <Suspense> boundaries fetch in parallel by default.
- CDN edge caching (ISR pages): Serve pre-rendered or ISR-revalidated HTML from a CDN edge node near the user. Eliminates the origin round trip for cache hits; TTFB can drop from hundreds of ms to roughly the round trip to the edge.
- Speculative prefetch (navigation): The Speculation Rules API or rel=prefetch fetches the next page ahead of time while the user reads the current one. Converts cold navigation to a cache hit, with zero waterfall on click. (L01.)
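Preconnect and preload hints can be delivered as a `Link` response header, which is also what a 103 Early Hints response carries. The sketch below builds that header value; the `Hint` type, the function name, and the example URLs are hypothetical, and how the header is actually emitted depends on the server or CDN in use.

```typescript
type Hint =
  | { rel: "preload"; href: string; as: "style" | "font" | "image" | "script"; crossorigin?: boolean }
  | { rel: "preconnect"; href: string; crossorigin?: boolean };

// Serializes hints into a single Link header value: <url>; rel=...; as=..., <url>; ...
function toLinkHeader(hints: Hint[]): string {
  return hints
    .map((h) => {
      const parts = [`<${h.href}>`, `rel=${h.rel}`];
      if (h.rel === "preload") parts.push(`as=${h.as}`);
      if (h.crossorigin) parts.push("crossorigin"); // fonts are fetched in CORS mode
      return parts.join("; ");
    })
    .join(", ");
}

// Example origins and paths are placeholders, not real endpoints.
const criticalHints: Hint[] = [
  { rel: "preconnect", href: "https://api.example.com" },
  { rel: "preload", href: "/fonts/inter.woff2", as: "font", crossorigin: true },
  { rel: "preload", href: "/img/hero.avif", as: "image" },
];
const linkHeaderValue = toLinkHeader(criticalHints);
```

Sending this value with a 103 response lets the browser open the API connection and start the font and hero image downloads while the server is still rendering; recent Node.js versions expose `response.writeEarlyHints` for that purpose, and many CDNs can emit the 103 on the origin's behalf.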
7. Platform whiteboard: the Lead answer
If asked "what's your network/HTTP strategy for a global travel platform?", structure the answer in three layers:
Protocol: HTTP/2 is table stakes; CDN support is universal, so enable it. Layer HTTP/3 at the
CDN edge (Cloudflare and Akamai support it), advertised via Alt-Svc with automatic fallback to H2. No code change on
the app. H3 matters most for SEA mobile users: high packet loss + connection migration during booking flows.
Compression: Pre-compressed Brotli (level 11) for all static assets in the CDN deploy
pipeline. gzip level 6 for dynamic SSR. Never recompress already-compressed images (JPEG, WebP, AVIF) or woff2 fonts.
Waterfalls: Hotel search results and hotel detail data colocated with SSR, so there is no client fetch
waterfall for critical content. 103 Early Hints for CSS/fonts from the CDN layer. Parallel queries with
Promise.all / RSC parallel routes. Preconnect to API and CDN origins in <head>.
Edge-cached ISR pages for hotel static content (description, photos). Speculation Rules for anticipated
navigations (search results → hotel detail).
One-liner
"Enable H3 at the CDN edge: it's a flag flip, not a rewrite, and the CDN handles the H2 fallback. The bigger
architectural win is eliminating the data waterfall: move API fetches into SSR so critical data arrives with the
HTML."
Full loop
Concept: each HTTP generation fixes one bottleneck. H1.1 added keep-alive but serialized requests, H2 multiplexed over one TCP connection, and H3 moved to QUIC to kill TCP head-of-line blocking; Brotli then shaves 20–26% off text on top.
Trade-off: H3 needs UDP + edge support, and Brotli level 11 is ~100× slower than gzip (build-time only), so gzip stays the safer default on dynamic paths, and the real waterfall fix is architectural, not protocol choice.
Anchor: "We flipped on H3 at the CDN edge and added pre-compressed Brotli to the deploy pipeline (JS bundles dropped ~24% with no code change), but the bigger win was moving API fetches into SSR so critical data arrived with the HTML."
Impact: fewer round trips + fewer bytes → faster parse/execute → lower LCP and INP, especially for users on slow connections in SEA markets.
Invite: "I'd revisit dynamic Brotli and the SSR data strategy if we moved rendering to Edge Functions; the edge CPU cost model is different from an origin fleet at scale."
8. Check yourself: scenario quiz
1. A PM says "we've upgraded to HTTP/2 — we can remove domain sharding now, right?" What's the
Lead answer?
2. An engineer says "HTTP/2 multiplexing means we've eliminated head-of-line blocking." Is that
correct?
3. Why did Chrome deprecate HTTP/2 Server Push in 2022?
4. A user in Bangkok is booking a hotel. Halfway through the checkout form, their phone switches
from Wi-Fi to 4G. Under HTTP/2 the connection drops. What does HTTP/3 do differently?
The checkout POST has not been submitted yet — they're still filling in the form.
5. Your team enables Brotli compression at level 11 for dynamic SSR API responses. Load tests
show origin CPU spikes 40%. What happened and what's the fix?
6. A senior engineer reviews your Next.js hotel detail page and says the LCP (hero image) fires
900ms after HTML. DevTools shows a waterfall: HTML→JS bundle→useEffect fetch→render→hero image. What's the
architectural fix?
The hotel data fetch in useEffect calls your internal API to get hotel photos, name, and
description.
7. You're asked "should we move to HTTP/3 for our platform?" As a Lead, what's your answer?
8. A new engineer proposes solving your LCP waterfall by switching from gzip to Brotli and
upgrading to HTTP/3. Your response as Lead?
The actual root cause is hotel search data fetched in a client-side useEffect, adding two round
trips to the critical path.
Out-loud drill — do this before next session
Say this out loud without notes: "Walk me through HTTP/1 vs 2 vs 3 — what problem does each fix, and which
matters most for the platform?"
Target: 60 seconds. Hit these: HTTP-layer HoL
blocking (H1→H2), TCP-layer HoL blocking (H2→H3/QUIC), connection migration for mobile, Brotli pre-compress
strategy, and the waterfall architectural fix being SSR data colocation, not protocol choice.
Good follow-up topics:
- How does QUIC congestion control differ from TCP's?
- Show me a real HPACK encoding example
- How do I verify a site is using HTTP/3 in DevTools?
- What is the Alt-Svc header exactly?
- How does 0-RTT resumption work mechanically?
- How do I read a DevTools waterfall to find the root cause?
- Can Brotli be used with HTTP/1.1?