Diagnosing a Slow Page

The methodology — not a tool tour. How a Lead goes from “the page feels slow” to a named root cause: RUM to find it, lab to explain it, the DevTools flame chart, and a per-symptom playbook.

1. The one rule: measure, don't guess

The single thing that separates a Lead answer from a junior one: you never optimize blind. You find where the time actually goes, fix that, and verify it moved. The loop:

RUM first — find the problem. Field data (real users, p75) tells you which metric, which page, and which segment (device, country, network) is slow. This is where you decide what's worth fixing.
Reproduce in the lab. Open DevTools, throttle to match the slow segment (for a mid-tier phone, something like 4× CPU slowdown plus the Slow 4G network preset), and record. The lab is reproducible and detailed: it's where you find why.
Localize to one metric/phase. Is it LCP (loading) or INP (responsiveness)? Then break that metric into its sub-parts (below) to point at one culprit.
Fix the root cause, then verify. Re-measure in lab to confirm, then watch RUM to confirm it moved for real users — not just your machine.
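The "RUM first" step comes down to ranking segments by p75. Here is a minimal sketch, assuming RUM samples have already been collected as plain objects. The sample shape, the `device` field, and the nearest-rank percentile method are illustrative assumptions, not how any particular vendor or CrUX computes it.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Group samples by a segment key and compute p75 per group, slowest first.
function p75BySegment(samples, keyOf) {
  const groups = new Map();
  for (const sample of samples) {
    const key = keyOf(sample);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(sample.value);
  }
  return [...groups.entries()]
    .map(([segment, values]) => ({ segment, count: values.length, p75: percentile(values, 75) }))
    .sort((a, b) => b.p75 - a.p75);
}

// Hypothetical LCP samples in milliseconds.
const samples = [
  { device: 'desktop', value: 1200 },
  { device: 'desktop', value: 1400 },
  { device: 'desktop', value: 1800 },
  { device: 'desktop', value: 2100 },
  { device: 'mobile', value: 2400 },
  { device: 'mobile', value: 3100 },
  { device: 'mobile', value: 3900 },
  { device: 'mobile', value: 4600 },
];

const ranked = p75BySegment(samples, (s) => s.device);
console.log(ranked); // slowest segment is listed first
```

The segment at the top of the list is the one to reproduce in the lab. Swap the key function to slice by country, network, or route instead.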

One-liner

“RUM to find it, lab to explain it. Field data tells me which metric and which users are slow; the DevTools flame chart tells me why. I fix the root cause and verify it moved in the field — never optimize on my own fast laptop.”

2. RUM vs lab: two jobs, not rivals

RUM / field

finds & prioritizes

Real users via PerformanceObserver / a RUM vendor / CrUX. Real devices, networks, geographies. Answers whether you have a problem and for whom. Can't step into a single session.

Lab / synthetic

explains & reproduces

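Lighthouse + DevTools Performance panel, one controlled run. Flame charts, waterfalls, automated audits. Answers why. But one synthetic device ≠ your real long tail, and a load-only run has no real user interactions, so it can't report real INP (Lighthouse uses Total Blocking Time as a lab proxy).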

Use them as a relay: RUM hands the lab a target. Optimizing a lab number that no real user hits is the classic waste. [MDN]

3. Reading the Performance panel

Record a load (or an interaction) and you get a flame chart of the main thread. Three things to know:

x-axis = time, y-axis = call stack. Events on top call the events below them. A wide block = something took a long time.
Long tasks = any main-thread task > 50ms. DevTools flags them with a red triangle and shades the part over 50ms red. Long tasks are the enemy of INP — while one runs, the page can't respond.
Tracks: the metrics row shows local LCP/CLS automatically, and INP once you interact; the Network track shows the request waterfall; the Main track is your JS.
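The 50ms rule is easy to make concrete. Below is a small sketch that flags long tasks and measures how much of each one sits over the threshold, which is the part DevTools shades red. The task names and durations are hypothetical; in a browser, similar `duration` values would come from `longtask` performance entries.

```javascript
const LONG_TASK_THRESHOLD_MS = 50;

// For each task, report whether it is long and how much of it exceeds 50 ms.
function analyzeTasks(tasks) {
  return tasks.map((task) => {
    const blockingMs = Math.max(0, task.duration - LONG_TASK_THRESHOLD_MS);
    return { ...task, isLong: task.duration > LONG_TASK_THRESHOLD_MS, blockingMs };
  });
}

// Hypothetical main-thread tasks (durations in milliseconds).
const tasks = [
  { name: 'hydrate', duration: 180 },
  { name: 'click handler', duration: 40 },
];

const analyzed = analyzeTasks(tasks);

// Sum of the over-threshold portions: the time the page could not respond promptly.
const totalBlockingMs = analyzed.reduce((sum, t) => sum + t.blockingMs, 0);

console.log(analyzed, totalBlockingMs);
```

The 180ms task contributes 130ms of blocking time; the 40ms task contributes none. That asymmetry is why splitting one long task into several short ones helps INP even when total work stays the same.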

[Figure: main-thread flame chart (x = time →, y = call stack). It shows a 180ms long task flagged ⚠ and shaded red past 50ms, a 40ms task, and idle time; labelled frames include hydrate(), parseJSON, render, and reconcile.]

Modern DevTools also surfaces Insights inside the Performance panel (the sidebar that replaced the older standalone Performance insights panel). It auto-flags LCP sub-parts, render-blocking requests, third parties, and duplicated JS. Name it; it's the current workflow. [Chrome]

4. The per-symptom playbook

Slow LCP → break it into 4 sub-parts

web.dev splits LCP into four phases — find which dominates, and the fix is obvious:

LCP sub-part	What it is	If it dominates → fix
TTFB	server + network to first byte	CDN, edge cache, faster backend (Lesson 03)
Resource load delay	gap before the LCP image starts downloading	it was discovered late → `preload` + `fetchpriority=high` (Lesson 01)
Resource load time	how long the LCP image takes to download	compress, AVIF/WebP, responsive `srcset`
Element render delay	downloaded but not yet painted	render-blocking CSS/JS → inline critical CSS, defer (Lesson 02)

Source: web.dev — Optimize LCP
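The table can be turned into arithmetic. This is a sketch under stated assumptions: the four input timestamps are hypothetical numbers in milliseconds since navigation start. In a browser they would roughly correspond to the navigation entry's `responseStart`, the LCP resource's `requestStart` and `responseEnd` from Resource Timing, and the LCP entry's `startTime`. The function name and input shape are invented for illustration.

```javascript
// All inputs are milliseconds since navigation start.
function lcpSubParts({ ttfb, resourceRequestStart, resourceResponseEnd, lcp }) {
  // Clamp so each phase starts no earlier than the previous one ended.
  const loadStart = Math.max(ttfb, resourceRequestStart);
  const loadEnd = Math.max(loadStart, resourceResponseEnd);
  const parts = {
    ttfb,
    resourceLoadDelay: loadStart - ttfb,
    resourceLoadTime: loadEnd - loadStart,
    elementRenderDelay: Math.max(0, lcp - loadEnd),
  };
  // The phase with the largest share is where the fix should start.
  const [dominant] = Object.entries(parts).sort((a, b) => b[1] - a[1])[0];
  return { parts, dominant };
}

// Hypothetical page: LCP at 3.8 s, hero image requested very late.
const breakdown = lcpSubParts({
  ttfb: 400,
  resourceRequestStart: 2300,
  resourceResponseEnd: 3100,
  lcp: 3800,
});

console.log(breakdown.parts, breakdown.dominant);
```

Here the four parts are 400, 1900, 800, and 700ms, so resource load delay dominates and the lever is earlier discovery (`preload` + `fetchpriority=high`), not image compression.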

Poor INP → record the interaction, split 3 phases

Interact while recording; DevTools breaks the slow interaction into three phases: input delay, processing duration, and presentation delay (Lesson 03). Then:

Long input delay → the main thread was busy with a long task before your handler. Find that task in the flame chart, break it up / ship less JS.
Long processing → your event handler is heavy. Make it cheap, defer non-urgent work.
Long presentation → a giant re-render/commit. Virtualize, reduce DOM, content-visibility.
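The three-way split above can be sketched from the timestamps of a single event. The field names (`startTime`, `processingStart`, `processingEnd`, `duration`) follow the Event Timing API; the numbers and the `NEXT_STEP` map are hypothetical, and a real interaction can span several events, which is one reason production code leans on a library instead.

```javascript
// Where to look next, keyed by the slowest phase (wording taken from the playbook above).
const NEXT_STEP = {
  inputDelay: 'find the long task that ran before the handler; break it up or ship less JS',
  processingDuration: 'make the handler cheap; defer non-urgent work',
  presentationDelay: 'shrink the re-render: virtualize, reduce DOM, content-visibility',
};

// Split one event's latency into its three phases (all values in milliseconds).
function interactionPhases({ startTime, processingStart, processingEnd, duration }) {
  const phases = {
    inputDelay: processingStart - startTime,
    processingDuration: processingEnd - processingStart,
    presentationDelay: Math.max(0, startTime + duration - processingEnd),
  };
  const slowest = Object.keys(phases).reduce((a, b) => (phases[b] > phases[a] ? b : a));
  return { ...phases, slowest, nextStep: NEXT_STEP[slowest] };
}

// Hypothetical click: the handler waits 200 ms before it even starts.
const result = interactionPhases({
  startTime: 1000,
  processingStart: 1200,
  processingEnd: 1230,
  duration: 264,
});

console.log(result);
```

With these numbers the handler itself takes only 30ms; the 200ms input delay points at whatever was occupying the main thread before it ran.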

Other tools to name

Coverage tool — highlights unused JS/CSS shipped to the page → code-split / remove dead code.
Network waterfall — find request chains (A must finish before B is discovered) and flatten them with preconnect/preload; spot uncompressed or oversized assets.
Bundle analyzer (webpack-bundle-analyzer / source-map-explorer) — what's in the bundle, duplicate deps, a heavy lib to swap.
PerformanceObserver — the API your own RUM uses to capture LCP/INP/CLS/long-tasks from real users.

// minimal field measurement: feed your RUM
// (sendToRUM is your own reporting function; each entry is an LCP candidate, not the final metric)
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) sendToRUM(entry);
}).observe({ type: 'largest-contentful-paint', buffered: true });

But that snippet gives you raw entries, not the metric. In production you ship Google's web-vitals library (npm web-vitals, by the Chrome team) — it encodes the rules that match how CrUX measures, the very numbers you're graded on:

LCP = last entry before first interaction / page hidden — not just the last one you see.
CLS = the largest session window of shifts (max 5s, 1s gaps), not the sum of all shifts. Easy to get wrong by hand.
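INP = the worst (or near-worst) interaction latency across the whole visit; on pages with many interactions, one outlier per 50 interactions is ignored.
CLS and INP are only final when the page is backgrounded (LCP finalizes at the first interaction or when the page is hidden, whichever comes first), so the library reports on visibilitychange/pagehide.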
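To see why the CLS rule is easy to get wrong, here is an illustrative sketch of session windowing over a list of layout shifts. It is meant to show the rule, not to replace the library. The entry shape mirrors `layout-shift` entries (`startTime`, `value`, `hadRecentInput`); the sample shifts are hypothetical.

```javascript
const MAX_GAP_MS = 1000;    // a new window starts after a 1 s gap with no shifts
const MAX_WINDOW_MS = 5000; // a single window is capped at 5 s

// CLS = the largest session window, not the sum of every shift.
function clsFromShifts(shifts) {
  let maxWindow = 0;
  let current = 0;
  let windowStart = 0;
  let lastShift = 0;
  let open = false;
  for (const shift of shifts) {
    if (shift.hadRecentInput) continue; // shifts right after user input don't count
    const continues =
      open &&
      shift.startTime - lastShift < MAX_GAP_MS &&
      shift.startTime - windowStart < MAX_WINDOW_MS;
    if (continues) {
      current += shift.value;
    } else {
      current = shift.value;
      windowStart = shift.startTime;
      open = true;
    }
    lastShift = shift.startTime;
    maxWindow = Math.max(maxWindow, current);
  }
  return maxWindow;
}

// Hypothetical shifts: two bursts separated by a 2.5 s gap, plus one input-driven shift.
const shifts = [
  { startTime: 1000, value: 0.05, hadRecentInput: false },
  { startTime: 1500, value: 0.05, hadRecentInput: false },
  { startTime: 4000, value: 0.2, hadRecentInput: false },
  { startTime: 4300, value: 0.05, hadRecentInput: false },
  { startTime: 4500, value: 0.3, hadRecentInput: true },
];

const cls = clsFromShifts(shifts);
const naiveSum = shifts.reduce((sum, s) => sum + s.value, 0);
console.log({ cls, naiveSum });
```

The windowed result is about 0.25 (the second burst), while a naive sum of every shift gives about 0.65. A hand-rolled observer that sums everything would report a far worse number than CrUX does.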

import { onLCP, onINP, onCLS } from 'web-vitals';

function sendToRUM({ name, value, rating, delta, id }) {
  // rating = 'good' | 'needs-improvement' | 'poor'  (this value vs. the Core Web Vitals thresholds)
  navigator.sendBeacon('/rum', JSON.stringify({ name, value, rating, delta, id }));
}
onLCP(sendToRUM); onINP(sendToRUM); onCLS(sendToRUM);

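// the attribution build tells you WHY, in the field.
// It exports the same function names, so import it INSTEAD of 'web-vitals' above, not alongside it
// (two imports of onINP in one module is a SyntaxError):
import { onINP } from 'web-vitals/attribution';
onINP(({ value, attribution }) => {
  // attribution.interactionTarget → the element the user hit
  // attribution.inputDelay / processingDuration / presentationDelay → which phase was slow
});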

Trap — don't hand-roll it. Rolling your own PerformanceObserver RUM almost always mis-computes CLS session-windowing and INP, so your field numbers silently disagree with Google's CrUX. Ship web-vitals and your numbers match the ones you're ranked on — and the attribution build closes the “RUM to find it, lab to explain it” loop by pointing at the slow element/phase from the field itself.

One-liner

“I don't hand-roll PerformanceObserver — CLS windowing and INP are too easy to get wrong. I ship Google's web-vitals so my field numbers match CrUX, and attribution tells me which element to fix.”

Where the data lands — RUM vendors to name. The smart move isn't reciting brands, it's showing you know the categories and pick by constraint (cost, scale, privacy, build-vs-buy):

Free Google field data

the baseline everyone should know

CrUX — the dataset Search ranks on. Read it via PageSpeed Insights, the CrUX API, BigQuery, or the CrUX Dashboard (Looker Studio). Caveat: aggregated, 28-day rolling, origin/page-level — not per-session, so you can't debug one user. Free, zero-instrumentation.

Dedicated perf-RUM

when performance is the product

SpeedCurve (LUX), Akamai mPulse, DebugBear, Calibre, Raygun. Web-Vitals-first, p75 by route/segment, RUM + synthetic in one. Best signal-to-noise for a perf team.

APM / observability suites

when you already have one

Datadog RUM, New Relic Browser, Dynatrace, Sentry (Web Vitals + error/trace), Grafana Faro (open-source). RUM lives next to backend traces & errors — one pane, correlate front to back.

Roll-your-own pipeline

at scale, full control / cost

web-vitals → sendBeacon → your data lake (or pipe to GA4, Cloudflare Web Analytics, Elastic APM). What a big org like the platform often does: own the pipeline, no per-event vendor cost, slice by any dimension.
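One detail a roll-your-own pipeline has to handle: web-vitals can report the same metric instance more than once (for example, CLS grows after the tab is backgrounded and restored), and each report carries the same `id`. Below is a minimal sketch of the receiving side, assuming beacons arrive in order and carry the `name`, `value`, and `id` fields from the snippet above. The ids and values are made up.

```javascript
// Keep only the most recent report for each metric instance.
// Assumes beacons are processed in arrival order.
function latestPerMetric(beacons) {
  const latest = new Map();
  for (const beacon of beacons) {
    latest.set(beacon.id, beacon); // a later beacon with the same id overwrites the earlier one
  }
  return [...latest.values()];
}

// Hypothetical beacons from one page view.
const received = [
  { name: 'CLS', id: 'v4-0001', value: 0.05 },
  { name: 'LCP', id: 'v4-0002', value: 2400 },
  { name: 'CLS', id: 'v4-0001', value: 0.12 }, // same instance, updated value
];

const deduped = latestPerMetric(received);
console.log(deduped);
```

Without this step the page view above would contribute two CLS samples and skew the p75. An alternative with the same effect is to sum the `delta` field per `id`.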

One-liner

“CrUX is free but aggregated and lagging — fine for ranking, useless for debugging one session. For that I want session-level RUM: a perf-native vendor like SpeedCurve, or our existing APM (Datadog/Sentry), or at our scale a web-vitals→beacon pipeline we own.”

5. The Lead move: systemic, not heroic

The toolkit grades root cause + systemic fix. So the answer isn't “I opened DevTools once” — it's a process and a guardrail:

Find the class of problem, not the instance — “our INP regressions are almost always a new third-party tag's long tasks,” so you fix the category (a tag-loading policy), not one page.
Instrument RUM on p75 by route + segment so you see regressions in the field, with alerting.
Guardrail in CI — perf budget + Lighthouse gate (Lesson 03) so the same regression can't recur.
Make the data shared — a dashboard the whole org reads, so performance is a team discipline.
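The CI guardrail can be as small as a comparison against a budget. This is a sketch, not a Lighthouse integration: the `budget` object and the measured numbers are hypothetical, with limits set to the published "good" thresholds for each metric. How the measured values are produced (a lab run, or p75 pulled from RUM) is left as an assumption.

```javascript
// Hypothetical budget: ms for LCP and INP, unitless for CLS.
const budget = { LCP: 2500, INP: 200, CLS: 0.1 };

// Return every metric whose measured value exceeds its limit.
function checkBudget(measured, limits) {
  return Object.entries(limits)
    .filter(([name, limit]) => measured[name] !== undefined && measured[name] > limit)
    .map(([name, limit]) => ({ name, limit, actual: measured[name] }));
}

// Hypothetical measurements for the build under test.
const violations = checkBudget({ LCP: 3100, INP: 180, CLS: 0.08 }, budget);

for (const v of violations) {
  console.error(`${v.name}: ${v.actual} exceeds budget ${v.limit}`);
}
// In a CI script you might set process.exitCode = 1 when violations.length > 0
// so the build fails; that line is left out here so the sketch runs cleanly.
```

Keeping the budget in one object makes the guardrail a reviewable diff: loosening a limit has to go through code review like any other change.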

Full loop

Concept: diagnose with RUM→lab→localize→fix→verify. Trade-off: deep lab profiling is time-expensive, so I let RUM prioritize what's worth profiling rather than chase every Lighthouse nit. Anchor: “A listing page regressed LCP; RUM pointed at p75 mobile in SEA, the flame chart showed a render-blocking experiment script — we deferred it and added a budget so it couldn't recur.” Impact: root-causing a class prevents a fleet of future regressions. Invite: “If we lacked RUM I'd start by instrumenting it — guessing from my laptop is how teams waste a sprint.”

6. Check yourself: scenario quiz

Push-back style, like the round.

1. A PM says “the site feels slow.” What's your first move?

2. In the Performance flame chart, what does a red triangle on a task mean?

3. LCP is 3.8s. The LCP breakdown shows resource load delay dominates — the hero image starts downloading very late. Best fix?

They want you to read the sub-part and map it to a lever.

4. Your Lighthouse run on a fast machine is green, but you suspect real users are slower. What's the gap, and the fix?

5. INP is poor. Recording an interaction shows the handler waits ~200ms to even start. Which phase and where do you look?

6. Best Lead framing of “how do you diagnose a slow page?”

7. You're asked to instrument field data for Core Web Vitals on the listing page. What do you do?

Scenario: a junior offers to write a quick PerformanceObserver for each metric.

8. “We already pull CrUX from PageSpeed Insights — why pay for a RUM vendor?” What's your answer?

Scenario: a PM is pushing back on a SpeedCurve/Datadog spend.


Try this aloud before next session: “A teammate says our hotel listing page is slow. Walk me through exactly how you'd find the root cause, from first move to last, and what you'd put in place so it doesn't regress again.” Time yourself: aim for 90 seconds.

Good follow-up topics:

“Quiz me out loud, harder” “Walk a real flame chart with me” “How do I set CPU/network throttling?” “Layout thrashing / forced reflow?” “What RUM vendors / how to roll my own?” “web-vitals attribution build — show me?”

Lesson 04 of Interview prep. Reference card: cheatsheet/0004-diagnosing-a-slow-page-cheatsheet.html · Builds on Lessons 01–03 · Next: 05 — Bundling & code-splitting.

web.dev — Optimize LCP — break LCP into its four sub-parts and fix the dominant one.

GoogleChrome/web-vitals — the official JS library for measuring Core Web Vitals in the field.

web.dev — Measure Web Vitals in JavaScript — wiring up real-user (RUM) measurement in code.

Written by Vikas Kumar Yadav · Tech Lead · thejsdeveloper.com