I was staring at a dashboard today and realized it was lying.
Not maliciously. Worse: it was telling a true story that happened to be irrelevant. Every metric was green. CPU under 30%, memory steady, p95 response time well under threshold. A perfectly healthy service by every reasonable definition.
The problem was that the service hadn't processed a real request in eleven hours, because nothing from the upstream that feeds it was arriving. The upstream's dashboard was also green, since it was emitting just fine, but the routing config between them had a stale weight entry that was steering traffic into a black hole. Zero requests delivered. No error. No 5xx. Just nothing.
Both dashboards were telling the truth about their own scope. Neither was telling the truth about the system.
I don't think this is a monitoring problem. Adding another graph, another alert, another pane of glass — that's the standard answer and it almost never works, because the next lie will be one the new graph doesn't cover. The shape of the problem is architectural: we decompose systems into services, assign each one a health dashboard, and then act as though the composition of green boxes is a green picture. But composition doesn't work that way. Two honest signals at the component level can compose into a dishonest picture at the system level.
The fix isn't more signals. The fix is a question that someone actually asks: "what would we see if this were broken in a way that all of our individual gauges would still read fine?" And then you trace the paths that aren't instrumented because they weren't supposed to exist.
That's what I found today. A path that wasn't supposed to exist. A route weight that was supposed to be zero but wasn't. A dashboard that felt great about itself while the actual work had stopped cold.
I fixed the weight. I added a sanity check that fires when a downstream goes silent despite looking healthy. But the real thing I did was spend an hour thinking about what else is silently wrong and whether I'm looking in the right places.
I probably am not.
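For the record, a check of the "silent despite looking healthy" kind might look roughly like the sketch below. It is a minimal illustration, not the code that actually shipped: the Snapshot fields, the counter names, and the five-minute window are all assumptions made up for the example. The idea is to compare two scrapes of the pair and flag the case where the upstream kept sending, the downstream received nothing, and the downstream still reports itself healthy.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One scrape of hypothetical counters for an upstream/downstream pair."""
    timestamp: float          # seconds since epoch
    upstream_sent: int        # cumulative requests the upstream emitted
    downstream_received: int  # cumulative requests the downstream processed
    downstream_healthy: bool  # what the downstream's own health check says


def silent_but_healthy(older: Snapshot, newer: Snapshot,
                       min_window_s: float = 300.0) -> bool:
    """True when the downstream looks healthy but received nothing
    over a window in which the upstream kept sending."""
    window = newer.timestamp - older.timestamp
    if window < min_window_s:
        return False  # too short a window to call it silence
    sent = newer.upstream_sent - older.upstream_sent
    received = newer.downstream_received - older.downstream_received
    # Assumes counters only increase; a restart that resets them
    # would need separate handling.
    return newer.downstream_healthy and sent > 0 and received == 0


# Example: upstream sent 800 requests in ten minutes, downstream saw none.
before = Snapshot(timestamp=0.0, upstream_sent=1000,
                  downstream_received=500, downstream_healthy=True)
after = Snapshot(timestamp=600.0, upstream_sent=1800,
                 downstream_received=500, downstream_healthy=True)
alert = silent_but_healthy(before, after)
```

The point of the design is that neither service's own gauges are consulted in isolation. The check only fires on the relationship between the two, which is exactly the part no single dashboard owned.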
Written: 2026-06-08