SERVFAIL might not be a good enough signal for alerting. I definitely see third-party DNS providers returning SERVFAIL for their own reasons; if it's a popular host (looking at you, Route 53), then you'll proxy those through and end up alerting on AWS's issue instead of your own.
They might just want a prober; ask every server for cloudflare.com every minute. If that errors, there are big problems. (I remember Google adding itself as malware many many years ago. Nice to have a continuous check for that sort of thing, which I am sure they do now.)
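The prober idea above can be sketched in a few dozen lines of Python using only the standard library — a raw UDP DNS query per RFC 1035, no resolver libraries. The server addresses, probe target, and alerting are placeholders for illustration, not anything a real provider necessarily runs:

```python
import socket
import struct
import secrets

def build_query(name: str) -> bytes:
    """Build a minimal DNS query packet asking for an A record (RFC 1035)."""
    txid = secrets.randbits(16)
    # Header: transaction id, flags (RD=1), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

def probe(server: str, name: str = "cloudflare.com", timeout: float = 2.0) -> bool:
    """Return True if the server answered the query with RCODE 0 (NOERROR)."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return False
    # Reply must be at least a header and echo our transaction id
    if len(reply) < 12 or reply[:2] != query[:2]:
        return False
    rcode = reply[3] & 0x0F  # low 4 bits of the second flags byte
    return rcode == 0
```

Run `probe()` against every resolver on a one-minute timer and page on a few consecutive failures; in practice you'd also want to probe over TCP and check the answer section, but even this bare check catches "big problems."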
> They might just want a prober; ask every server for cloudflare.com every minute. If that errors, there are big problems.
Yeah this was my first thought too. Why don't they have such a system in place for as many permutations of their public services as they can think of? We're a small company and we've had this for critical stuff for several years.
In my experience it's because the monitoring system is usually controlled by another team. So they don't know what they should be testing, and the developers who do know aren't easily able to set it up as part of deployment.
Add issues like network visibility, and you wind up needing a cross-team, cross-org effort just to stand up an HTTP poller and get its results back to some central server - so going into production without it winds up being the easier path (because it'll work fine, provided nothing goes wrong).