
While I don't have an opinion on wide events (AKA spans) replacing logs, there are benefits to metrics that warrant their existence:

1. They're incredibly cheap to store. In Prometheus, it may cost you as little as 1 byte per sample (ignoring series overheads). Because they're cheap, you can keep them for much longer and use them for long-term analysis of traffic, resource use, performance, etc. Most tracing vendors seem to cap storage at 1-3 months, while metric vendors can offer multi-year storage (see the back-of-envelope sketch after this list).

2. They're far more accurate than metrics derived from wide events in higher-throughput scenarios. While wide events are incredibly flexible, their higher storage cost means there's an upper limit on the sample rate. The sampled nature of wide events means that deriving accurate counts is far more difficult; metrics really shine in this role (unless you're operating over datasets with very high cardinality). The problem only gets worse when you add tail sampling into the mix, which biases your data towards errors/slow requests.
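To make point (1) concrete, here's a back-of-envelope comparison. The per-sample and per-event sizes are rough assumptions (Prometheus's TSDB compression typically lands around 1-2 bytes per sample; the wide-event size and request rate below are made up for illustration):

    # Rough, illustrative numbers only -- actual costs depend on compression,
    # field count, and vendor.
    SECONDS_PER_YEAR = 365 * 24 * 3600

    scrape_interval_s = 15          # one metric sample every 15s
    bytes_per_metric_sample = 2     # roughly what Prometheus TSDB compression achieves
    bytes_per_wide_event = 500      # assumed average for a span with a few dozen fields
    events_per_second = 1000        # assumed request rate for one service

    metric_bytes = SECONDS_PER_YEAR / scrape_interval_s * bytes_per_metric_sample
    event_bytes = SECONDS_PER_YEAR * events_per_second * bytes_per_wide_event

    print(f"one metric series, 1 year: ~{metric_bytes / 1e6:.1f} MB")    # ~4.2 MB
    print(f"unsampled wide events, 1 year: ~{event_bytes / 1e12:.1f} TB") # ~15.8 TB

That gap is why multi-year metric retention is routine while multi-year retention of raw events usually isn't.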



For point (2), you can derive accurate counts from sampled data if the sampling rate is captured as metadata on every sampled event. Some tools do support this (I work for Honeycomb, and our sampling proxy + backend work like this; I can't speak for others).
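A minimal sketch of that idea in Python, assuming each stored event carries a sample_rate field meaning "this event stands in for sample_rate original events" (the field name and event shape here are illustrative, not any particular vendor's schema):

    # Each stored event represents `sample_rate` original events, so weight
    # everything by that factor when aggregating.
    events = [
        {"duration_ms": 12.0, "status": 200, "sample_rate": 100},  # kept 1-in-100
        {"duration_ms": 950.0, "status": 500, "sample_rate": 1},   # error kept 1-in-1 (tail sampling)
    ]

    estimated_count = sum(e["sample_rate"] for e in events)

    estimated_avg_duration = (
        sum(e["duration_ms"] * e["sample_rate"] for e in events) / estimated_count
    )

    print(estimated_count)          # ~101 original requests represented
    print(estimated_avg_duration)   # weighted, so the unsampled error doesn't dominate

Counts, sums, and averages reweight cleanly this way; as the reply below notes, distinct counts don't.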

There are still limits to that, though. I can still get a count of events, or an AVG(duration_ms). But if I have a custom tag, I can't get accurate counts of that. And if I want to get distinct counts of values, I'm out of luck; estimating that from a sample is an active research problem.


It's an interesting point. We are actually running a test with Honeycomb's Refinery later this week; I'm slightly skeptical, but curious to see if they can overcome this bias.


You also lose accuracy because of sampling noise.


On top of that, metrics can have exemplars, which give you more (and dynamic) dimensions for buckets without increasing the cardinality of the metric vectors themselves. It's pretty much a wide event, with the sampling rate on this extra information just being the scrape interval you were already using anyway.

Not every library or tool supports exemplars, but they're a big part of the Prometheus & Grafana value proposition that many users entirely overlook.
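For anyone who hasn't used them, here's roughly what that looks like; a minimal sketch assuming the Python prometheus_client's exemplar support (the exemplar keyword argument on observe/inc, which only shows up when metrics are exposed in the OpenMetrics format):

    from prometheus_client import Histogram, CollectorRegistry
    from prometheus_client.openmetrics.exposition import generate_latest

    registry = CollectorRegistry()
    request_duration = Histogram(
        "request_duration_seconds", "Request latency", registry=registry
    )

    # Attach the current trace ID as an exemplar: the bucket counts stay
    # low-cardinality, but this one observation links back to a full trace.
    request_duration.observe(0.23, exemplar={"trace_id": "a1b2c3d4e5f6"})

    # Exemplars are only emitted in the OpenMetrics exposition format, e.g.:
    #   request_duration_seconds_bucket{le="0.25"} 1.0 # {trace_id="a1b2c3d4e5f6"} 0.23 <timestamp>
    print(generate_latest(registry).decode())

This is also what powers the jump-from-a-latency-panel-to-an-example-trace workflow in Grafana.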


This is exactly right. This kind of structured logging is great, but it doesn’t replace metrics. You really want to have both, and simple unsampled metrics are actively better for things like automated alerting, for exactly those reasons. They’re complements more than substitutes.



