
Not everything emits wide events. Maybe you can get the entire application layer like that, but there is also value in logs and metrics emitted from the rest of the infra stack.

To be fair, you could probably store and represent everything as wide events and build visualization tools on top of that which can combine everything together, even if some of it is sourced from something else.



Wide events seem to be "structured logs with focused schemas" (maybe also published in a special way beyond writing to stdout) but most places I've worked would call that "logging" not "wide events".
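
For illustration, a wide event in that sense is just one structured record per unit of work, with as many fields as you can attach. A minimal sketch (all field names are made up):

    import json
    import time
    import uuid

    # A "wide event": one structured record per unit of work, carrying many
    # high-cardinality fields. All field names here are illustrative.
    event = {
        "timestamp": time.time(),
        "service": "checkout",
        "trace_id": str(uuid.uuid4()),
        "http.method": "POST",
        "http.route": "/api/orders",
        "http.status_code": 200,
        "duration_ms": 87.3,
        "user_id": "u_18422",
        "cart_items": 4,
        "payment_provider": "stripe",
        "feature_flags": ["new_pricing"],
    }

    # Emitting it is just writing one JSON line to stdout (or to a collector).
    print(json.dumps(event))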

The reason we don't use them for everything is, as others in the thread say, that it's expensive. Metrics (just the numbers, nothing else) can be compressed and aggregated extremely efficiently, hence cheaply. Logs are more expensive due to their arbitrary contents.

It's all due to expense really.


Columnar storage stores data very efficiently, too, because it compresses data of a similar nature (whole columns) together. Check e.g. ClickHouse on this matter: https://clickhouse.com/docs/en/about-us/distinctive-features, https://clickhouse.com/blog/working-with-time-series-data-an...
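
As a toy illustration of why the columnar layout compresses well (real systems use per-column codecs rather than zlib over JSON, so treat the numbers as directional only):

    import json
    import random
    import zlib

    random.seed(0)
    n = 10_000

    # Synthetic request records: a few highly skewed columns, as in real telemetry.
    statuses = [random.choice([200] * 95 + [404] * 3 + [500] * 2) for _ in range(n)]
    methods = [random.choice(["GET"] * 80 + ["POST"] * 20) for _ in range(n)]
    users = [f"u_{random.randrange(500)}" for _ in range(n)]

    # Row-oriented: every record carries its own keys and interleaved values.
    rows = [{"status": s, "method": m, "user": u}
            for s, m, u in zip(statuses, methods, users)]
    row_blob = json.dumps(rows).encode()

    # Column-oriented: similar values sit next to each other, giving the
    # compressor long matches.
    col_blob = json.dumps({"status": statuses, "method": methods, "user": users}).encode()

    print("row-wise, compressed:   ", len(zlib.compress(row_blob)))
    print("column-wise, compressed:", len(zlib.compress(col_blob)))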

So I wouldn't say that events are "expensive" while metrics are "cheap" - both depend on the actual implementation, and events can be cheap too.

And so of course if you have to optimise things, you would need to drop some of the information you pass to the events, but you would need to do the same for metrics (reduce the number of metrics emitted, reduce the Prometheus labels, ...).


That only works if you have small, pre-defined sets of events in data structures that compress well, which is not the case for any real system.

> And so of course if you have to optimise things, you would need to drop some of the information you pass to the events, but you would need to do the same for metrics (reduce the number of metrics emitted, reduce the Prometheus labels, ...).

Those are entirely different orders of magnitude, both in size and in how much usefulness you lose. In modern storage backends like VictoriaMetrics a counter will cost you around a byte per metric per sample. And since you emit metrics periodically, that cost is essentially independent of incoming traffic.

Capturing requests as events/traces/whatever other name they gave to logs this month costs many times that, and it is multiplied by traffic.
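
A back-of-envelope comparison of the two cost models, with assumed numbers (roughly one byte per compressed counter sample, a couple hundred bytes per compressed event):

    # Back-of-envelope sketch with assumed numbers (not measurements):
    # a scraped counter costs roughly a fixed amount per scrape interval,
    # while per-request events scale with traffic.

    SECONDS_PER_DAY = 86_400

    # Metric: one counter, ~1 byte per sample after compression, scraped every 15s.
    bytes_per_sample = 1
    scrape_interval_s = 15
    metric_bytes_per_day = SECONDS_PER_DAY / scrape_interval_s * bytes_per_sample

    # Events: ~200 bytes per request after compression, at 1,000 requests/second.
    bytes_per_event = 200
    requests_per_second = 1_000
    event_bytes_per_day = SECONDS_PER_DAY * requests_per_second * bytes_per_event

    print(f"counter: {metric_bytes_per_day / 1e3:.1f} kB/day")
    print(f"events:  {event_bytes_per_day / 1e9:.1f} GB/day")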


> Those are entirely different orders of magnitude, both in size and in how much usefulness you lose. In modern storage backends like VictoriaMetrics a counter will cost you around a byte per metric per sample. And since you emit metrics periodically, that cost is essentially independent of incoming traffic.

I thought this argument was about whether wide events can be used for metrics, or whether metrics are a completely different concept. If we want to emulate metrics with events, we would also emit them periodically, independently of the traffic - emit them once in a while, pretty much like Prometheus scraping works.
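
A sketch of that idea, assuming nothing about any particular SDK: aggregate in memory and emit one wide event per interval instead of one per request, so the volume stays independent of traffic (names and values are illustrative):

    import json
    import random
    import threading
    import time
    from collections import Counter

    counters = Counter()
    lock = threading.Lock()

    def record_request(route: str, status: int) -> None:
        """Called on every request; only bumps an in-memory counter."""
        with lock:
            counters[(route, status)] += 1

    def emit_once() -> None:
        """Emits a single aggregated wide event, much like a scrape would."""
        with lock:
            snapshot = dict(counters)
            counters.clear()
        print(json.dumps({
            "timestamp": time.time(),
            "kind": "aggregated_counters",
            "requests": [{"route": r, "status": s, "count": c}
                         for (r, s), c in snapshot.items()],
        }))

    # Simulated traffic, then one periodic emission.
    for _ in range(500):
        record_request("/api/orders", random.choice([200, 200, 200, 500]))
    emit_once()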


Storing telemetry efficiently is only part of what monitoring is supposed to do. The other part is querying: ad-hoc queries, dashboards, alerting queries executed every 15s or so. For querying to be fast, there has to be an efficient index, or multiple indexes, depending on the query. Since you referred to ClickHouse as efficient columnar storage, please see what makes it different from a time series database - https://altinity.com/wp-content/uploads/2021/11/How-ClickHou...


And yet people use ClickHouse quite effectively for this very problem, see the comment here: https://news.ycombinator.com/item?id=39549218

There are also time-series databases out there that are OK with high cardinality: https://questdb.io/blog/2021/06/16/high-cardinality-time-ser...


> And yet people use ClickHouse quite effectively for this very problem

There is no doubt that ClickHouse is a super-fast database. No one stops you from using it for this very problem. My point is that specialized time series databases will outperform ClickHouse.

> There are also time-series databases out there that are OK with high cardinality

So does this blog say that tolerating high cardinality means QuestDB indexes only one of the columns in the data generated by this benchmark?

TSDBs like Prometheus, VictoriaMetrics or InfluxDB will perform filtering by any of the labels with equal speed, because this is how their index works. Their users don't need to think about the schema or about which column should be present in the filter.

But in ClickHouse and, apparently, in QuestDB, you need to specify a column or a list of columns for indexing (the fewer columns, the better). If the user's query doesn't have an indexed column in the filter, query performance will be poor (a full scan).
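
For context, the "filter by any label with equal speed" property comes from an inverted index over label=value pairs. A simplified sketch of that mechanism (real TSDBs add posting-list compression and much more):

    from collections import defaultdict

    # Every label=value pair maps to the set of series that carry it, so any
    # label can be used in a filter with the same lookup-and-intersect path.
    series = {
        1: {"job": "api", "instance": "a", "code": "200"},
        2: {"job": "api", "instance": "b", "code": "500"},
        3: {"job": "worker", "instance": "c", "code": "200"},
    }

    index = defaultdict(set)
    for series_id, labels in series.items():
        for name, value in labels.items():
            index[(name, value)].add(series_id)

    def select(**matchers):
        """Intersect the posting lists for all requested label matchers."""
        sets = [index[(k, v)] for k, v in matchers.items()]
        return set.intersection(*sets) if sets else set()

    print(select(job="api", code="200"))  # {1}
    print(select(instance="c"))           # {3} - any label works, no schema needed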

See how this happened in another benchmarketing blog post from QuestDB - https://telegra.ph/No-QuestDB-is-not-Faster-than-ClickHouse-...


I agree that specialised DBs outperform a general-purpose OLAP database. The question is what "outperform" means. In this area queries don't have to be ultra-fast; they have to be reasonably fast to be comfortable to work with. So missing indexes for some attributes would likely be okay. Looking at https://clickhouse.com/blog/storing-log-data-in-clickhouse-f..., they added just Bloom filters for columns. Which makes sense: it is not a full-blown index, but it will likely yield reasonable results. This is all theoretical though - I haven't built such a solution myself (we're working on one now for in-house observability), so I'm likely missing something that can only be discovered in practice.
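
A minimal Bloom-filter sketch of the trade-off being described: a small per-block filter can say "definitely not in this block" and let the scan skip it, without the cost of a full index (this is a toy, not ClickHouse's implementation):

    import hashlib

    class Bloom:
        """Toy Bloom filter: set k bit positions per value, check them on query."""

        def __init__(self, bits: int = 1024, hashes: int = 3):
            self.bits, self.hashes, self.array = bits, hashes, bytearray(bits)

        def _positions(self, value: str):
            for i in range(self.hashes):
                digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.bits

        def add(self, value: str) -> None:
            for position in self._positions(value):
                self.array[position] = 1

        def might_contain(self, value: str) -> bool:
            # False means "definitely absent"; True means "maybe present".
            return all(self.array[p] for p in self._positions(value))

    # One filter per block of rows; a query for "u_999" can skip block 0 entirely.
    block_filters = [Bloom(), Bloom()]
    for user in ("u_1", "u_2", "u_3"):
        block_filters[0].add(user)
    for user in ("u_999", "u_1000"):
        block_filters[1].add(user)

    print([f.might_contain("u_999") for f in block_filters])  # [False, True] (likely)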

Btw, we use VictoriaMetrics at work now. It works well and queries are fast. But we're forced to always think about cardinality, otherwise either performance or cost suffers. This is okay for a predefined set of metrics and labels, but it doesn't allow deep exploration.
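
The cardinality concern in concrete terms, with assumed numbers: the series count is roughly the product of each label's distinct values (an upper bound, since not every combination occurs), which is why one high-cardinality label blows it up:

    # Assumed numbers for illustration only.
    metrics = 50        # distinct metric names
    services = 30
    endpoints = 40      # routes per service
    status_codes = 5
    instances = 20

    series = metrics * services * endpoints * status_codes * instances
    print(f"{series:,} active series (upper bound)")  # 6,000,000

    # Adding one high-cardinality label (e.g. a user_id with 100k values)
    # multiplies that again - exactly the "deep exploration" metrics can't afford.
    print(f"{series * 100_000:,} with a user_id label")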


In QuestDB, only SYMBOL columns can be indexed. However, queries can sometimes run faster without indexes, because under the hood QuestDB runs very close to the hardware and only lifts the relevant time partitions and columns for a given query. Table scans between given timestamps are therefore very efficient, and can be faster than using indexes when the scan is performed with SIMD and other hardware-friendly optimizations.

When cardinality is very high, indexes make more sense.
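
A rough sketch of the partition-pruning idea (illustrative data layout, not QuestDB internals): a time filter selects only the overlapping partitions, and the work inside them is a plain sequential scan over contiguous column data:

    # Each partition covers a time range and stores its columns contiguously.
    partitions = [
        # (start_ts, end_ts, {"ts": [...], "duration_ms": [...]})
        (0,    999,  {"ts": list(range(0, 1000)),    "duration_ms": [10] * 1000}),
        (1000, 1999, {"ts": list(range(1000, 2000)), "duration_ms": [10] * 999 + [450]}),
        (2000, 2999, {"ts": list(range(2000, 3000)), "duration_ms": [10] * 1000}),
    ]

    def max_duration(ts_from: int, ts_to: int) -> int:
        best = 0
        for start, end, cols in partitions:
            if end < ts_from or start > ts_to:
                continue  # pruned: this partition is never read
            # Inside a surviving partition it is a sequential scan over two columns.
            for ts, duration in zip(cols["ts"], cols["duration_ms"]):
                if ts_from <= ts <= ts_to:
                    best = max(best, duration)
        return best

    print(max_duration(1500, 2500))  # 450; only two of the three partitions are read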


The whole point of wide events is recording an arbitrary set of key-value pairs. How do you propose storing that in a columnar datastore?


I can't speak for others, but at Honeycomb that's what we do. There are some details in this blog post that might be interesting: https://www.honeycomb.io/blog/why-observability-requires-dis...
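
For anyone curious what that can look like mechanically, here is a generic sketch (not Honeycomb's actual implementation) of mapping arbitrary key-value events onto per-key columns, padding with nulls so sparse columns stay cheap:

    # Build one column per key seen so far; rows that lack a key get None.
    events = [
        {"service": "api", "duration_ms": 12, "user_id": "u_1"},
        {"service": "api", "duration_ms": 340, "cart_items": 4},
        {"service": "worker", "queue": "email", "duration_ms": 5},
    ]

    columns: dict[str, list] = {}
    for row_number, event in enumerate(events):
        for key, value in event.items():
            # A new key gets back-filled with None for all earlier rows.
            columns.setdefault(key, [None] * row_number).append(value)
        for key, column in columns.items():
            # Keys missing from this event get None for this row.
            if len(column) < row_number + 1:
                column.append(None)

    for name, values in columns.items():
        print(f"{name:12} {values}")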



