Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> And yet people use ClickHouse quite effectively for this very problem

There is no doubt that ClickHouse is a super-fast database. No one stops you from using it for this very problem. My point is that specialized time series databases will outperform ClickHouse.

> There are also time-series databases out there that are OK with high cardinality

So does this blog say that tolerance to cardinality means that QuestDB indexes only one of the columns in the data generated by this benchmark?

TSDBs like Prometheus, VictoriaMetrics or InfluxDB will perform filtering by any of the labels with equal speed, because this is how their index works. Their users don't need to think about the schema or about which column should be present in the filter.

But in ClickHouse and, apparently, in QuestDB, you need to specify a column or list of columns for indexing (the fewer columns, the better). If the user's query doesn't contain the indexed column in the filter - the query performance will be poor (full scan).

See like this happened in another benchmarketing blogpost from QuestDB - https://telegra.ph/No-QuestDB-is-not-Faster-than-ClickHouse-...



I agree that specialised DBs outperform a general-purpose OLAP database. The question is - what does outperform mean. In this area queries should not be actually ultra-fast, they should be reasonably fast to be comfortable. And so missing indexes for some attributes would be likely okay. Looking at https://clickhouse.com/blog/storing-log-data-in-clickhouse-f..., they added just bloom filters for columns. Which makes sense, but this is not a full-blown index, and likely it will yield reasonable results. But this all is theoretical, I haven't built such a solution by self (we're working on it now for in-house observability), so likely miss something that can only be discovered on practice.

Btw we use Victoria Metrics now at work. It works good, queries are fast. But we're forced to always think about cardinality, otherwise either performance or cost get hurt. This is okay for the predefined set of metrics & labels and works well, but it doesn't allow having deep explorations.


In QuestDB, only SYMBOL columns can be indexed. However, sometimes, queries can run faster without indexes. This is because, under the hood, QuestDB runs very close to the hardware and only lifts relevant time partitions and columns for a given query. Therefore table scans between given timestamps are then very efficient. This can be faster than using indexes when the scan is performed with SIMD and other hardware-friendly optimizations.

When cardinality is very high, indexes make more sense.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: