
Prometheus uses a file per time series, with two levels of delta encoding to keep the data small. This is our second major storage iteration, and it seems to be doing pretty well, with a single server able to handle over 2M time series.

See http://prometheus.io/docs/introduction/faq/#why-does-prometh...
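To give a rough idea of what the two levels of delta encoding buy you, here is an illustrative sketch in Go (not Prometheus's actual chunk code; the function name and layout are made up): the first level stores deltas between consecutive timestamps, the second stores deltas of those deltas, which are close to zero for regularly scraped series and therefore pack into very few bits.

    package sketch

    // Illustrative only; not Prometheus's real encoding.
    // Encode timestamps as: first value, first delta, then delta-of-deltas.
    // Regularly spaced samples yield mostly zeros, which compress very well.
    func deltaOfDeltas(ts []int64) []int64 {
        if len(ts) == 0 {
            return nil
        }
        out := []int64{ts[0]}
        if len(ts) == 1 {
            return out
        }
        prevDelta := ts[1] - ts[0]
        out = append(out, prevDelta)
        for i := 2; i < len(ts); i++ {
            delta := ts[i] - ts[i-1]
            out = append(out, delta-prevDelta)
            prevDelta = delta
        }
        return out
    }

    // e.g. timestamps [0, 15, 30, 45] (a 15s scrape interval)
    // encode to [0, 15, 0, 0].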



Also, Prometheus solves the batching problem that the article mentions by keeping chunks of recent data for each time series in RAM and only appending those chunks to their series files (of which there can be millions) once they are complete, and in as large groups as possible. Periodically, and when shutting down Prometheus, the RAM-buffered time series heads are written out as a linear checkpoint to disk and recovered upon startup.
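Very roughly, and with hypothetical types rather than Prometheus's real ones, the per-series buffering looks like this: samples append to an in-memory head chunk, and only completed chunks get queued for batched appends to the series file; open heads only reach disk via the periodic checkpoint.

    package sketch

    // Hypothetical types; Prometheus's actual implementation differs.
    type sample struct {
        t int64   // timestamp
        v float64 // value
    }

    type series struct {
        head       []sample   // open head chunk, kept in RAM only
        fullChunks [][]sample // completed chunks, waiting for a batched append to the series file
    }

    const samplesPerChunk = 1024 // made-up size, for illustration

    // add appends a sample to the head chunk; once the head is full, it is
    // queued for persistence and a fresh head is started. A checkpoint
    // routine (not shown) periodically writes all open heads to one file.
    func (s *series) add(smp sample) {
        s.head = append(s.head, smp)
        if len(s.head) == samplesPerChunk {
            s.fullChunks = append(s.fullChunks, s.head)
            s.head = nil
        }
    }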

You might think this could be better solved by a simple write-ahead log (WAL), but the problem is that samples can come in at completely irregular intervals, and we still need to append data from the WAL to the appropriate time series files once the chunks for a specific series are full. That would create frequent holes of already-persisted data in the WAL, which would then have to be regularly compacted/rewritten, and that is in principle equivalent to the regular checkpointing approach again.

EDIT: There's some more information about Prometheus's storage here: http://prometheus.io/docs/operating/storage/ - but we really should write a white paper or blog post about it soon.


I don't understand why Prometheus uses LevelDB to begin with, though. Why do you need it versus using a constant offset of an interval? From reading, it seems it's used for an index; I'm guessing because intervals can be variable?


LevelDB is only used for indexes to look up series files by sets of dimensions, not for time-based lookups. We just need to find the right time series that are relevant for your query.

As a simple example, we have one LevelDB index which has single label=value pairs as keys and, as the LevelDB values, the identifiers of the time series which have those label=value dimensions. If you now query for, e.g., all time series with labels foo="biz" AND bar="baz", we do two lookups in that index: one for the key foo="biz" and one for bar="baz". We then have two sets of time series identifiers which we intersect (AND-style matching) to arrive at the set of time series you're interested in querying. Only then do we actually start loading any actual time series data (not from LevelDB this time).
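As a sketch of that lookup path (hypothetical in-memory data structures, not the actual LevelDB schema): each label=value key maps to a sorted list of series IDs, and an AND query is just an intersection of those lists before any sample data is read.

    package sketch

    // Hypothetical stand-in for the LevelDB label index:
    // label=value pair -> sorted IDs of series carrying that pair.
    var labelIndex = map[string][]uint64{
        `foo="biz"`: {1, 4, 7, 9},
        `bar="baz"`: {2, 4, 9},
    }

    // intersect returns the IDs present in both sorted lists (AND-style matching).
    func intersect(a, b []uint64) []uint64 {
        var out []uint64
        for i, j := 0, 0; i < len(a) && j < len(b); {
            switch {
            case a[i] < b[j]:
                i++
            case a[i] > b[j]:
                j++
            default:
                out = append(out, a[i])
                i++
                j++
            }
        }
        return out
    }

    // intersect(labelIndex[`foo="biz"`], labelIndex[`bar="baz"`]) == [4 9];
    // only then would data for series 4 and 9 be loaded from their files.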



