Why does every TSD seem so over-engineered, and for all the wrong reasons? Why not just use a time-decaying ring buffer (multiple buffers could be used), one statistic per file (or more, depending on the decay), offset by a set interval; if you have irregular intervals, 'smooth' them to fit. That's O(1) for most operations. My other issue (from a glance at some TSDs) is that they ignore most of the research done on how to effectively store this kind of data.
Too many writes? Use a cache to batch them.
No metadata? Put the metadata in a different file and point to the offsets.
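To make the ring-buffer idea concrete, here's a minimal sketch (all names illustrative, not from any actual TSD): the slot index is derived directly from the timestamp, so irregular samples get 'smoothed' onto the interval grid, and both reads and writes are O(1).

```python
class RingBuffer:
    """Fixed-interval ring buffer: one slot per interval, O(1) read/write.

    A slot's index is the timestamp's interval bucket modulo the slot
    count, so old data naturally decays as slots are recycled.
    """

    def __init__(self, interval_s, slots):
        self.interval = interval_s
        self.slots = slots
        self.values = [None] * slots
        self.stamps = [None] * slots  # which interval bucket each slot holds

    def _index(self, ts):
        bucket = int(ts // self.interval)  # snap to the interval grid
        return bucket % self.slots, bucket

    def write(self, ts, value):
        i, bucket = self._index(ts)
        self.values[i] = value             # overwrite: old data decays away
        self.stamps[i] = bucket

    def read(self, ts):
        i, bucket = self._index(ts)
        if self.stamps[i] != bucket:       # slot has since been recycled
            return None
        return self.values[i]
```

On disk this maps to one flat file per statistic: slot index times record size gives the byte offset, no index structure needed.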
Because you don't want to throw the data away. RRDtool and Whisper implement ring buffers, but you lose data and resolution with them. If that's acceptable, then absolutely use those tools. If you don't want to lose data, though, then you need something else.
This is how things work at the industrial plant where I work. We dump a heap of data out of PLCs at basically whatever native frequency the instrument can log. It hits the first datastore (which is essentially a flushed ring buffer), where it might get thrown at an HMI display or something like that. After that it decays into the slower historical archive, where it can get some metadata added to it and be rationalized, aggregated, batched together, or whatever.
It varies, but datastores tend to hold 1 hour worth of data, then 3 days, then 3 months, then permanent. As you move between datastores, the latency to access a timestamp increases.
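The tiered decay described above (high-resolution data aggregated into coarser, longer-lived tiers) can be sketched roughly like this; the function name and averaging policy are my own assumptions, not any particular historian's behavior:

```python
def decay(points, src_step, dst_step):
    """Aggregate (timestamp, value) points from src_step resolution
    into coarser dst_step buckets, averaging the values in each bucket.

    This is how a fast tier (say 10s resolution, 1 hour retained) might
    roll into a slower archive tier (say 60s resolution, 3 days retained).
    """
    assert dst_step % src_step == 0, "tiers should nest evenly"
    buckets = {}
    for ts, v in points:
        buckets.setdefault(ts - ts % dst_step, []).append(v)
    return sorted((ts, sum(vs) / len(vs)) for ts, vs in buckets.items())
```

Other aggregation policies (min/max/last) are equally common; which one is right depends on what the counter measures.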
We tend to call time series databases "Data Historians"; it's a big industry, and from what I can tell most commercial products are built around ring buffers.
What you are describing is exactly what Carbon does (the datastore of Graphite). It's a few hundred lines of not-exactly-production-quality Python.
But if you'd actually RTFA, you'd know that Whisper is hardly the end-all of TSDs, and that cache-to-batch is not as trivial as you make it seem.
I do believe there's a sort of optimal solution that isn't very complex and isn't far from many solutions out there. But it's most definitely more complex than the TSDs in existence today.
You say over-engineered, but if you take a look at what's out there, they're all super simple: usually just a thin layer over some pre-existing storage engine.
I'd be surprised if any of the currently popular TSDs has more than a month or two of full-time work put into just the storage engine. Even though there are so many of them, no one has had the time to make them as optimized as, say, popular SQL databases are.
You could 'decay' one buffer into another by anything: stronger compression, time. Over-engineered in the sense that they're layers of tools that don't need to be there.
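Decay by compression rather than aggregation might look like the sketch below (function names are mine, purely illustrative): a full buffer is packed and deflated into a cheaper tier, trading read latency instead of resolution.

```python
import struct
import zlib


def compress_tier(values):
    """Decay a full buffer into a cheaper tier by compression instead of
    aggregation: pack the floats and deflate them. No resolution is lost;
    only access gets slower, since reads must decompress first."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw)


def read_tier(blob, n):
    """Inflate a compressed tier back into its n original samples."""
    return list(struct.unpack(f"<{n}d", zlib.decompress(blob)))
```

Real metric data is often smooth or repetitive, so even generic deflate does well; delta-of-delta encodings do much better still, at the cost of more code.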
The trade off is decaying data.
We store and query 300,000 counters with 10 days of 10-second resolution and successive buckets out to six years, on a single bare-metal server (plus a backup replica) using GitHub.com/imvu-open/istatd, but this is not right for everyone.