Why does every TSD seem so over-engineered, and for all the wrong reasons? Why not just use a time-decaying ring buffer (multiple buffers could be used), one statistic per file (or more, depending on the decay), offset by a set interval; if you have irregular intervals, 'smooth' them to fit. That's O(1) for most operations. My other issue (from a glance at some TSDs) is that they ignore most of the research done on how to effectively store this kind of data.
Too many writes? Use a cache to batch them.
No metadata? Put the metadata in a different file and point to the offsets.
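To make the ring-buffer idea concrete, here's a minimal sketch (all names illustrative, not from any actual TSD): the slot index is derived directly from the timestamp, so irregular samples get 'smoothed' onto the interval grid, and both reads and writes are O(1).

```python
class RingBuffer:
    """Fixed-interval ring buffer: one slot per interval, O(1) read/write.

    A slot's index is the timestamp's interval bucket modulo the slot
    count, so old data naturally decays as slots are recycled.
    """

    def __init__(self, interval_s, slots):
        self.interval = interval_s
        self.slots = slots
        self.values = [None] * slots
        self.stamps = [None] * slots  # which interval bucket each slot holds

    def _index(self, ts):
        bucket = int(ts // self.interval)  # snap to the interval grid
        return bucket % self.slots, bucket

    def write(self, ts, value):
        i, bucket = self._index(ts)
        self.values[i] = value             # overwrite: old data decays away
        self.stamps[i] = bucket

    def read(self, ts):
        i, bucket = self._index(ts)
        if self.stamps[i] != bucket:       # slot has since been recycled
            return None
        return self.values[i]
```

On disk this maps to one flat file per statistic: slot index times record size gives the byte offset, no index structure needed.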
Because you don't want to throw the data away. RRDtool and Whisper implement ring buffers, but you lose data and resolution with them. If that's acceptable, then absolutely use those tools. If you don't want to lose data, though, then you need something else.
This is how things work at the industrial plant where I work. We dump a heap of data out of PLCs at basically whatever native frequency the instrument can log. It hits the first datastore (which is essentially a flushed ring buffer), where it might get thrown at an HMI display or something like that. After that it decays into the slower historical archive, where it can get some metadata added to it and be rationalized, aggregated, batched together, or whatever.
It varies, but datastores tend to hold 1 hour worth of data, then 3 days, then 3 months, then permanent. As you move between datastores, the latency to access a timestamp increases.
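The tiered decay described above (high-resolution data aggregated into coarser, longer-lived tiers) can be sketched roughly like this; the function name and averaging policy are my own assumptions, not any particular historian's behavior:

```python
def decay(points, src_step, dst_step):
    """Aggregate (timestamp, value) points from src_step resolution
    into coarser dst_step buckets, averaging the values in each bucket.

    This is how a fast tier (say 10s resolution, 1 hour retained) might
    roll into a slower archive tier (say 60s resolution, 3 days retained).
    """
    assert dst_step % src_step == 0, "tiers should nest evenly"
    buckets = {}
    for ts, v in points:
        buckets.setdefault(ts - ts % dst_step, []).append(v)
    return sorted((ts, sum(vs) / len(vs)) for ts, vs in buckets.items())
```

Other aggregation policies (min/max/last) are equally common; which one is right depends on what the counter measures.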
We tend to call time series databases "Data Historians"; it's a big industry, and from what I can tell most commercial products are built around ring buffers.
What you are describing is exactly what Carbon does (the datastore of Graphite). It's a few hundred lines of not-exactly-production-quality Python.
But if you'd actually RTFA, you'd know that Whisper is hardly the end-all of TSDs, and that cache-to-batch is not as trivial as you make it seem.
I do believe there's a sort of optimal solution that isn't very complex and isn't far from many solutions out there. But it's most definitely more complex than the TSDs in existence today.
You say over-engineered, but if you take a look at what's out there, they're all super simple: usually just a thin layer over some pre-existing storage engine.
I'd be surprised if any of the currently popular TSDs has more than a month or two of full-time work put into just the storage engine. Even though there are so many of them, no one has had the time to make them as optimized as, say, popular SQL databases are.
You could 'decay' one buffer into another by anything: stronger compression, time. Over-engineered in the sense that they're layers of tools that don't need to be there.
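Decay by compression rather than aggregation might look like the sketch below (function names are mine, purely illustrative): a full buffer is packed and deflated into a cheaper tier, trading read latency instead of resolution.

```python
import struct
import zlib


def compress_tier(values):
    """Decay a full buffer into a cheaper tier by compression instead of
    aggregation: pack the floats and deflate them. No resolution is lost;
    only access gets slower, since reads must decompress first."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw)


def read_tier(blob, n):
    """Inflate a compressed tier back into its n original samples."""
    return list(struct.unpack(f"<{n}d", zlib.decompress(blob)))
```

Real metric data is often smooth or repetitive, so even generic deflate does well; delta-of-delta encodings do much better still, at the cost of more code.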
The trade off is decaying data.
We store and query 300,000 counters with 10 days of 10-second resolution and successive buckets out to six years, on a single bare-metal server (plus a backup replica) using GitHub.com/imvu-open/istatd, but this is not right for everyone.