A Fast Lightweight Time-Series Store for IoT Data (arxiv.org)
97 points by rch on May 8, 2016 | 30 comments


I was a little disappointed to see that this was just a single-machine hierarchical time index on top of a pre-sorted append log.

It's not a bad architecture, and the lockless queuing system and shared memory are cute, but:

- It only supports time indexing. Querying by other fields (e.g. if you wanted to build a time histogram of when you saw a particular event) requires reading the entire dataset.

- It doesn't address replication/distributed indexing, which seems like a must.

- The use of calendar time over micro-timestamps also needlessly complicates things, though presumably it makes their queries more efficient (assuming people query by discrete time buckets like "yesterday").


At the risk of sounding cynical, this feels like a solution from people who aren't familiar with what the problem is actually about, which ends up optimizing the wrong areas.

A few observations:

* Using UDP means you don't care if you lose data, and that's bad. If your buffers are full, your data will be silently dropped, quite apart from network glitches and metric payloads that can easily exceed 512 bytes. Your buffers can fill up while you wait on disk I/O, etc.

Optimizing your network stack when your bottleneck is disk I/O doesn't buy you much. But the real issue they haven't tackled is data distribution. If you can distribute your data well horizontally, then whenever you hit a bottleneck (whatever it is) you can always grow the cluster to get past it (a rough sketch of the idea is at the end of this comment). In the current solution, if the machine dies, you will lose tons of data.

Copy-on-write pipelining has existed for a very long time now. There's nothing new in what they are introducing.

The persistent data they hold in memory is not guaranteed to be recovered if they crash before an fsync/msync.

A secondary index on the time column is not enough. Queries often need only particular keys, and you'll end up doing full table scans.
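
To make the data-distribution point concrete, here is a minimal Go sketch of hashing each series key onto one of N nodes. Node names are made up, and a real system would use a proper hash ring plus replication and rebalancing:

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // nodeFor maps a series key onto one of the cluster nodes. With replication
    // you would also write to the next (replicas-1) nodes; this sketch skips that.
    func nodeFor(seriesKey string, nodes []string) string {
        h := fnv.New32a()
        h.Write([]byte(seriesKey))
        return nodes[int(h.Sum32())%len(nodes)]
    }

    func main() {
        nodes := []string{"node-a", "node-b", "node-c"}
        fmt.Println(nodeFor("sensor.42.temperature", nodes))
    }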


I think UDP vs. TCP is debatable depending on the kind of data. If you are collecting continuous telemetry, losing a few points may not be the end of the world. Of course, YMMV, so you should pick a database based on your objective.

> Optimizing your network stack when your bottleneck is disk io doesn't buy you much.

Agreed, but also the famous "it depends." If you are collecting A LOT of data over the network, first make sure your network card is capable of handling the incoming traffic. An analogy would be a 100M vs. 1G port.

Also, if you are running on an embedded device like a Raspberry Pi or an Arduino, I think disk will definitely be the first bottleneck, before you even have a chance to respond to network saturation.


I'm surprised Cassandra isn't mentioned. It's not specifically made for time series and some tricks are needed to make it work nicely (bucketing), but it's a solid solution.
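
A minimal sketch of the bucketing trick, assuming day-sized buckets and an illustrative key layout rather than any particular schema: the partition key combines the series ID with a coarse time bucket, so no single partition grows without bound, and a range query only has to touch the buckets it overlaps.

    package main

    import (
        "fmt"
        "time"
    )

    // bucketKey builds the partition key for a point, e.g. "sensor42|2016-05-08".
    // All points for one series and one day land in the same partition.
    func bucketKey(seriesID string, ts time.Time) string {
        return fmt.Sprintf("%s|%s", seriesID, ts.UTC().Format("2006-01-02"))
    }

    func main() {
        fmt.Println(bucketKey("sensor42", time.Date(2016, 5, 8, 12, 0, 0, 0, time.UTC)))
    }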


KairosDB, which offers a time-series data store, uses Cassandra. I did a very cursory evaluation of Kairos a while back; don't remember the particulars, though.


Cassandra is not a solid solution.

After problems with a node, often the management tools cause additional problems.

I suggest anybody follow the cassandra-users list for a while before committing to it.


Are you saying Cassandra is not suitable for time series data or that it has issues in general as a database engine?


Incidentally, I was reading an old HN post today on Kdb+ for time series, where people who use that commercial DB said it's amazingly fast (I only played around with it, nothing speedy). People in that discussion mentioned that the 300k writes/s cited for Kdb+ is nothing compared to the millions of writes/s IoT solutions need. So I am wondering what kind of performance you can get from a time-series database on a single machine. I understand that with clustering, systems like Cassandra can do 100 million/s and more (someone mentioned billions, but not which DB), but benchmarks from just throwing hardware at the problem only make sense, to me, if we also know single-node performance, single-node multicore performance, and then the scaling graph when adding nodes.

I am just interested in this kind of thing; I used to work with time series for a German telco (DB2 on heavy IBM metal, for massive amounts of money) and on financial work for a startup. I wonder what the state of the art here is.


Another comment here mentioned a recent FAST paper talking about BtrDB, which can get ~16M writes/sec (nanosecond timestamps, 8-byte values) on a single node, with near-linear speedup when clustering: https://blog.acolyer.org/2016/05/04/btrdb-optimizing-storage...


Correct me if I am wrong, but InfluxDB is written in Go, not Java as Table 1 in the paper states.


I think you are right


In the industrial control system world we call time-series DBs 'Enterprise Historians', and they are well optimised. Good ones support storing data at the native frequency of the control system, including sub-second intervals. Basically, whatever frequency the instrument can sample at, these things can record.

They also have good compression and buffering support, including catchup functionality.

They are by no means lightweight - usually they rely on PLCs or a SCADA system for inputs. They may not offer a complete solution for IoT, but I'm sure there are lessons which can be adapted.


Does anybody know why the paper states that influxdb is implemented in Java?


Probably accidentally copied the column from opentsdb, which is right below it


Prometheus, InfluxDB, OpenTSDB... and on and on

It seems it's pretty crowded


Just submitted this: https://news.ycombinator.com/item?id=11654402

> It turns out you can accomplish quite a lot with 4,709 lines of Go code! How about a full time-series database implementation, robust enough to be run in production for a year where it stored 2.1 trillion data points, and supporting 119M queries per second (53M inserts per second) in a four-node cluster? Statistical queries over the data complete in 100-250 ms while summarizing up to 4 billion points. It’s pretty space-efficient too, with a 2.9x compression ratio. At the heart of these impressive results, is a data structure supporting a novel abstraction for time-series data: a time partitioning, copy-on-write, version-annotated, k-ary tree.
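
Not BtrDB's actual code, but a rough Go sketch of what a node in such a tree might look like, just to make the abstraction concrete (field names and the fan-out are illustrative):

    package tsdb

    const K = 64 // fan-out: each node splits its time span into K equal child slots

    // Summary caches aggregates so statistical queries can stop at inner nodes.
    type Summary struct {
        Min, Max, Mean float64
        Count          uint64
    }

    // Point is a raw sample held in a leaf.
    type Point struct {
        T int64   // nanosecond timestamp
        V float64 // 8-byte value
    }

    // Node covers a fixed time range; an update copies the path from root to
    // leaf (copy-on-write) and stamps the new nodes with the writing version.
    type Node struct {
        Start, End int64    // nanosecond time range covered by this node
        Version    uint64   // version that last wrote this node
        Children   [K]*Node // nil entries for empty child slots; leaves have none
        Agg        Summary  // precomputed stats over the whole range
        Points     []Point  // raw points, leaves only
    }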


It's not at all crowded, since not all of them actually work well, and they are optimized for different use cases and different setups. One use case that I believe is not covered well, and which I use often and evaluated, is storing the complete time series and reading it back as a whole, without any aggregation. This is important for various signal processing tasks and so on, where sampling does not cut it. In the end I chose KairosDB (with a Cassandra backend). Not that it is perfect, but if CRUD is most important and you offload processing to other engines, it works well.

That is why I am always looking forward to new things such as this or BTrDB. There is still a lot of room for improvement everywhere.


It's crowded because there isn't a clear leader.


Speaking as a Prometheus developer, each is suited to different things.

For example, if you want to store time-series data long-term and already use HBase, then OpenTSDB is a good choice.

On the other hand, if you want monitoring that's simple and dependable in an emergency, with querying, graphing and alerting over short/medium-term data, then Prometheus would be a good choice.


> short/medium-term data

This in particular, but really your entire post, would be a great addition to the Prometheus front page or the top of the documentation section. This wasn't clear to me initially, speaking as someone who evaluated Prometheus a few months ago.


We plan on seamlessly integrating with long-term storage (https://prometheus.io/docs/introduction/roadmap/#long-term-s...) and OpenTSDB is one option for that for us.

Ignoring that, as it's planned work, the typical considerations are more around availability vs. consistency, and that we're a metrics system focused on operations rather than an event store. Most of that's already covered in our docs. See https://prometheus.io/docs/introduction/overview/#when-does-... and https://prometheus.io/docs/introduction/faq/


I wouldn't include opentsdb under "lightweight" as it requires HBase.


Last time I checked, OpenTSDB only supported millisecond-resolution timestamps. That's not enough precision for scientific data. I'm using InfluxDB because of the nanosecond timestamps, though microseconds would be enough.
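
For what it's worth, this is roughly what the nanosecond precision looks like on the write path: InfluxDB's line protocol takes the timestamp as an integer, nanoseconds by default, so a Go client can pass time.Now().UnixNano() straight through (measurement and tag names here are made up):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        ts := time.Now().UnixNano() // nanosecond-resolution timestamp
        line := fmt.Sprintf("vibration,sensor=s1 value=%f %d", 0.42, ts)
        fmt.Println(line) // e.g. vibration,sensor=s1 value=0.420000 1462694400000000001
    }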


The primary dataset used for testing is "USGS archived earthquake data". Does anyone have a link to the actual dataset?


These need the traditional YA prefix. Like YACC, YATSD or YATSS?


druid??


It isn't a small footprint, but Druid is the best-of-breed yet most often underrated tool in this space. Influx gets the press because of the simplicity of setting it up, but epically failed at clustering by trying to invent their own multi-raft magic. If you're not analyzing a lot of data, Druid doesn't make sense to set up, but if you are, it is simply incredible. I can do analytics on 5+ dimensions with 10 billion datapoints quickly enough to make it very responsive in Grafana. Their most recent release looks quite interesting as well.

https://groups.google.com/forum/m/#!topic/druid-user/nqqb5RI...


The document (pdf download) specifically mentions Druid in a comparison table.


Citation for the epic fail?


They re-did it a few times from scratch, then decided it was hard, and made it a commercial-only feature. It still isn't stable. Also, even with their previous clustering, each Influx node had to hold the entire dataset, so you're limited by the amount of data that fits on a single node.

Druid was built from the ground up to be distributed. That means there is some work to set it up, but once you do, it scales horizontally very easily. Bonus points if you run it on something like Mesos, which makes it quite easy to deal with (we do).

https://influxdata.com/blog/update-on-influxdb-clustering-hi...

https://influxdata.com/blog/influxdb-clustering-design-neith...

http://www.refactorium.com/distributed_systems/InfluxDB-and-...

etc. They redid their clustering a few times and the third time around made it closed source. Note that $employer also uses Influx, for data that we are OK with losing, i.e. metrics. If the data has a low number of dimensions, Influx is faster than Druid. However, if you have 3+ dimensions, Druid spanks the pants off of Influx by virtue of distributing the computation better.



