This was a bad post and shouldn't be preserved. And I can still edit it. So I di...

truncate · on May 16, 2018

>> LSM-trees are great, but better stuff exists.

Sure. My knowledge is pretty much limited to what this article talks about, so I'm interested to know what else is out there.

willvarfar · on May 16, 2018

Fractal Trees as used in TokuDB (now owned by Percona).

In practice, the big difference between TokuDB and InnoDB shows up when dealing with large datasets.

However, I don't know where the very latest MyRocks stands vs TokuDB.

sanderjd · on May 16, 2018

This might be an interesting comment if it had some keywords to search or (better) links. As is, I really don't see the point of it.

KirinDave · on May 16, 2018

I don't feel like it tonight. Drinking whiskey and trying not to think about how effed up the Israel footage and the North Korea situation are. Maybe that's why it's not the best post.

I don't mean to crap on the work presented. Great article, good summary, and the tech is solid. It's just older than what excites me; a lot of progress has been made in 20 years and the majority of it hasn't found commercial applications.

But here is a citation root for a lot of amazing work in this space:

https://www.dcc.uchile.cl/~gnavarro/ps/alenex10.pdf

My absolute hero Edward Kmett gave a talk stitching a lot of this work together a long time ago: https://www.youtube.com/watch?v=uA0Z7_4J7u8 . I have no idea if he's pursued it, it's just one of his many talks that left me with an incredibly lasting impression.

Variants of this technique work for arbitrary documents and structures, work better at very high volume, have cache oblivious properties, and support transactions. Universal indexes that are reasonably good for all queries (as opposed to specific queries) are also possible. Coupled with discrimination-based techniques for O(n) table joins, there's probably a whole startup around there.

Sorry I can't do better right now.

ovao · on May 16, 2018

I'm curious what you mean by "cache oblivious". How can a data structure be "cache oblivious"?

KirinDave · on May 16, 2018

If you're optimal for a cache without knowing how big the cache is, you're optimal for all caches. It's tough to do, but it happens.

This property probably doesn't get the respect it deserves in this super weird world where you can't really say how many caches are between you and your data.
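
For a concrete picture, here is a minimal sketch of the classic recursive matrix transpose, a standard textbook example of a cache-oblivious algorithm. It keeps halving the larger dimension, so at some recursion depth the sub-blocks fit in whatever cache exists, and the code never needs to know that cache's size. The names `transpose_block`, `transposed`, and `base` are made up for illustration; `base` only limits recursion overhead and is not tuned to any cache. Python lists won't show the real memory behavior, so read it as an illustration of the shape of the idea rather than a benchmark.

```python
def transpose_block(src, dst, r0, r1, c0, c1, base=4):
    """Write the transpose of src rows r0..r1, cols c0..c1 into dst."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: copy directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        # Split the taller dimension in half and recurse.
        mid = r0 + rows // 2
        transpose_block(src, dst, r0, mid, c0, c1, base)
        transpose_block(src, dst, mid, r1, c0, c1, base)
    else:
        # Split the wider dimension in half and recurse.
        mid = c0 + cols // 2
        transpose_block(src, dst, r0, r1, c0, mid, base)
        transpose_block(src, dst, r0, r1, mid, c1, base)


def transposed(matrix):
    """Return a new list-of-lists holding the transpose of matrix."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    out = [[None] * rows for _ in range(cols)]
    transpose_block(matrix, out, 0, rows, 0, cols)
    return out
```

Note that no block size or cache-line parameter appears anywhere: the recursion passes through every scale, which is what lets the same code behave well across all the cache levels between you and the data.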

rusbus · on May 16, 2018

An intro here: https://rcoh.me/posts/cache-oblivious-datastructures/

ovao · on May 16, 2018

Really good read, thanks. My brain couldn't quite grok the idea of a "cache-oblivious" data structure, but the article suggests they'd be more aptly described as "cache size-oblivious".

pasbesoin · on May 16, 2018

Thank you. That youtubed presentation was quite interesting.

ddorian43 · on May 16, 2018

Are you connected somehow to the Israel and North Korea situations? (meaning more than a normal person)

sanderjd · on May 16, 2018

Great response, thanks!