I don't feel like it tonight. Drinking whiskey and trying not to think about how effed up the Israel footage and the North Korea situation are. Maybe that's why it's not the best post.
I don't mean to crap on the work presented. Great article, good summary, and the tech is solid. It's just older than what excites me; a lot of progress has been made in 20 years and the majority of it hasn't found commercial applications.
But here is a citation root for a lot of amazing work in this space:
My absolute hero Edward Kmett gave a talk stitching a lot of this work together a long time ago: https://www.youtube.com/watch?v=uA0Z7_4J7u8 . I have no idea if he's pursued it, it's just one of his many talks that left me with an incredibly lasting impression.
Variants of this technique work for arbitrary documents and structures, work better at very high volume, have cache oblivious properties, and support transactions. Universal indexes that are reasonably good for all queries (as opposed to specfiic queries) are also possible. Coupled with discrimination-based techniques for O(n) table joins, there's probably a whole startup around there.
If you're optimal for a cache without knowing how big the cache is, you're optimal for all caches. It's tough to do, but it happens.
This property probably doesn't get the respect it deserves in this super weird world where you can't really say how many caches are between you and your data.
Really good read, thanks. My brain couldn't quite grok the idea of a "cache-oblivious" data structure, but the article suggests they'd be more aptly described as "cache size-oblivious".