
Interesting idea. Columnar ETL can be quite efficient in some scenarios because an ETL transformation (e.g. calculating a new column) frequently modifies an existing table rather than creating a new one. This allows calculating only the delta instead of rebuilding the whole table from scratch, since the untouched columns can be reused as they are. That helps performance and makes it easier to do the calculations in memory without slow disk I/O.
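
To make the "only the delta" point concrete, here is a minimal sketch, assuming a toy column store where a table is just a dict of column name to list. The names table and add_column are made up for illustration and are not taken from any of the products mentioned here.

```python
# Hypothetical toy column store: each column is an independent list.
table = {
    "price": [10.0, 20.0, 5.0],
    "qty": [2, 1, 4],
}

def add_column(table, name, fn, *inputs):
    """Return a table that reuses the existing columns and adds one computed column."""
    # Only the new column is materialized; this is the "delta".
    new_col = [fn(*vals) for vals in zip(*(table[c] for c in inputs))]
    result = dict(table)  # shallow copy: the column lists are shared, not copied
    result[name] = new_col
    return result

with_total = add_column(table, "total", lambda p, q: p * q, "price", "qty")
```

In a row-oriented layout the same step would typically rewrite every row to append the new field, whereas here the price and qty lists are the very same objects in both tables.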

Another advantage is that many transformations (e.g. filtering) can be performed directly on dictionary-compressed data, without decompressing it. This works well in Vertica [1] (based on C-Store DB [2]), which was our inspiration for building a lightweight ETL tool for business users that also uses a columnar in-memory data transformation engine [3].

[1] https://www.vertica.com/

[2] http://db.csail.mit.edu/projects/cstore/

[3] http://easymorph.com/in-memory-engine.html
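
A rough sketch of what filtering on dictionary-compressed data can look like, assuming a simple encoding where a column is stored as a list of distinct values plus one integer code per row. The helpers dict_encode and filter_rows are hypothetical and are not the API of Vertica, C-Store, or EasyMorph.

```python
def dict_encode(values):
    """Encode a column as (dictionary of distinct values, integer code per row)."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def filter_rows(dictionary, codes, predicate):
    """Return matching row numbers without decoding the column."""
    # The predicate runs once per distinct value, not once per row.
    matching = {code for code, value in enumerate(dictionary) if predicate(value)}
    # The row scan only compares small integers.
    return [row for row, code in enumerate(codes) if code in matching]

countries = ["US", "DE", "US", "FR", "DE", "US"]
dictionary, codes = dict_encode(countries)
rows = filter_rows(dictionary, codes, lambda v: v == "US")
```

The gain grows with the ratio of rows to distinct values: an expensive predicate (say, a regex) is evaluated only against the dictionary, and the per-row work stays an integer set lookup.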


