I would expect "data science" to involve some form of numerical analysis. Otherwise it's just record keeping... with computers.


The hardest part of what I did was getting enough documentation to understand the data. Sometimes we got fixed-width text files with no information about column definitions. Or column names. Or what values in descriptive columns meant. Stuff like "class of trade".
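Parsing a fixed-width file once you've recovered (or guessed) the layout is straightforward; the hard part is the layout itself. A minimal sketch, where the column names and positions are entirely hypothetical stand-ins for whatever the documentation eventually revealed:

```python
# Hypothetical layout for a fixed-width sales record: (name, start, end)
# byte offsets. In practice these came from documentation that was
# often missing or wrong.
COLSPECS = [
    ("account", 0, 8),
    ("class_of_trade", 8, 12),   # e.g. "RET " for retail -- meaning had to be asked for
    ("units", 12, 18),
    ("amount", 18, 28),
]

def parse_line(line):
    """Slice one fixed-width line into a dict and coerce numeric fields."""
    rec = {name: line[start:end].strip() for name, start, end in COLSPECS}
    rec["units"] = int(rec["units"])
    rec["amount"] = float(rec["amount"])
    return rec

# A made-up 28-character record matching the layout above.
sample = "A0000001RET 000042  001234.5"
record = parse_line(sample)
```

Note that without the column definitions, nothing in the bytes themselves tells you where "units" ends and "amount" begins, which is the commenter's point.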

But generally you're right. It was just simple calculations using sales records. But lots of records, at least several gigabytes, and sometimes several hundred gigabytes.


Record keeping is 90% of data projects.

The second 90% is basic math at high speeds.


Right, record keeping. But when it's not your data, things get complicated. Imagine trying to understand how another firm's data systems work. You can talk with managers, who know how the business uses data. But they have no clue how the data are stored or managed. And you can talk with IT people, who know how data are stored or managed. But they have no clue how the data are used.

And yes, speed. Aggregating hundreds of gigabytes was nontrivial to do quickly. I started with Access, then learned to manage and use SQL Server, and eventually moved to a multi-Xeon server with lots of RAM and SAS-attached storage.
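The aggregation itself really is "basic math at high speeds": in SQL Server it would be a GROUP BY, and the streaming equivalent can be sketched in a few lines. The file layout and field names below are hypothetical; the point is that totals accumulate one row at a time, so hundreds of gigabytes never need to fit in RAM:

```python
from collections import defaultdict
import csv
import io

def aggregate_sales(lines):
    """Sum a numeric 'amount' column grouped by 'class_of_trade',
    reading the input as a stream rather than loading it whole."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["class_of_trade"]] += float(row["amount"])
    return dict(totals)

# Tiny made-up input standing in for gigabytes of sales records.
sample = io.StringIO(
    "class_of_trade,amount\n"
    "RET,10.5\n"
    "WHS,4.0\n"
    "RET,2.0\n"
)
totals = aggregate_sales(sample)
```

The same shape scales from Access to SQL Server to a big multi-Xeon box; what changes is how fast the rows can be fed through, not the math.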



