Hacker News

ddply can be very slow. I strongly recommend getting your data into the form you want with something map-reduce-y, _then_ throwing it into R for analysis and graphing.
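For context, the kind of pre-aggregation being suggested might look like this. This is only a sketch with hypothetical file and column names (`big.csv` with a key and a numeric value); the idea is to do the group-by outside R so R only ever sees the small summary:

```shell
# Hypothetical input: big.csv with two columns, key,value (no header).
printf 'a,1\na,2\nb,5\n' > big.csv

# "Map" over rows, "reduce" per key: sum values grouped by column 1,
# then sort for a stable output order.
awk -F, '{ sum[$1] += $2 } END { for (k in sum) print k "," sum[k] }' big.csv \
  | sort > summary.csv

cat summary.csv
# → a,3
# → b,5
```

The resulting `summary.csv` is tiny and loads into R instantly, at which point plyr/ddply overhead no longer matters.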


I never cease to be amazed at how people who work with data large enough to make tools like plyr slow assume that everyone works with data like that.

I have been using R daily for 7-8 years now and have only occasionally turned to something like data.table for performance reasons. "Big data" receives waaay more attention and hype than there are actual human beings working on data of that scale. I can assure you that for the vast majority of R users worldwide, plyr is plenty fast enough for their needs.


Sounds like you may have some experience that could assist me. I have six 3.6GB CSV files that I can't even get into R (reading them in just never finishes; the program freezes), much less manipulate. I've not done any map-reduce type work before - is there a tool I can use to leverage that and get the data into R?
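One common workaround for files this size is to shrink them on the command line before R ever sees them, by keeping only the columns (or rows) you actually need. A minimal sketch, assuming a hypothetical wide CSV where only the first and third columns matter:

```shell
# Hypothetical input: wide.csv with a header and many columns,
# of which only columns 1 and 3 are needed in R.
printf 'id,junk,value\n1,x,10\n2,y,20\n' > wide.csv

# Project out just the needed columns; the narrow file can then be
# read into R normally (e.g. with read.csv on narrow.csv).
cut -d, -f1,3 wide.csv > narrow.csv

cat narrow.csv
# → id,value
# → 1,10
# → 2,20
```

On the R side, it can also help to give `read.csv` explicit `colClasses` (so it doesn't guess types over millions of rows), or to use `data.table::fread`, which is much faster at ingesting large CSVs.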



