
sort actually does an external sort: it sorts blocks in memory, writes them to temp files, and then merges them. Adjusting the block size (-S / --buffer-size in GNU sort) can help with thrashing.
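For anyone who hasn't seen the technique, here is a rough Python sketch of an external merge sort. It is not how sort itself is implemented, just the general idea under simple assumptions: every input line ends with a newline, and `block_size` (a made-up parameter, counted in lines rather than bytes) stands in for the real buffer size setting.

```python
import heapq
import tempfile
from itertools import islice


def external_sort(lines, block_size=100_000):
    """Yield the input lines in sorted order using bounded memory.

    Assumes every line ends with "\n". block_size is a hypothetical knob:
    the number of lines held in memory at once.
    """
    runs = []
    it = iter(lines)
    while True:
        # Read at most block_size lines, sort them in memory.
        block = list(islice(it, block_size))
        if not block:
            break
        block.sort()
        # Spill the sorted block (a "run") to its own temp file.
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(block)
        run.seek(0)
        runs.append(run)
    try:
        # k-way merge of the sorted runs; only one line per run is in memory.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()
```

A smaller block means less memory per pass but more temp files to merge, which is the tradeoff the block size setting controls. Usage would look like `for line in external_sort(open("big.txt")): ...`, where `big.txt` is a placeholder name.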

Awk may still be better for uniquification.



What about 'sort -u'?


I'm not sure. I haven't closely studied the differences between the two algorithms. My guess is that sort -u would perform better as the data set gets larger, given a good block size setting, because it does an external sort. Cardinality would also affect performance: if the unique set fits comfortably in memory, an external sort over a large data set wouldn't be very efficient compared with deduplicating in memory.
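To make the cardinality point concrete, here is a small Python sketch of the two dedup strategies. I'm assuming the awk approach meant here is the usual hash-table idiom (`awk '!seen[$0]++'`); the thread doesn't say. The function names are mine, and `unique_sorted` expects input that has already been sorted, by whatever means.

```python
from itertools import groupby


def unique_hashed(lines):
    """Hash-based dedup (assumed awk-style): keeps first-seen order.

    Memory grows with the number of DISTINCT lines, not the input size.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def unique_sorted(sorted_lines):
    """Sort-based dedup: duplicates are adjacent in sorted input,
    so only the current line has to be remembered."""
    for line, _group in groupby(sorted_lines):
        yield line
```

With low cardinality, `unique_hashed` makes one pass and holds only the small unique set, so paying for a full sort first is wasted work. With high cardinality the set may not fit in memory, and sorting externally before calling `unique_sorted` starts to look better. Note the outputs differ in order: the hashed version preserves input order, the sorted one does not.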



