
A big problem for this would be the transfer of data to and from the API. Imagine an algorithm that has to analyze gigabytes or terabytes of data.

Also, protection of the data as it is being transferred, stored, and analyzed is an issue. This covers both data integrity and protection for privacy or confidentiality reasons.
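
A minimal sketch (in Java, with hypothetical file names) of the kind of client-side step that could address both concerns: hash the plaintext for an integrity check and encrypt it before it ever leaves your machine.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.SecureRandom;

public class SecureUpload {
    // Hash the plaintext so the service can verify integrity after transfer,
    // and encrypt it so the provider never sees the raw data.
    static byte[] hashAndEncrypt(String inPath, String outPath, SecretKey key) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");

        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

        try (InputStream in = new FileInputStream(inPath);
             OutputStream raw = new FileOutputStream(outPath);
             OutputStream enc = new CipherOutputStream(raw, aes)) {
            raw.write(iv);                      // prepend the IV so the receiver can decrypt
            byte[] buf = new byte[1 << 16];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha256.update(buf, 0, n);       // integrity: digest of the original bytes
                enc.write(buf, 0, n);           // confidentiality: only ciphertext goes on the wire
            }
        }
        return sha256.digest();                 // send this alongside the upload for verification
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        byte[] digest = hashAndEncrypt("dataset.bin", "dataset.enc", kg.generateKey()); // hypothetical paths
        System.out.println("plaintext digest is " + digest.length + " bytes");
    }
}
```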



This is true. One possible solution: if the service were run in EC2, you could leverage the speed of Amazon's internal network when the data is already stored in S3.
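
A minimal sketch of that setup, assuming the AWS SDK for Java and a hypothetical bucket and key; run from inside an EC2 instance, the read stays on Amazon's network rather than crossing the public internet.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import java.io.InputStream;

public class S3Pull {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        S3Object obj = s3.getObject("example-bucket", "dataset.bin"); // hypothetical bucket/key
        long bytes = 0;
        byte[] buf = new byte[1 << 16];
        try (InputStream in = obj.getObjectContent()) {
            for (int n; (n = in.read(buf)) != -1; ) {
                bytes += n; // feed the analysis here instead of just counting
            }
        }
        System.out.println("streamed " + bytes + " bytes over Amazon's internal network");
    }
}
```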


I was thinking that as I was typing my comment. Another solution, which Amazon offers for S3, is to ship hard disks by courier. I guess the real metric here is cost per GB transferred per unit time, say $/GB-hr.
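
A back-of-the-envelope comparison of the two options under that metric; every number here is an illustrative assumption, not real pricing.

```java
public class TransferCost {
    public static void main(String[] args) {
        double dataGB = 5_000;                  // 5 TB, illustrative

        // Option 1: push it over the wire (assumed 100 Mbit/s sustained, $0.10/GB bandwidth).
        double mbitPerSec = 100;
        double netHours = dataGB * 8 * 1024 / mbitPerSec / 3600;
        double netCost  = dataGB * 0.10;

        // Option 2: ship disks by courier (assumed 48 h door to door, $100 flat shipping).
        double shipHours = 48;
        double shipCost  = 100;

        // The metric from the thread: dollars per GB moved, per hour in transit.
        System.out.printf("network: %.1f h, $%.0f, %.5f $/GB-hr%n",
                netHours, netCost, netCost / dataGB / netHours);
        System.out.printf("courier: %.1f h, $%.0f, %.5f $/GB-hr%n",
                shipHours, shipCost, shipCost / dataGB / shipHours);
    }
}
```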


At what point does it become ridiculous to move the data, which may be measured in TB or PB, when the algorithm itself would be measured in KB or MB?


Hush. Not in front of the VCs.


In clusters working on large amounts of in-memory data, the approach is often to load the data once, then move the code (e.g. a Java class implementing some data processing interface) to the data as required, rather than move the data to the code.
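
A minimal sketch of that pattern; the interface and class names here are made up for illustration, not taken from any particular framework.

```java
import java.io.Serializable;
import java.util.List;

// "Move the code, not the data": the task is a small serializable object shipped
// to whichever node already holds the partition in memory.
interface DataProcessor<T, R> extends Serializable {
    R process(List<T> localPartition);
}

class WordCountTask implements DataProcessor<String, Integer> {
    @Override
    public Integer process(List<String> localPartition) {
        int words = 0;
        for (String line : localPartition) {
            words += line.split("\\s+").length;
        }
        return words; // only this small result travels back over the network
    }
}

class Node {
    private final List<String> inMemoryPartition; // loaded once, stays put

    Node(List<String> partition) { this.inMemoryPartition = partition; }

    // The cluster serializes the task (a few KB of bytecode and state) to each node;
    // gigabytes of partition data never leave the machine.
    <R> R run(DataProcessor<String, R> task) {
        return task.process(inMemoryPartition);
    }
}
```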


There is always stuff that goes the other way, though, like how SETI@home does FFTs, which are computationally expensive and benefit from a distributed system even though the file sizes are quite small.


Yes, BOINC projects are cases where it is not ridiculous to move the data, because computation power is the scarce resource and the work units are typically only in the hundreds of kilobytes to single-digit megabytes.
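
A quick back-of-the-envelope check with assumed, illustrative numbers for such a work unit:

```java
public class WorkUnitRatio {
    public static void main(String[] args) {
        // Illustrative numbers only: a small BOINC-style work unit that keeps a CPU core busy for hours.
        double unitMB  = 0.35;         // ~350 KB download
        double linkMbit = 10;          // volunteer's connection
        double cpuHours = 6;           // crunch time per unit

        double transferSec = unitMB * 8 / linkMbit;
        double computeSec  = cpuHours * 3600;

        System.out.printf("transfer: %.2f s, compute: %.0f s, ratio 1:%.0f%n",
                transferSec, computeSec, computeSec / transferSec);
        // With compute dwarfing transfer by several orders of magnitude,
        // shipping the data out to idle CPUs is the sensible direction.
    }
}
```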


A local client collects summary statistics to send to The Algorithm over the net, kind of like how Google's mobile voice search works.
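
A minimal sketch of that idea: reduce the data locally and ship only a tiny summary. The particular statistics chosen here are just an example.

```java
import java.util.Arrays;

public class LocalSummary {
    // Reduce a large local dataset to a handful of numbers; only these cross the network.
    static double[] summarize(double[] samples) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        double sum = 0, sumSq = 0;
        for (double x : samples) {
            min = Math.min(min, x);
            max = Math.max(max, x);
            sum += x;
            sumSq += x * x;
        }
        double n = samples.length;
        double mean = sum / n;
        double variance = sumSq / n - mean * mean;
        return new double[] { n, min, max, mean, variance }; // ~40 bytes instead of the raw data
    }

    public static void main(String[] args) {
        double[] localData = new double[10_000_000];   // stays on the client
        Arrays.fill(localData, 1.0);                    // placeholder data
        double[] summary = summarize(localData);
        System.out.println("payload sent over the net: " + Arrays.toString(summary));
    }
}
```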



