This is where I usually draw the line at the "everything is a service" world view. It is much better to provide this as a library than as a service, particularly for algorithms that run across large data sets.
However, the exception I see to this is a service that applies the algorithm across a large dataset that is owned by the service. An example of this is geocoding, where you probably don't want to store the addresses of everyone in the world in your database, but it is easy to reach out to a range of providers to get a latitude/longitude for your address.
At least if it's a library you're not really increasing your chance of failure. Whereas as you add external dependencies, the odds that one of them is down and takes your service with it increase. So you had better think about how you handle failure situations gracefully and so on, and the service you call had better be doing something special (e.g. giving you access to some huge dataset) to make it worth that additional complexity.
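To make "handle failure gracefully" concrete, here's a minimal Java sketch, assuming a made-up geocoding endpoint and arbitrary timeout values; the only point is the bounded timeout plus a fallback path, not any particular provider's API:

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.util.Optional;

    public class GeocodeClient {
        private final HttpClient http = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))   // fail fast if the provider is unreachable
                .build();

        /** Returns "lat,lon" from the external service, or empty if it is down or slow. */
        public Optional<String> geocode(String address) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://geocoder.example.com/lookup?q="
                            + URLEncoder.encode(address, StandardCharsets.UTF_8)))
                    .timeout(Duration.ofSeconds(3))      // bound the whole call instead of hanging the request thread
                    .GET()
                    .build();
            try {
                HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
                return response.statusCode() == 200
                        ? Optional.of(response.body())
                        : Optional.empty();              // treat server errors like an outage
            } catch (Exception e) {
                // Network failure or timeout: degrade gracefully (e.g. show the address
                // without a map pin and retry later) instead of failing the whole page.
                return Optional.empty();
            }
        }
    }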
I think the other area where AaaS makes sense is distributed algorithms: there is a genuinely large amount of work in writing parallel versions of algorithms, and they also require clusters.
Google Wave as a platform is a big step forward in distributed-computing abstraction. If you have ever tried to implement an app on top of a distributed algorithm, you will appreciate what Wave makes possible!
As what we do is effectively a limited case of this (recommendations as a service), I'll give my two cents:
This will cause a total nerdgasm, and nobody will buy it.
There is an inverse correlation between how much abstraction sits between our product and our customers' bottom lines and how quickly they're willing to open their wallets. The notion proposed here takes a couple of steps further away from providing an end-to-end solution to a problem. Geeks love that kind of stuff. It's hard to get people to pay for it.
I can see this model working for AI problems that depend more on data and trained models than on the actual code or the algorithm itself. It also makes sense for patented algorithms (SIFT, maybe): paying per instance solved might be cheaper and easier than dealing with licensing issues.
The author is under the impression that implementing algorithms is hard. The real problem is analysing and understanding the problem; implementing the standard algorithms is the easiest part. If your problem is common enough to be offered as a service, it's very likely there is a free library for it anyway.
Seriously, ask for Dijkstra's shortest path at the IOI (the high-school informatics olympiad); I'd bet half the students would get it 100% correct in under an hour.
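For what it's worth, the textbook version really is short. A plain Java sketch over an adjacency list (non-negative edge weights assumed; edges[u] holds {neighbour, weight} pairs):

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class Dijkstra {
        /** Shortest distances from src; edges[u] holds {neighbour, weight} pairs, weights >= 0. */
        static long[] shortestPaths(List<int[]>[] edges, int src) {
            long[] dist = new long[edges.length];
            Arrays.fill(dist, Long.MAX_VALUE);
            dist[src] = 0;
            // Min-heap of {tentative distance, node}.
            PriorityQueue<long[]> pq = new PriorityQueue<>(Comparator.comparingLong((long[] a) -> a[0]));
            pq.add(new long[]{0, src});
            while (!pq.isEmpty()) {
                long[] top = pq.poll();
                long d = top[0];
                int u = (int) top[1];
                if (d > dist[u]) continue;              // stale queue entry, node already settled
                for (int[] e : edges[u]) {
                    int v = e[0], w = e[1];
                    if (dist[u] + w < dist[v]) {        // relax the edge
                        dist[v] = dist[u] + w;
                        pq.add(new long[]{dist[v], v});
                    }
                }
            }
            return dist;
        }
    }

The loop is the easy part; as said above, modelling the problem and building the graph around your data is where the actual work goes.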
A big problem for this would be the transfer of data to and from the API. Imagine an algo to analyze gigabytes or terabytes of data.
Also, protection of the data as it is being transferred, stored, and analyzed is an issue. This covers both data integrity and protection for privacy or confidentiality reasons.
This is true. A possible solution: if the service were run on EC2, you could leverage the speed of Amazon's internal network when the data is already stored in S3.
I was thinking that as I was typing my comment. Another solution, which Amazon supports for S3, is to ship hard disks by courier. I guess the real metric here is cost per GB transferred per unit time, say $/GB-hr.
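To put rough, purely illustrative numbers on that metric: pushing 1 TB through a 100 Mbit/s uplink is about 8e12 bits / 1e8 bits per second, roughly 22 hours, call it ~45 GB/hr of effective throughput. A couriered 2 TB disk arriving the next day works out to ~80 GB/hr, and it scales with however many disks fit in the box, so once the dataset gets big the $/GB-hr comparison tips toward shipping hardware.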
In clusters working on large amounts of in-memory data, the approach is often to load the data, then move the code (e.g. a Java class implementing some data-processing interface) to the data as required, rather than move the data to the code.
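A bare-bones sketch of that pattern, with made-up interface and class names rather than any real framework's API: the small task object is what travels over the wire, and each node applies it to the partition it already holds.

    import java.io.Serializable;
    import java.util.List;

    /** The unit that moves across the wire: a small code object, not the data. */
    interface DataProcessor<T, R> extends Serializable {
        R process(List<T> localPartition);
    }

    /** Runs on each node, next to the partition that node already holds in memory. */
    class Worker<T> {
        private final List<T> localPartition;

        Worker(List<T> localPartition) {
            this.localPartition = localPartition;
        }

        // The cluster framework would deserialize the shipped processor and call this locally;
        // only the (small) result travels back, never the gigabytes of partition data.
        <R> R run(DataProcessor<T, R> shippedProcessor) {
            return shippedProcessor.process(localPartition);
        }
    }

    // Driver side: ship only this small object to every worker, e.g.
    // DataProcessor<String, Long> task =
    //         partition -> partition.stream().filter(line -> line.contains("error")).count();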
There is always the stuff that goes the other way, though, like how SETI@home does FFTs, which are computationally expensive and benefit from a distributed system, but the file sizes are quite small.
Yes, BOINC projects are cases where it is not ridiculous to move the data, because computation power is the scarce resource and the work units are typically only hundreds of kilobytes to single-digit megabytes.
This is already being done in lots of places. And if you take the literal definition of the word "algorithm", then all web API services are algorithms as a service. So I don't like the name, but the general idea of the post is interesting.
Thinking along these lines, something that I would find more useful would be a web service that makes a well-defined managed infrastructure available to me to run my own jobs or algorithms on. For example, a service I can use to submit my own map/reduce-style jobs to and have them run on a big cluster of systems managed by someone else; or a service that allows me to submit jobs to run on specialized hardware, like a cluster of systems packed with NVIDIA CUDA cards. Providers of these services could also have a library of pre-canned jobs for common tasks like text indexing, link extraction, parsing W3C logs into stats, etc. With a library like that, you've got what this post is describing and more.
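Purely to make that concrete, here is what the client side of such a service could look like; every class, method, and URI scheme below is hypothetical, a wish list rather than anything that exists:

    import java.nio.file.Path;

    // Everything below is hypothetical -- a sketch of the kind of API I'd want, not a real service.
    public class ManagedClusterSketch {
        interface JobService {
            String submitMapReduce(Path jobJar, String inputUri, String outputUri, int workers);
            JobStatus status(String jobId);
        }

        enum JobStatus { QUEUED, RUNNING, FAILED, DONE }

        static void example(JobService service) throws InterruptedException {
            // Ship a self-contained job; the provider owns the cluster, scheduling, and retries.
            String jobId = service.submitMapReduce(
                    Path.of("link-extraction.jar"),     // pre-canned or custom job code
                    "storage://my-bucket/crawl/",       // input already sitting in the provider's storage
                    "storage://my-bucket/links/",
                    50);                                // how many workers to rent
            while (service.status(jobId) == JobStatus.QUEUED
                    || service.status(jobId) == JobStatus.RUNNING) {
                Thread.sleep(30_000);                   // poll; a nicer service would offer callbacks
            }
        }
    }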
I like the concept of an algorithm 'supermarket' where you get all of these algos as part of one API with various objects/methods.
But the problem is one of specialization. Surely there are other APIs out there that specialize in particular aspects of what the supermarket is providing. For example, if this AaaS business is providing distance algos as well as image algos, there must be geo businesses and image businesses that can provide higher quality given their specialization.
We do this in-house in my research group already, as a way of letting anyone use fairly complex-to-setup AI systems without installing them on every machine. Even lets you use heavyweight AI from, say, Flash without doing something godawful like porting it to ActionScript, because Flash can just call out to some webservice for the algorithm it needs. Or, build a demo you can distribute to people elsewhere without having to distribute the backend as well, because the GUI phones home to get its algorithmic work done on our servers.
It's worked pretty decently, but we keep it internal because we just don't have the compute resources to offer things like SAT-solving or logic programming or planning as a service to the general public.
I once tried to build a web site with content tied to locations down to the smallest towns in all countries, and I found it hard to get everything right. So I think it would be interesting to have maps and all kinds of geographical information, along with some related algorithms, "for lease": e.g. find the nearest city to a given location, IP-to-location, the address format for a given country for building proper entry forms, phone number formats, districts/counties/states for a country, etc.
The database of localities and all associated information is not only huge, it's changing literally every day and thus should be constantly kept up to date. Definitely a good candidate for the hosted service scheme.
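The "nearest city to a given location" piece at least is easy to sketch: a brute-force haversine scan over a city list (a real service would put a spatial index such as a k-d tree or geohash buckets in front of it, and its real value would be keeping the list current):

    import java.util.List;

    public class NearestCity {
        static final class City {
            final String name; final double lat; final double lon;
            City(String name, double lat, double lon) { this.name = name; this.lat = lat; this.lon = lon; }
        }

        /** Great-circle distance in kilometres (haversine formula). */
        static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
            double r = 6371.0;                          // mean Earth radius in km
            double dLat = Math.toRadians(lat2 - lat1);
            double dLon = Math.toRadians(lon2 - lon1);
            double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                     + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                       * Math.sin(dLon / 2) * Math.sin(dLon / 2);
            return 2 * r * Math.asin(Math.sqrt(a));
        }

        /** Brute-force scan over the (constantly updated) city list. */
        static City nearest(List<City> cities, double lat, double lon) {
            City best = null;
            double bestKm = Double.MAX_VALUE;
            for (City c : cities) {
                double d = distanceKm(lat, lon, c.lat, c.lon);
                if (d < bestKm) {
                    bestKm = d;
                    best = c;
                }
            }
            return best;
        }
    }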
This would work very well for memory/computation-intensive algos with relatively small input and output. For example, a semantic analysis engine that accepts a URL and returns the extracted meaning from a web page: input of a few bytes, output of less than a kilobyte. I think this is totally feasible and worth it as long as customers can monetize the output. To me, a service like OpenCalais is AaaS. Sure, they have software, but the hard part of what they do is the semantic analysis.
There are many interesting algorithms in academia that never see the light of day. If such a service could motivate more people in academia to share their software, this could become very useful.
This is effectively the same proposition that web services promised us ten years ago. For the most part we still develop closely-coupled software, and I don't see that changing in most cases.