It seems like the main pitch here is auto-inclusion and auto-exclusion of various tools via an orchestration agent (which may or may not be the main model itself; that's unclear from their post).
Mostly this seems like an end-run around tool-calling scalability limits. Model performance degrades heavily if the field of possible tools gets too large, so you insert a component into the system that figures out which tools should be in scope and makes only those available, which improves reliability.
In terms of "why outsource this", the idea seems to be that their orchestration agent would be better than a cruder task state machine that you would implement yourself. Time will tell if this assertion is true!
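For what it's worth, here is a rough sketch of what such a scoping component could look like. This is not their implementation: a naive keyword-overlap score stands in for whatever the real orchestrator does, and the `Tool` type, the `select_tools` helper, and the tool names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def select_tools(task: str, tools: list[Tool], limit: int = 3) -> list[Tool]:
    """Return at most `limit` tools whose descriptions share words with the task.

    Assumption: word overlap is a crude placeholder for a real relevance model.
    """
    task_words = set(task.lower().split())

    def overlap(tool: Tool) -> int:
        return len(task_words & set(tool.description.lower().split()))

    # Drop tools with no overlap, then rank the rest (sort is stable on ties).
    relevant = [(overlap(t), t) for t in tools if overlap(t) > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in relevant[:limit]]


# Hypothetical tool catalogue, for illustration only.
ALL_TOOLS = [
    Tool("book_flight", "book flight tickets between cities"),
    Tool("search_hotels", "search hotels and book rooms"),
    Tool("search_papers", "search academic papers by topic"),
    Tool("summarize_pdf", "summarize pdf documents"),
]

# Only these tools would be handed to the model for this task.
in_scope = select_tools("book flight to Lisbon", ALL_TOOLS, limit=2)
print([tool.name for tool in in_scope])
```

The point is just that the model sees `in_scope` rather than `ALL_TOOLS`; how well that works depends entirely on how good the selection step is.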
> auto-inclusion and auto-exclusion of various tools via an orchestration agent
Where do you see that? That would be neat, I'm under the impression orchestration is manual though – you define an agent and give it the ability to hand off tasks to sub-agents.
Sorry, maybe I could've phrased it better: it basically forces the devs to manually divide their tools into smaller buckets. (The Travel Agent has N tools, the Research Agent has M tools, etc., all specified by the dev.)
The pitch is that once you do this bucketization, the overall orchestrator can intelligently pick which bucket to use, so at any moment the LLM is only exposed to a limited set of tools.
As opposed to the more pie-in-the-sky idea that, given N tools (where N is very, very large), the LLM can still accurately select tools without any developer intervention. That seems pretty far off at this point.
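To make the bucket idea concrete, here is a minimal sketch, assuming the orchestrator is just some callable that returns a bucket name. The `SubAgent` type, `hand_off`, `fake_orchestrator`, and the tool names are hypothetical and not taken from their SDK; only the Travel Agent / Research Agent split comes from the example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubAgent:
    name: str
    tools: tuple[str, ...]


# Buckets are defined by the developer, not discovered by the model.
BUCKETS = {
    "travel": SubAgent("Travel Agent", ("book_flight", "search_hotels")),
    "research": SubAgent("Research Agent", ("search_papers", "summarize_pdf")),
}


def hand_off(task: str, pick_bucket: Callable[[str, list[str]], str]) -> SubAgent:
    """Ask the orchestrator which bucket handles the task and return that sub-agent."""
    choice = pick_bucket(task, list(BUCKETS))
    if choice not in BUCKETS:
        raise ValueError(f"orchestrator picked unknown bucket: {choice!r}")
    return BUCKETS[choice]


def fake_orchestrator(task: str, bucket_names: list[str]) -> str:
    """Stand-in for an LLM call; a real one would prompt the model with bucket_names."""
    return "travel" if "flight" in task.lower() else "research"


agent = hand_off("Find me a flight to Lisbon", fake_orchestrator)
print(agent.name, agent.tools)
```

Whichever sub-agent comes back only ever sees its own short tool list, which is the whole trick: the orchestrator chooses among a handful of buckets instead of among every tool.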