> So if you can handle every incoming request with one worker that worker gets them all. Otherwise each worker pops off the stack as it becomes free
As I alluded, that works great if "can handle" and "being free" are clear-cut binary properties. But for complex applications, when you're driving for high utilization while keeping latency down, those questions become complicated: a worker might have some free capacity to handle requests, but that doesn't mean it would produce a response as quickly as some other, more idle worker.
In other words, the problem is not just assigning requests to workers that can handle them, but assigning requests to the workers that can handle them with the lowest latency.
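To make the distinction concrete, here is a toy sketch of the two policies. All names and the scoring heuristic (EWMA of recent latencies scaled by in-flight requests) are my own illustration, not anyone's production load balancer:

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.in_flight = 0       # requests currently being processed
        self.ewma_latency = 0.0  # exponentially weighted avg of recent latencies

    def record_latency(self, seconds, alpha=0.2):
        # Update the moving average after each completed request.
        self.ewma_latency = alpha * seconds + (1 - alpha) * self.ewma_latency

    def expected_latency(self):
        # Crude score: recent latency scaled by how busy the worker is.
        return self.ewma_latency * (1 + self.in_flight)

def pick_first_free(workers, capacity=4):
    # Binary "is it free?" view: any worker under capacity qualifies,
    # so a busy-but-not-full worker can win over an idle one.
    for w in workers:
        if w.in_flight < capacity:
            return w
    return None

def pick_lowest_latency(workers):
    # Latency-aware view: prefer the worker expected to respond soonest.
    return min(workers, key=lambda w: w.expected_latency())
```

With a slow, half-busy worker listed first and an idle, fast worker second, `pick_first_free` still returns the first one while `pick_lowest_latency` picks the idle one — that gap is exactly the problem described above.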