> I fully expect AMD to release 128-core CPUs once they're on that process. A quad-socket server with those will have 512 cores or 1,024 threads. Completely lock-free algorithms will be the only way to scale up to just one server at that scale!
How often will we need that scale in a single address space?
It's certainly desirable for hypervisors/kernels, database servers, perhaps L4 load balancers and reverse HTTP proxies. Those aren't nothing, but far more people work on web application servers than on all of them put together.
Web application servers are often written to be stateless (with the possible exception of caches) so they can scale to multiple machines. That's important for high-reliability sites even if they aren't large enough to fully saturate a single huge machine like that. As long as you need to load-balance between machines, it's not a big problem to also run multiple instances per machine. If the application scales well to 32 cores, run 16 of them per 512-core machine. Seems a lot easier than going to extraordinary lengths to make one address space scale...
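A minimal sketch of that "many instances per machine" approach, assuming a Linux host and a hypothetical `./app-server` binary: fork one worker per CPU slice and pin each with `sched_setaffinity` so the 16 instances don't fight over the same cores. The 16 × 32 split is just the example numbers from above.

```c
// Sketch: launch NWORKERS copies of a server, each pinned to its own
// CORES_PER_WORKER-wide CPU slice (16 x 32 = 512 cores, as above).
// Linux-specific; APP_PATH is a placeholder, not a real binary.
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <sys/wait.h>

#define NWORKERS         16
#define CORES_PER_WORKER 32
#define APP_PATH         "./app-server"   // hypothetical

int main(void) {
    for (int w = 0; w < NWORKERS; w++) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: restrict ourselves to CPUs [w*32 .. w*32+31],
            // then exec the unmodified, 32-core-scalable server.
            cpu_set_t set;
            CPU_ZERO(&set);
            for (int c = 0; c < CORES_PER_WORKER; c++)
                CPU_SET(w * CORES_PER_WORKER + c, &set);
            sched_setaffinity(0, sizeof(set), &set);
            execl(APP_PATH, APP_PATH, (char *)NULL);
            _exit(1);  // exec failed
        }
    }
    while (wait(NULL) > 0)  // reap workers; a real launcher would restart them
        ;
    return 0;
}
```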
Even if you have 1024 separate processes not sharing much of anything, there are still locks inside the kernel that runs them.
For example, a pair of threads (inside one of those 1024 processes) synchronising with each other will often go through the kernel to do so. On Linux this uses the futex syscall; Windows and others have similar primitives. If the kernel takes a lock that is shared with other processes to do that, even if just for a moment, even if it's hashed on address and memory space, even if it's a spinlock with little contention, that lock causes memory traffic between cores running separate processes.
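To make that concrete, here's a minimal futex sketch: one thread blocks in FUTEX_WAIT on a shared 32-bit word, another flips the word and wakes it with FUTEX_WAKE. glibc has no wrapper for futex, so the raw syscall is used; the kernel hashes the word's address to pick an internal bucket lock, which is exactly the shared state described above.

```c
// Minimal futex wait/wake sketch (Linux only). Error handling elided.
// Compile with: gcc futex.c -pthread
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int flag = 0;  // the 32-bit futex word both threads share

static long futex(atomic_int *uaddr, int op, int val) {
    // The kernel hashes uaddr into a bucket protected by a spinlock --
    // the briefly-held shared lock the comment above is talking about.
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void *waiter(void *arg) {
    (void)arg;
    // Sleep in the kernel for as long as flag is still 0.
    while (atomic_load(&flag) == 0)
        futex(&flag, FUTEX_WAIT, 0);
    printf("waiter: woken, flag=%d\n", atomic_load(&flag));
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    sleep(1);                      // let the waiter block
    atomic_store(&flag, 1);        // publish the state change
    futex(&flag, FUTEX_WAKE, 1);   // wake one waiter
    pthread_join(t, NULL);
    return 0;
}
```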
Same for processes that are reading the same files as other processes, or (for example) running in the same directory when doing path lookups. There's a lot of work done in Linux to keep this scalable (RCU), but it's easy to hit scaling barriers that nobody has tested or designed for yet. Once 1024-core CPUs are common, the kernel will of course be optimised for them.
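The read-side pattern RCU gives the kernel for things like path lookup can be illustrated in userspace with liburcu (the urcu package), assuming it's installed. Readers take no locks and write no shared memory, so they scale across cores; an updater publishes a new version and waits out a grace period before freeing the old one. The `config` struct here is just an invented stand-in for shared read-mostly state.

```c
// Read-mostly sharing via userspace RCU (liburcu).
// Compile with: gcc rcu.c -lurcu -pthread
#include <urcu.h>      // default liburcu flavor
#include <stdlib.h>
#include <stdio.h>

struct config { int max_conns; };

static struct config *current_cfg;  // shared pointer, RCU-protected

// Hot path: can run concurrently on many cores with no lock traffic,
// no atomic read-modify-writes, no cache-line bouncing.
static int reader_get_max_conns(void) {
    rcu_read_lock();
    struct config *c = rcu_dereference(current_cfg);
    int v = c ? c->max_conns : 0;
    rcu_read_unlock();
    return v;
}

// Slow path: swap in a new version, then free the old one only after
// every reader that might still see it has finished.
static void update_max_conns(int max_conns) {
    struct config *fresh = malloc(sizeof(*fresh));
    fresh->max_conns = max_conns;
    struct config *old = rcu_xchg_pointer(&current_cfg, fresh);
    synchronize_rcu();   // wait for a grace period
    free(old);
}

int main(void) {
    rcu_register_thread();   // every RCU-using thread registers itself
    update_max_conns(1024);
    printf("max_conns = %d\n", reader_get_max_conns());
    rcu_unregister_thread();
    return 0;
}
```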
Yes, I included the kernel in my list of things that are desirable to scale well for that reason.
That said, in some cases I don't think it's strictly necessary for even the kernel to scale well as long as you have a hypervisor that does. It's not unusual to deploy software in VMs on a cluster. Having more, smaller VMs per machine is a way to handle poor kernel scalability, just as I suggested for the web application server. VMs are higher-overhead than multiple containers on a single kernel, so this wouldn't be my first choice, but many people use VMs anyway.