Last time I did this, with the processes being fork()s of the parent (the typical way this was done), memory overhead compared to threads was minimal: a couple of percent.
That was somewhat workload dependent, however. If the workload involves a lot of memory churn or data marshalling/unmarshalling, the processes’ memory will quickly diverge (pages shared copy-on-write get duplicated as soon as they’re written) and you’ll burn a ton of CPU on the copying and marshalling.
Typical ways around that include mmap’ing things or using various types of shared-memory IPC, but that is a lot of work.
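
For what it’s worth, here’s a rough sketch of the mmap route in Python, assuming a Unix-like system where `os.fork()` is available. The function name and the slot layout are made up for illustration, not taken from any particular codebase. Each child writes its result into a shared anonymous mapping, so nothing gets pickled or sent over a pipe, while a plain `bytearray` written by the same children stays untouched in the parent because each child only modifies its own copy-on-write copy.

```python
import mmap
import os
import struct

SLOT = struct.calcsize("<q")  # 8 bytes per child result


def shared_vs_private_demo(n_children=4):
    """Fork n_children workers; return (shared_total, private_total) as seen by the parent."""
    # Anonymous mapping: on Unix the default flag is MAP_SHARED, so these
    # pages remain shared with the children after fork().
    shared = mmap.mmap(-1, SLOT * n_children)
    # Ordinary heap memory: after fork() each child gets a copy-on-write view.
    private = bytearray(SLOT * n_children)

    pids = []
    for i in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: write the same value into both buffers, then exit
            # immediately without running the parent's cleanup code.
            value = (i + 1) * 10
            struct.pack_into("<q", shared, SLOT * i, value)
            struct.pack_into("<q", private, SLOT * i, value)
            os._exit(0)
        pids.append(pid)

    # Parent: wait for every child before reading the results.
    for pid in pids:
        os.waitpid(pid, 0)

    shared_total = sum(
        struct.unpack_from("<q", shared, SLOT * i)[0] for i in range(n_children)
    )
    private_total = sum(
        struct.unpack_from("<q", private, SLOT * i)[0] for i in range(n_children)
    )
    shared.close()
    return shared_total, private_total
```

Calling `shared_vs_private_demo(4)` in the parent should give back the children’s sum for the shared mapping and zero for the private buffer. The “lot of work” part is everything this leaves out: a fixed binary layout for real data, synchronization when more than one process touches the same region, and cleanup when a worker dies halfway through a write.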