
> To be fair, I allocated the stacks manually in one large allocation; otherwise it dies quite quickly running out of VM mappings.

Okay, so the test you did doesn't actually reflect the use case in practice. Can I expect to reach 200,000 threads if the threads are not all created at exactly the same moment? What if (God forbid) they're doing memory allocation? And if it does work out, will everything be handled efficiently?



Hope comex replies to your question. Typical green thread usage is spawn-em-as-you-need-em, so if in order to spawn lots of 1:1 threads I need to do it all up front, that could be very limiting or complicated.


Yeah, I just made a mistake - you can increase the maximum number of mappings using /proc/sys/vm/max_map_count. I tried doing that and switching back to normal stack allocation (but still specifying the minimum size of 16KB using pthread_attr_setstacksize), and it didn't change the number of threads I was able to create.

...in fact, neither did removing the setstacksize call and having default 8MB stacks. I guess this makes sense: of course the extra VM space reserved for the stacks doesn't require actual RAM to back it; there is some page table overhead, but I guess it's not enough to make a significant difference at this number of allocations. Of course, on 32-bit architectures this would quickly exhaust the address space.

If increasing max_map_count hadn't worked, it would still be possible to allocate stacks on the fly - but you would get a bunch of them in one mmap() call, and therefore in one VM mapping, and dole them out in userland. However, in that case guard pages wouldn't separate different threads' stacks, so you would have to generate code that manually checks the stack pointer to avoid security issues from stack overflows, rather than relying on crashing by hitting the guard page. Rust actually already does this, mostly unnecessarily; I'm not sure what Go is doing these days, but I think it does too. Anyway, given the above result I suspect this won't be an actual issue, at least until the number of threads goes up by an order of magnitude or so.


Thanks for your reply. Wonder how other platforms fare.


That doesn't seem that unrealistic — you could allocate your stacks using slab allocation, for example. I wonder why the kernel allocator doesn't do a better job, though.



