The nice thing about virtual memory is that it's, well, virtual. It costs you almost nothing until you've touched it. (Fun exercise for the reader: measure the kernel overhead for an unused 1 TiB VMA.) But creating huge address spaces--that terabyte mmap wasn't theoretical--and leaving them untouched is algorithmically very useful, especially for things like malloc implementations.
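If you actually want to do that exercise, here's a minimal sketch (assuming Linux on a 64-bit machine; MAP_NORESERVE sidesteps the overcommit accounting):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 1ULL << 40;                 /* 1 TiB */
        /* PROT_NONE + MAP_NORESERVE: reserve address space only.
           Nothing is faulted in and no commit charge is taken. */
        void *p = mmap(NULL, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        char cmd[64];
        snprintf(cmd, sizeof cmd,
                 "grep -E 'VmSize|VmRSS' /proc/%d/status", (int)getpid());
        system(cmd);    /* VmSize jumps by ~1 TiB; VmRSS stays tiny */
        return 0;
    }

The kernel-side cost is roughly one vm_area_struct, which is what makes the untouched-terabyte trick essentially free.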
Why does it bother people? Two reasons. First is mlock to avoid swap. This is solvable in much better ways--I'm a fan of disabling swap in many cases anyway. Second is that, absent cgroups, it's difficult to put hard limits on memory usage in Linux. So people, looking under the streetlight, put limits on virtual usage, even though that's not what they care about limiting! Then they get angry when you break it. My refrain here, as in many cases (see for example measuring process CPU time spent in kernel mode): "X is impossible" doesn't justify Y unless Y correctly solves the problem X does.
(I spent years in charge of a major memory allocator so this is a battle I've fought too many times.)
Thirdly, the kernel will kill you if your overcommit ratio is too high. I had this argument with the Go folks several years ago (when Docker would crash after starting 1000 containers because the Go runtime had allocated 8 GB of virtual memory while only tens of MB were in use, and the kernel freaked out).
You're right that it doesn't cost anything, other than the risk that a process can cripple your machine using its overcommitted memory mapping. And so the kernel has protections against this, which should deter language runtime developers from doing this.
And let's not forget that MADV_DONTNEED is both incorrectly implemented on Linux and ridiculously expensive compared to freeing memory and reallocating it when you need it. Bryan Cantrill ranted about this for a solid half an hour in a podcast a year or two ago.
So… does that mean the Linux kernel will blow a gasket if I mmap actually-large files to play with them but have almost no resident memory? That doesn't seem reasonable.
> What do you mean by “free” memory? Actually unmap it?
Sorry, I didn't phrase it well. MADV_DONTNEED is significantly more expensive than most ways that memory allocators would "free" memory. This includes just zeroing it out in userspace when necessary (so no need for a TLB modification), or simply unmapping it and remapping it when needed.
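To make the comparison concrete, the three "free" strategies look roughly like this (a sketch, not a benchmark):

    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Three ways an allocator might "free" a region p of len bytes. */

    void free_by_dontneed(void *p, size_t len) {
        /* Synchronous on Linux: page tables are torn down and the TLB
           flushed now; the next touch takes a fresh zero-fill fault. */
        madvise(p, len, MADV_DONTNEED);
    }

    void free_by_zeroing(void *p, size_t len) {
        /* No syscall, no TLB modification; the pages stay resident and
           get handed back out of a userspace pool later. */
        memset(p, 0, len);
    }

    void *free_by_remapping(void *p, size_t len) {
        /* Hands the memory back entirely; remap when needed again. */
        munmap(p, len);
        return mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }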
> Also, I assume the crippling you’re talking about here is just the ability to rapidly apply memory pressure?
Right, and if the memory is overcommitted then you can cause an OOM very trivially, because you already have more mapped pages than there is physical memory -- writing a byte into each page will cause intense memory pressure. Now, this doesn't mean it would kernel-panic the machine, it just means it would cause issues (the OOM killer would figure out which process is the culprit fairly easily).
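Spelled out (don't run this on a box you care about; the 64 GiB figure is arbitrary, just pick something larger than RAM plus swap):

    #include <unistd.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 64ULL << 30;   /* 64 GiB: more than RAM + swap */
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                                -1, 0);
        if (p == MAP_FAILED) return 1;
        long pagesz = sysconf(_SC_PAGESIZE);
        /* Each write faults in one physical page. Once the faulted-in
           total exceeds RAM + swap, the OOM killer picks a victim. */
        for (size_t i = 0; i < len; i += (size_t)pagesz)
            p[i] = 1;
        return 0;
    }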
This is why vm.overcommit_ratio exists (which is what I was talking about when it comes to killing a machine) -- though I just figured out that not all Linux machines ship with vm.overcommit_memory=2 (I'm pretty sure SUSE and maybe some other distros do ship it, because this is definitely an issue we've hit for several years...).
There's also RLIMIT_AS, which applies regardless of overcommit_memory.
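Which is the streetlight problem from upthread in miniature: an RLIMIT_AS cap makes the reservation itself fail, not the eventual use. A sketch, assuming Linux:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    int main(void) {
        /* Cap this process's address space at 1 GiB. */
        struct rlimit rl = { .rlim_cur = 1UL << 30, .rlim_max = 1UL << 30 };
        if (setrlimit(RLIMIT_AS, &rl) != 0) { perror("setrlimit"); return 1; }

        /* A 2 GiB reservation now fails with ENOMEM, even though an
           untouched PROT_NONE mapping costs almost no physical memory. */
        void *p = mmap(NULL, 2ULL << 30, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) perror("mmap");    /* expected: ENOMEM */
        return 0;
    }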
Right. I’m very familiar with all these mechanisms, I guess I just don’t agree that the ability to cause an OOM, particularly if applications are isolated in cgroups appropriately, is a big deal. On balance, not allowing applications to use virtual memory for useful things (such as the Go case of future heap reservation) or underutilizing physical memory seems worse.
As an aside, it seems like an apples and oranges comparison to compare “freeing” by zeroing (which doesn’t free at all) to MADV_DONTNEED. I’m also pretty sure that munmap will be much slower than MADV_DONTNEED, or at least way less scalable, given that it needs to acquire a write lock on mmap_sem, which tends to be a bottleneck. It does seem like there’s a lot of opportunity for a better interface than MADV_DONTNEED though (e.g. something asynchronous, so you can batch the TLB flush and avoid the synchronous kernel transition).
> particularly if applications are isolated in cgroups appropriately
Once the cgroup OOM bugs get fixed, amirite? :P
> It does seem like there’s a lot of opportunity for a better interface than MADV_DONTNEED though (e.g. something asynchronous, so you can batch the TLB flush and avoid the synchronous kernel transition).
The original MADV_DONTNEED interface, as implemented on Solaris, FreeBSD, and basically every other Unix-like, does exactly this -- it tells the operating system that it is free to free the memory whenever it likes. Linux is the only modern operating system with the "FREE THIS RIGHT NOW" interface (and it's arguably a bug or a misunderstanding of the semantics -- or it was copied from some really fruity Unix flavour).
In fact, when jemalloc was ported to Solaris it would crash, because it had been written against Linux's incorrect MADV_DONTNEED and assumed the pages would always come back zeroed -- which is not the case outside Linux.
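That assumption is easy to demonstrate. On Linux this prints 0; under lazy (advisory) semantics it may print 42, because the kernel is merely permitted to reclaim the page later:

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        unsigned char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;
        p[0] = 42;
        madvise(p, 4096, MADV_DONTNEED);
        /* Linux drops the page immediately, so this access faults in
           a fresh zero page. BSD/Solaris treat the hint as advisory,
           so the old contents may still be there. */
        printf("%d\n", p[0]);
        return 0;
    }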
> As an aside, it seems like an apples and oranges comparison to compare “freeing” by zeroing (which doesn’t free at all) to MADV_DONTNEED. [...] I’m also pretty sure that munmap will be much slower than MADV_DONTNEED.
This is fair, I was sort of alluding to writing a memory allocator where you would prefer to have a memory pool rather than constantly doing MADV_DONTNEED (which is sort of what Go does -- or at least used to do). If you're using a memory pool, then zeroing out the memory on "allocation" in userspace is probably quite a bit cheaper than MADV_DONTNEED.
But you're right that it's not really an apt comparison -- I was pointing out that there are better memory management setups than just spamming MADV_DONTNEED.
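A toy version of that pool, just to make the trade concrete (the names are made up; a real allocator tracks dirty runs rather than a one-slot free list):

    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    #define CHUNK (2UL << 20)   /* hypothetical 2 MiB pool chunks */

    struct pool { void *free_chunk; };   /* toy one-slot free list */

    void pool_free(struct pool *pl, void *p) {
        /* No madvise: just park the chunk for reuse. */
        pl->free_chunk = p;
    }

    void *pool_alloc(struct pool *pl) {
        if (pl->free_chunk) {
            void *p = pl->free_chunk;
            pl->free_chunk = NULL;
            /* Zero on reuse, in userspace: no syscall, no TLB churn. */
            memset(p, 0, CHUNK);
            return p;
        }
        /* Pool empty: fresh (already-zeroed) pages from the kernel. */
        void *p = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }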
The thing is, people want a way to measure and control the amount of memory that a process uses or is likely to use. Resident memory is one way to measure actually-used memory, but per man 3 vlimit, RLIMIT_RSS is only effective on Linux 2.4.x, x < 30, which nobody in their right mind is still running. So we have RLIMIT_AS, which limits virtual memory, or we have the default policy of hoping the OOM killer kills the right thing when you run out of RAM.
That you have to keep fighting this battle is an indication that people's needs (or desires) aren't being well met.
There's a third reason: trying to allocate too much virtual memory on machines with limited physical memory will fail on Linux with the default setting vm.overcommit_memory=0. See for instance https://bugs.chromium.org/p/webm/issues/detail?id=78
Great points. A third reason is core files: That 1 TB of unused virtual memory will be written out to the core file, which will take forever and/or run out of disk. This is part of the problem of running with the address sanitizer: you don't get core files on crashing, because they'd be too big.
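For what it's worth, Linux (3.4+) has a knob for exactly this, though allocators have to remember to use it:

    #include <sys/mman.h>

    int main(void) {
        size_t len = 1ULL << 40;    /* the 1 TB reservation again */
        void *p = mmap(NULL, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) return 1;
        /* MADV_DONTDUMP: exclude the range from core dumps, so an
           untouched reservation doesn't balloon the core file. */
        madvise(p, len, MADV_DONTDUMP);
        return 0;
    }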
Not sure whether anyone is writing iOS apps in Go, but iOS refuses to allocate more than a relatively small amount of address space to each process (a few gigs, even on 64-bit devices).
I used to think this. Then I deployed on Windows. Committed virtual memory can't exceed total physical memory plus the pagefile, or else malloc() will fail. I am currently having an issue where memory is "allocated" but not used, causing software crashes; actual used memory is 60% of that.
The page file in Windows can grow and the max size, I believe, is 3 times the amount of physical memory in the machine. So, if you're trying to commit more than [Physical Memory x 4] bytes, then yes, it will fail. But, more than likely, you'll get malloc failures long before that due to address space fragmentation (unless you're doing one huge chunk).
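The reserve/commit split is explicit in the API, for what it's worth; a sketch (Windows-only, and the 1 TiB number is just for illustration):

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SIZE_T len = (SIZE_T)1 << 40;   /* 1 TiB of address space */
        /* MEM_RESERVE consumes address space only: it does not count
           against the commit limit (physical memory + pagefile). */
        void *p = VirtualAlloc(NULL, len, MEM_RESERVE, PAGE_NOACCESS);
        if (!p) { printf("reserve failed: %lu\n", GetLastError()); return 1; }
        /* MEM_COMMIT is what charges the pagefile; allocations fail
           when system-wide commit would exceed that limit. */
        void *q = VirtualAlloc(p, 1 << 20, MEM_COMMIT, PAGE_READWRITE);
        if (!q) printf("commit failed: %lu\n", GetLastError());
        return 0;
    }

malloc() commits, which is why it can hit the limit even when most of the "allocated" memory is never touched.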