My hunch regarding overcommit is that Linux should sort out this situation, making disabled-overcommit a first-class scenario.
We (application developers) will follow and adjust our programs to correctly handle malloc() failure -- after all it's quite easy to fix that even in existing applications.
One thing that's needed is efficient ways to ask for, and release, memory from the OS. I feel like Linux isn't doing so well on that front.
For example, Haskell (GHC) switched from calling brk() to allocating 1 TB virtual memory upfront using mmap(), so that allocations can be done without system calls, and deallocations can be done cheaply (using madvise(MADV_FREE)). Of course this approach makes it impossible to observe memory pressure.
Many GNOME applications do similar large memory mappings via WebKit, apparently for security reasons (WebKit's Gigacage).
It seems to me that these programs have to be given some good, fast APIs so that we can go into a world without overcommit.
>We (application developers) will follow and adjust our programs to correctly handle malloc() failure -- after all it's quite easy to fix that even in existing applications.
Can you elaborate on why this is easy? It seems really difficult to me. Wouldn't you need to add several checks to even trivial code like `string greeting = "Hello "; greeting += name;`, because you need to allocate space for a string, allocate stack space to call a constructor, allocate stack space to call an append function, allocate space for the new string?
Even Erlang with its memory safety and its crash&restart philosophy kills the entire VM when running out of memory.
>Haskell (GHC) switched from calling brk() to allocating 1 TB virtual memory upfront
The choice of 1 TB was a clever one. Noobs frequently confuse VM with RAM, so this improbably large value has probably prevented a lot of outraged posts about Haskell's memory usage.
For many low-level languages, it's simply a matter of finding all malloc()s, checking the return value, and failing as appropriate. That can mean "not accepting this TCP request", "not loading this file", "not opening the new tab".
Or in the worst case, terminating the program (as opposed to letting Linux thrash and freeze the computer for 20 minutes); but most programs have some "unit of work" that can be aborted instead of shutting down completely.
Adding those checks is some effort of plumbing, sure, but not terribly difficult work.
> Wouldn't you need to add several checks
In the case of C++, I'd say it's even easier, because allocation failure (via operator new) throws std::bad_alloc, and you can handle it "further up" conveniently without having to manually propagate malloc failure up like in C.
> Even Erlang with its memory safety
Memory safety is quite different from malloc-safety (out-of-memory-safety) though. In fact I'd claim that the more memory-safe a language is (Haskell, Java, or Erlang as you say), the higher the chance that it doesn't offer methods to recover from allocation failure.
Theoretically, sure. If you've got your process trees set up properly, any time a process runs into a failed allocation, you could just kill that process and free its memory. And if an ets table insertion fails allocation, kill the requestor and the owner of the table.
The problem is, I know for sure my supervision trees aren't proper; and I have doubts about the innermost workings of OTP --- did the things they reasonably expected to never fail get supervised properly? Will my code that expects things like pg2 to just be there work properly if it's restarting? How sad is mnesia going to be if it loses a table?
I'm much happier with: too much memory used? Shut it all down and start with a clean slate.