
Except that the OS zeros memory when it's released...



The JVM doesn't release the memory to the OS when garbage is collected; only when the heap shrinks. Any zeroing the OS might do is proportional to the size change in the heap, not to the number of objects freed.
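For what it's worth, you can watch this from inside the process with the standard Runtime API. A rough sketch (exact behaviour depends on the collector and flags, and System.gc() is only a hint):

    public class HeapObservation {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            byte[][] garbage = new byte[200][];
            for (int i = 0; i < garbage.length; i++) {
                garbage[i] = new byte[1 << 20];   // allocate ~200 MB of short-lived data
            }
            garbage = null;                       // make all of it unreachable
            System.gc();                          // request a collection (only a hint)
            // totalMemory() is the heap the JVM has committed; it typically stays
            // put even though freeMemory() jumps after the collection.
            System.out.printf("committed=%d MB, free=%d MB%n",
                    rt.totalMemory() >> 20, rt.freeMemory() >> 20);
        }
    }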


Good point.

However, that brings up a related issue: the JVM requires all (or close enough to all) object memory to be initialized, and the amount of memory that must be initialized is at minimum linear w.r.t. the number of objects created...

So the work done is still at minimum O(n) w.r.t. the number of objects allocated, regardless.


Well, it's hard to deny that the work done by the program is O(n) (or even Ω(n)) in the number of objects created by the program. That's almost tautological.

The thing that is interesting about the asymptotic complexity of garbage collection, though, is that it gives you a peek into the future of GC overhead as heaps get larger.

Consider a program with some bounded steady-state memory consumption, and a GC that takes O(1) time to free unlimited amounts of memory. In such a system, you can reduce your GC overhead arbitrarily simply by enlarging the heap. A larger heap gives a larger time interval between collections, yet doesn't make the collections any slower. Don't like 6% overhead? Fine. Make the heap 10x larger and you have 0.6% overhead.

This time-space trade-off is always available to the end user of such a system. It's certainly not available with more primitive systems like malloc+free.
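As a crude illustration of that trade-off: run the same steady-state allocation loop under different -Xmx settings and compare the number of collections (the class name and sizes here are just for illustration):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class SteadyState {
        public static void main(String[] args) {
            byte[][] live = new byte[64][];                      // small, bounded live set
            for (long i = 0; i < 50_000_000L; i++) {
                live[(int) (i % live.length)] = new byte[256];   // constant allocation rate
            }
            long cycles = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                cycles += gc.getCollectionCount();
            }
            System.out.println("GC cycles: " + cycles);          // drops as -Xmx grows
        }
    }

Run it with, say, -Xmx256m and again with -Xmx2g: the live set and total allocation are identical, only the collection frequency changes.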

[These are my personal opinions, not those of my employer.]


It doesn't matter what the time to free memory is: as long as it is O(n) w.r.t. the number of objects (or less), the limiting factor is allocation. In steady state (which you are assuming), all memory freed will be reused, (effectively) all memory used will be written to at least once, and the memory used by a group of objects is at least linear in the number of objects.

So no, I don't see the advantage you say. I see that, in the best case, it's on par with non-GC'd systems, asymptotically speaking.

And yet, I see plenty of disadvantages. To start:

I have yet to see a GC that's O(1) w.r.t. the amount of memory to free. Best case? I'll grant you that, for an extremely naive copying collector. But the chance of that best case occurring is... not significant. Especially with the more sophisticated algorithms.

And again, as above, it doesn't matter how long it takes to free memory. As long as it's linear or sublinear, it'll be dominated, asymptotically speaking, by how long it takes to initialize that memory again. And in steady-state operation it will be reinitialized.

Not to mention the practical effects.

For instance: GC thrashes cache. Badly. It completely kills the concept of a working set. You can't work around that by making your heap larger - if anything that makes it worse. It lengthens the period between thrashings, yes, but makes each one worse when it does happen.

GC also doesn't scale well to large numbers of cores. In particular, the worst case will pretty much always be when the GC has to check with every core to see whether it still holds a reference to something. And given that core counts are where most of the performance improvements seem to be coming from... You mentioned growing memory sizes, but not growing numbers of cores.

GC also (almost always) needs header field(s) in every object: a minimum of O(1) overhead per object initialized.

Which means that, generally speaking, the overhead doesn't asymptote to zero. It asymptotes to a constant - and non-negligible, at least from what I've seen - amount of overhead.


I seem to have hit a nerve here. Perhaps I haven't been clear about some of the points I'm making, which I don't think are all that contentious.

I'm not saying a fast GC can reduce the cost of allocation. I'm saying a fast GC can reduce GC overhead. I think that's an uncontroversial statement. I can only insist so many times that Java's GC doesn't touch dead objects at any time during its scan. If you want to disagree with me, that is your prerogative.

I agree that a GC walk can thrash cache when it happens. However, a copying GC can also rearrange object layout to improve locality. Which effect is more significant depends on the workload.

I didn't mention the number of cores because it didn't occur to me that it was significant to you. GCs can scale just fine to many cores. No core has to "check" other cores: each core can check its own local data, depending on the tactics employed to deal with NUMA. There are always challenges in scaling any parallel workload, of course, and there are always solutions.

Our GC (in IBM's J9 JVM) uses 8 bits in each object. With classes aligned to 256 bytes, there are eight free bits in each object's class pointer. Hence, J9's object model has one word of overhead on top of the user's data, which is what malloc/free typically needs to track each object's size anyway. No disadvantage there.
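I'm obviously not going to reproduce J9's actual layout here, but the general trick is easy to sketch: with classes aligned to 256 bytes, the low 8 bits of the class pointer are always zero, so the collector can keep its flags there and mask them off on access. Something along these lines (field names and flag assignments are made up):

    // Illustrative only: GC flags packed into the low bits of an aligned class pointer.
    final class ObjectHeader {
        static final long CLASS_ALIGNMENT = 256;                  // classes aligned to 256 bytes
        static final long FLAG_MASK       = CLASS_ALIGNMENT - 1;  // low 8 bits are always free
        static final long MARK_BIT        = 1L;                   // e.g. "marked" during a trace

        long word;                                                // class pointer + GC flags in one word

        long classPointer() { return word & ~FLAG_MASK; }         // strip the flag bits
        boolean isMarked()  { return (word & MARK_BIT) != 0; }
        void setMarked()    { word |= MARK_BIT; }
    }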

[These are my personal opinions, not those of my employer.]


>and a GC that takes O(1) time to free unlimited amounts of memory

If the unlimited amounts of memory are all in one contiguous block and ALL of it is ready to be freed (and determining that it is all ready somehow takes O(1) time), then what you say is plausible.

In any other case it's fantasy. Suppose I have n variables and I point each of them at a distinct new object (not pointed to from anywhere else). Then I re-set a random subset of them to null. Based on which ones I randomly set to null, there are 2^n possibilities for what the GC must do. If your claimed O(1) GC runs long enough for the processor to take k conditional jumps, there are at most 2^k possible results of running it. For n > k, there are necessarily subsets in the above scenario which your GC fails to handle.
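To make the scenario concrete (a toy version; the names are arbitrary):

    import java.util.Random;

    public class RandomRoots {
        public static void main(String[] args) {
            int n = 1_000_000;
            Object[] roots = new Object[n];
            for (int i = 0; i < n; i++) {
                roots[i] = new Object();         // each root points at a distinct object
            }
            Random rnd = new Random();
            for (int i = 0; i < n; i++) {
                if (rnd.nextBoolean()) {
                    roots[i] = null;             // a random subset becomes garbage
                }
            }
            System.gc();  // which objects are dead depends on 2^n possible root states
        }
    }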

>This time-space trade-off is always available to the end user of such a system

People routinely run servers on virtual machines with very limited RAM.


Your argument proves that the time taken can be no less than linear in the number of object references in live objects. The (potentially billions of) unreachable objects are never touched.

I agree that limited RAM environments are important. Asymptotic complexity only matters when you're effectively operating at the asymptote.



