> For a video game running at 60 frames per second each frame has 16 milliseconds in which to complete, so a pause of less than a millisecond is not going to be significant relative to other overheads.
This is definitely very significant: you just took away 6% of the CPU time available to the game to calculate everything necessary for that frame. And that's if there's only one pause inside each frame. And if you are running close to 100% CPU time and the pause happens right before the vsync, you might miss the vsync and drop a whole frame.
This is not only a problem with GC: even with more deterministic memory management, like plain malloc() and free() in C, you can have pauses if the operating system needs to go and find some free memory, update page tables, flush TLB caches and so on. So for games and anything real-time, you probably want to avoid memory allocations as much as possible, and you certainly don't want your whole process being paused at random. So if you are stuck with GC, one question would be: can GC be done for individual threads without blocking other threads?
You're right, yes - a 1ms pause has a cost, even in a 16ms budget.
I interpreted the author's point as "GC is now potentially viable for games", in the sense that older GCs might have pause times larger than 16ms, making them obviously inappropriate, whereas today 1ms pauses are something that can be budgeted for, at least in some cases.
As you said, malloc/free also have costs, and GC has other benefits, like bump allocation, moving things to compact memory and improve cache locality, etc. 1ms pauses mean GC is worth considering even for a game, in other words.
> As you said, malloc/free also have costs, and GC has other benefits, like bump allocation, moving things to compact memory and improve cache locality, etc. 1ms pauses mean GC is worth considering even for a game, in other words.
The general approach used by games - arenas or zones - is probably superior.
One approach games use is to allocate everything that's necessary for a given frame in a single large block, carve it off over time as needed, then drop the entire arena. All allocations then have memory locality, no fragmentation, basically zero cost to allocate (increment a pointer) and zero cost to deallocate.
This kind of thing actually plays really nicely with Rust's lifetimes since you can couple frame lifetime with objects in that frame and get static validation. Arenas are already available in nightly [1].
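For illustration, here is a rough sketch of that per-frame arena idea in Go (the types, names and sizes are made up, not taken from any engine; a real implementation would also worry about alignment, growth policy and pointer-containing types):

```go
package main

import "fmt"

// FrameArena hands out slots from one preallocated block and is reset
// wholesale at the end of the frame, so "deallocation" is a single store.
type FrameArena[T any] struct {
	buf  []T
	next int
}

func NewFrameArena[T any](capacity int) *FrameArena[T] {
	return &FrameArena[T]{buf: make([]T, capacity)}
}

// Alloc bumps an index into the preallocated block; it never touches the
// general-purpose allocator on the hot path.
func (a *FrameArena[T]) Alloc() *T {
	if a.next == len(a.buf) {
		panic("frame arena exhausted") // a real engine would grow or fall back
	}
	p := &a.buf[a.next]
	a.next++
	return p
}

// Reset "frees" everything allocated this frame in O(1).
func (a *FrameArena[T]) Reset() { a.next = 0 }

type Particle struct{ X, Y, VX, VY float32 }

func main() {
	arena := NewFrameArena[Particle](1 << 16)
	for frame := 0; frame < 3; frame++ {
		for i := 0; i < 1000; i++ {
			p := arena.Alloc()
			p.X, p.Y = float32(i), float32(frame)
		}
		// ... simulate, render ...
		fmt.Println("frame", frame, "used", arena.next, "slots")
		arena.Reset() // drop the whole frame's allocations at once
	}
}
```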
"Games" is a term for are a large collection of very different types of software, do we mean AAA FPS games? Then I would probably not use a GC, if we mean most other games? Then it's just fine. Many GC events are way shorter than 1ms also...
Turn-based or pausable real-time 4X games, where you spend most of your time paused and also get hit by a regular (~1000 ms) autosave freeze, spring to mind.
If you look around the REDEngine configuration files in the PC version of Witcher 3, there is a file with what looks to me like the tuning parameters of a Boehm-like conservative tracing GC; it is probably the only file in there that even includes comments about what the parameters mean and partially how they were determined. My overall impression from the various configuration files is that the whole world streaming/demand-paging mechanism is somehow interwoven with or even partially driven by the GC (so the GC is not meant for some small-ish scripting heap).
Yeah, there's a reason games often preallocate large chunks of memory and then don't allocate/deallocate during a level, and use object pooling and stack allocators.
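A minimal free-list pool sketch in Go, to make the pooling idea concrete (the Enemy type and sizes are hypothetical; real engines usually add handles/generations and fixed per-type pools):

```go
package main

import "fmt"

type Enemy struct {
	X, Y  float64
	alive bool
}

// EnemyPool preallocates every object up front, so gameplay code never
// touches the general allocator and the heap stays stable during a level.
type EnemyPool struct {
	items []Enemy  // one big up-front allocation
	free  []*Enemy // free list of currently unused slots
}

func NewEnemyPool(n int) *EnemyPool {
	p := &EnemyPool{items: make([]Enemy, n), free: make([]*Enemy, 0, n)}
	for i := range p.items {
		p.free = append(p.free, &p.items[i])
	}
	return p
}

func (p *EnemyPool) Spawn(x, y float64) *Enemy {
	if len(p.free) == 0 {
		return nil // pool exhausted; a real engine picks its own policy here
	}
	e := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	*e = Enemy{X: x, Y: y, alive: true}
	return e
}

func (p *EnemyPool) Despawn(e *Enemy) {
	e.alive = false
	p.free = append(p.free, e) // return the slot instead of leaving garbage
}

func main() {
	pool := NewEnemyPool(256)
	e := pool.Spawn(10, 20)
	fmt.Println(e.X, e.Y, e.alive)
	pool.Despawn(e)
}
```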
It seems like a strange example to give, when many modern games obsess over minimising delays in all sorts of ways (eg anything Mike Acton talks about: not just cache-friendly data-oriented structure-of-arrays or removing branches, but also eliminating function calls ("where there's one, there's many") and pointer indirection).
> It seems like a strange example to give, when many modern games obsess over minimising delays in all sorts of ways
And the more competitive ones tend to run higher than 60fps too, making the issue even more prominent; meanwhile, for those where it doesn't really matter (e.g. a 2D adventure game)… neither does ultra-low GC latency.
The other question is: can I at least time this GC so it doesn't overlap the refresh interval and skip a frame? Because 1ms may not be significant if it lands in the middle of the 16ms period, but if it lands at the end you've just produced jitter.
So I'd think a GC designed for a gaming system would take this into account: an incremental concurrent GC that can be timed to happen at the start of each frame interval and be paused if it gets close to the end.
You can manually trigger GC in Lua. I’ve used this to trigger light GC work every frame in some games to buy more consistent GC times (no spikes) at the cost of higher average times.
True. This is why any serious game running on a VM with GC would avoid any kind of per-frame memory allocation and use arena allocators instead.
The whole premise of a GC saving you from thinking about memory allocation is moot when it comes to real-time-sensitive tasks; there it makes things harder instead of easier. A reference counting system is much better IMO: even though you may be able to get higher absolute throughput with GC, at least the small overhead you incur is stable and predictable.
I don't think the idea of GC is to save you from thinking about memory allocation. Actually, not thinking about memory allocation is a great way to create very slow programs, independent of whether you use a GC or manually free memory, though modern GCs keep the performance penalty reasonably low. Still, you want to think about memory allocation, so that you don't allocate inefficiently.
What GC delivers is first of all a guarantee of correctness: a pointer is either nil or points to validly allocated memory. It also removes most of the need for bookkeeping. That simplifies programs and especially allows writing clean APIs, where functions are free to allocate reasonable amounts of memory.
But wherever large amounts of memory are required, especially with a clear lifetime, I think a lot about memory allocation, and even in a GC language I try to reuse memory wherever that makes sense.
So for a game engine, there should not be much need for allocation during the run time of a "level" and thus GC pauses should only happen between levels.
Not thinking about ownership over references to heap allocations is very very much the big productivity win from GC. Memory safety is usually an additional bonus, but it's a distinct thing; it's possible to have (conservative or refcounting) GC without memory safety and vice versa.
We work on a dual C++/C# codebase. Something like 3/4 C++ and 1/4 C#. Basically all of the memory lifetime errors happen in C# land. I do not recall a memory lifetime error _ever_ hitting master in C++, but we have one bug against _prod_ right now and two bugs against master right now, as we speak in C#.
Dealing with lifetimes in C++ is easy, dealing with it in C# is a nightmare. Maybe it's easier in Java or Go, I don't know, I've only dealt with Java in school and never coded in anger in Go.
async factory methods for stuff the API requires me to keep around, but the same API requires my objects to be constructed synchronously. (This includes stuff in .NET) C++ doesn't have async as a language construct. (yet. I dread the day.)
IDisposable. We have an ecosystem where a non-negligible number of in-flight objects need to be manually disposed of. In C++, RAII takes care of this for us.
Unsubscribing from events. This needs to happen manually with the standard .NET listeners. C++ solves this with weak_ptr. C# could solve this with a better standard library, but we have .NET.
Honorable mention (not a lifetime issue, but the deadlock quagmire that is C# and its half-async, half-synchronous standard library) is WriteableBitmap, which is impossible to use correctly, and has not been deprecated or had a safe replacement offered.
C++ surprises me in ways I expect to be surprised. C# surprises me in ways that leaves me confused and perplexed.
> Not thinking about ownership over references to heap allocations is very very much the big productivity win from GC.
I very much doubt that this is a big _productivity_ win. Languages in which it is idiomatic to "think about ownership over heap allocations" (C++, Rust) aren't obviously less productive than comparable languages where such thinking is not so idiomatic (C, Java, .NET, ObjC, Swift etc.).
It's somewhat common to use refcounting (shared_ptr<>, etc.) in the more exploratory style of programming where such "thinking" is entirely incidental, but refactoring the code to introduce proper tracking of ownership is quite straightforward, and not a serious drain on productivity.
GC might not be a productivity win for you, but for many people it definitely is.
I'm pretty sure that's true for the great majority of software developers, but of course they don't even use a non-GC language!
Part of the reason they don't is that productivity. Not that they chose it personally for that reason, but e.g. historically enterprise code moved to Java and C# for related reasons.
(I also agree there are people that are equally productive in non-GC languages, or even more - different people have different programming styles.)
Enterprise code moved to Java (and later C#) for memory safety, period. The level of constant bugginess in the C++ codebases just made them way too messy and outright unmanageable.
The enterprise world moved to Java and C# because:
- It was a corporate language with corporate support, and that matters a lot in many environments.
- It had, at the time, one of the best ecosystems of tools available.
- It was the mainstream fashion of the time, and nobody gets fired for buying Sun/IBM/Microsoft, right?
Most companies (and managers) could not care less about your program crashing with a segfault (unsafe) or a null pointer exception (safe). It's the same result for them.
Not in a security-related situation, it's not! And to a lesser extent, lack of memory safety also poses a danger of silent memory corruption. (Yes, usually the program will crash outright, but not always.) And it can be a lot harder to debug a crash when it doesn't happen until thousands of cycles after the erroneous access.
Sun and Microsoft wouldn't have built and pushed Java and C# in the first place if there hadn't been a real need for safer languages.
> Sun and Microsoft wouldn't have built and pushed Java and C# in the first place if there hadn't been a real need for safer languages.
Except there were safer languages before Java and C#: Ada, Lisp, all the ML family... And none of them ever took off.
Java and C# have been successful because they were accessible and easy to learn ( partially due to their memory model), not because they were safe.
As a parenthesis, a beautiful side effect of that has also been an entire generation of programmers who have no clue about the memory model their language uses underneath, because "it's managed", because it's GC... without even realising that their 50-million-object nested/mutually-referencing graph will bring the GC to its knees in production. With the results we all know today.
Maybe, but remember that computers were very, very slow and had very little memory, so GC overhead used to be unacceptable (Emacs == eight megabytes and constantly swapping? I've seen it).
I think that Java came 'at the right time': when computers became fast enough that the GC overhead didn't matter (except where low latency matters).
Reference counting is not better, it's a poor man's GC. Reference counting means: 1) instruction pollution from all the refcount updates and their synchronization, 2) circular references, i.e. memory leaks, 3) too much time spent freeing memory (whereas a generational GC spends time on live objects only), 4) memory fragmentation and hence slow allocation (whereas in a normal GC allocation is just O(1)).
Languages like Swift and Objective-C hide the reference counting for you, so 'instruction pollution' is not really something I care about. The overhead at the instruction & memory level is fairly minimal for these languages too, by means of using tagged pointers. I'm pretty sure the compiler is smart enough to factor out reference counting for complete sections where it can determine objects can impossibly go out of scope (e.g. sections where aliasing can be ruled out, and all assignments are to locals, such as in many loops).
Circular references can be a problem; this is just something you have to live with and design for, just as in languages with manual memory management. In the typical cases where this can be a problem (graphs of objects), it's very straightforward to fix them using weak references.
I don't understand point 3 and 4 and why they would be a property of reference counting for memory management. They both seem completely orthogonal problems that have nothing to do with the mechanisms that decide when to free memory.
Anyway, my original point was not that reference counting is perfect, or even more efficient compared to garbage collection. Just that it is predictable and deterministic, which is very often much more important, especially for code with real-time constraints.
> Languages like Swift and Objective-C hide the reference counting for you, so 'instruction pollution' is not really something I care about. The overhead at the instruction & memory level is fairly minimal for these languages too, by means of using tagged pointers. I'm pretty sure the compiler is smart enough to factor out reference counting for complete sections where it can determine objects can impossibly go out of scope (e.g. sections where aliasing can be ruled out, and all assignments are to locals, such as in many loops).
The new Swift Ownership API can help, but the compiler is not good enough to figure out exclusive ownership all by itself without any annotations. It is common to have more than 10% of your time in an RC environment spent on refcount calculations and lock acquisition (some stats: http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf).
> instruction pollution from all the refcount updates and their synchronization
If the language supports ownership tracking and non-synchronized Rc<> as in Rust, refcount updates ought to be rare and/or quick. I agree that this is very much an issue in languages with obligate "managed" memory such as Swift, and that tracing GC may sometimes be preferable to that.
> too much time spent freeing memory
If you're using Rc, you probably care about memory reclaim being deterministic, which tracing GC doesn't give you. You can also use arenas to free a bunch of objects all at once; this also addresses memory fragmentation and slow allocation to some extent.
Per-frame allocation would be very cheap with a tuned generational GC run every frame, as nothing would be live at the end of the frame and thus tracing would take almost no time. Of course the same would be true of an arena allocator, at the cost of needing to classify every allocation accurately.
> True. This is why any serious game running on a VM with GC would avoid any kind of per-frame memory allocation and use arena allocators instead.
The problem is having the VM allow and support that usage, and the JVM definitely is not best-of-class there. See ixy[0] from a few months back where the researchers / developers didn't manage to get under
> ~20 bytes of allocation per forwarded packet in Java.
They are not harder to predict or control than garbage collection, and have the benefit of being completely deterministic. Of course you can always come up with some pathological case that would trigger a 'deallocation avalanche', but you would almost have to deliberately make an effort to run into a thing like that. And if you do, it's typically very easy to fix these kinds of things.
In the Objective-C era before automatic reference counting, you had these things called 'autorelease pools' which allow pretty straightforward control over deallocation. I think the same thing is still possible in ARC Obj-C and Swift.
Games have a very regular object lifetime, allowing them to manage much of their allocations in arenas, which can certainly beat even modern GCs in performance (although maybe not for good). But concurrent server business applications don't have such regular object lifetimes, and it is getting quite hard to beat the performance of modern GCs with manual memory management in such applications. These days, footprint is the major and pretty much only significant cost to GCs in a great many applications.
My second job was displaying telemetry waveforms in real time, in Java... 1.3 if memory serves. MVP goal was 4 waveforms, I managed to get 16.
What we did to accomplish this was first aggressively optimize the data ingestion, then spend all that surplus on maintaining a ring buffer of pixels. This meant we were always a little behind live (.5s or 1s, I can't recall), but it made everything smooth as silk. To maintain homeostasis, I'd trigger GC every time we made it through a paint cycle with a full buffer.
There are ways to do it, but it could change the structure of your app. I say, "in our case the main paint loop was stupid-simple so it wasn't that big of a deal to tack it on," but the idea was always sort of in the back of my mind so it definitely informed the design.
What I didn't have to deal with is random inputs from the user changing my display strategy.
Many years ago, when I used to play Minecraft, the common wisdom was simply to get as much memory as possible, so that the GC runs less frequently. With the debug overlay you could see how the game kept allocating at about a rate of 300-500 MB/s until the JVM GC was triggered, which created a noticeable lag in the game. I used to know a few MC server admins, and I was surprised just how massive the resource consumption of the game was. One server I usually played on ran on multiple machines. Specs were I believe around 128 GB memory, 32 cores, 16 or so dedicated to GC...
Something I remember from that forum post was that instead of using an array-based representation for the meshes, it uses classes for every kind of information (Vector, Points, ...). Very nice textbook OOP, but not always the best way.
Considering how variable game frames are, even if the GC eats that 1ms it will be unnoticeable, assuming it doesn't happen too often.
I mean, I play games like Skyrim that now and then give you a pause as you walk the world while it unloads cells (I refer to normal walking around, not entering/exiting somewhere - which, btw, would be a good place to force a GC in a similar game written in a GC'd language), and that pause can even be 1 second long. 1ms is noise long lost in there.
> Considering how variable game frames are, even if the GC eats that 1ms it will be unnoticeable, assuming it doesn't happen too often.
For a competitive game you are looking at a refresh of 240 Hz (4 ms) but people generally target frame rates of at least 300-400 Hz (2.5 ms per frame!). There is simply no room to randomly throw in 1 ms pauses here and there without introducing micro-stuttering.
Even for a casual game 1 ms of extra frame time variance is bad. If your Skyrim has one second freezes while roaming the world, your install is busted, because the game in proper working order doesn't do that.
> For a competitive game you are looking at a refresh of 240 Hz
Yes, it depends on the game; what I wrote doesn't apply to every single game ever made out there, but it does apply to a ton of games.
> Even for a casual game 1 ms of extra frame time variance is bad.
If it happens every several minutes you will 100% not notice it at all. In pretty much every high end 3D game (especially an open world one) you'll get way more variance by turning the camera around and/or just walking than that.
> If your Skyrim has one second freezes while roaming the world, your install is busted, because the game in proper working order doesn't do that.
I'm 100% sure my Skyrim install is perfectly fine (it is a fresh one) and it does happen, just not frequently. If you haven't noticed it... well, as I already wrote, it isn't really noticeable. And - especially on Skyrim - you'll also get way more variance than 1ms just walking around.
Another example that came to my mind after I wrote the comment above is that in comp. games frame timing can reveal critical information, i.e. act as ESP. In CS:GO for example the game does not know the position of enemy players at all times, the server only sends their positions if you could possibly perceive them. However, this does mean that if you are holding certain positions and the other team is sneaking up on you, you will experience a reduction in frame rate once your game starts to process them.
The entire game loop would need to be constant time w.r.t. player activity to avoid this issue!
> I'm 100% sure my Skyrim install is perfectly fine (it is a fresh one) and it does happen, just not frequently. If you haven't noticed it... well, as I already wrote, it isn't really noticeable.
My ability to detect lags of one second (like you wrote) is approximately 100 % assuming I am actually looking at the screen. I've seen pauses like this when I had a hard drive and with the 32-bit version of the game when it experienced memory pressure, and also due to badly written mods.
> And - especially on Skyrim - you'll also get way more variance than 1ms just walking around.
That's sadly true, though it tends to run smoothly when it has finished loading everything for an area.
Yes, the second-long pause was on the 32bit version; I do not remember noticing this in the 64bit SE version, but I only bought that last week and my PC is in transit as I moved places, so I haven't had enough time to see how that works. I have an SSD and a relatively powerful computer though. In any case, my point is that it doesn't really matter, as it doesn't affect anything, and you'll notice pauses and performance hiccups way worse than 1ms not only in Skyrim, but in pretty much every highly demanding game out there. Especially in open world games (hence the Skyrim reference) where the engine loads and unloads stuff all the time and there isn't a single point where you can have resources "settled". So that 1ms, if it happens rarely enough, won't be noticed by pretty much anyone.
The issue with GCs isn't really that 1ms; the issue is that some GCs (which sadly includes most well-known ones - like those in Java and C# - but not all GCs) do not provide any form of control beyond mere suggestions. For example, last time I checked, running System.gc() in Java doesn't really do a full GC run, so you cannot guarantee that by the time gc() returns all unused memory is gone, and placing that call between area/level loads in games (where the user expects some sort of delay anyway) doesn't help you much. Similarly, you can't just "turn off" the GC until a later point (or until some specified failsafe threshold is reached), so you can't disable the GC during normal gameplay and run it between level loads and/or when the player opens any inventory/menu/whatever screen (where again a tiny delay will be mostly unnoticed and/or ignored, and can often be masked by the UI design) to clean up any garbage.
IMO a GC that gives you such control and takes 100ms is way more useful for games than a GC that gives you no control and takes 1ms.
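For what it's worth, Go's runtime does expose roughly this kind of control. A sketch of the level-boundary pattern described above (runtime.GC and debug.SetGCPercent are real APIs; loadLevel/runLevel are hypothetical placeholders):

```go
package main

import (
	"runtime"
	"runtime/debug"
)

func loadLevel(name string) { /* stream assets, build the world ... */ }
func runLevel()             { /* the per-frame game loop ... */ }

func main() {
	for _, level := range []string{"intro", "caves"} {
		// Loading screens are where a pause is acceptable: re-enable the
		// collector, load, then force a full collection before gameplay.
		debug.SetGCPercent(100)
		loadLevel(level)
		runtime.GC() // blocks until a full collection has finished

		// During gameplay, suppress automatic collections. This only works
		// if per-frame allocation is kept near zero (pools, arenas);
		// otherwise the heap simply grows until the level ends.
		debug.SetGCPercent(-1)
		runLevel()
	}
}
```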
> For a video game running at 60 frames per second each frame has 16 milliseconds in which to complete, so a pause of less than a millisecond is not going to be significant relative to other overheads.
First of all: bullshit. Second of all: bullshit.
99th percentile of 1ms means that 1% of the time you're worse than that. The author says sometimes it's as bad as 8ms. A dropped frame every 15 seconds or so will make your game unplayable. 99th percentile measurements are useless. Tell me how often your users are subjected to any given latency.
One millisecond is enormous. My kingdom for a millisecond. If one in ten frames loses 1ms to a GC and one in a thousand frames loses 4ms, that means my budget is 12ms, down from 16ms. Thanks, you've sent my expected environment back to 2010.
People think they can just say "99th percentile is bad but not terrible" and assume that's the end of the argument, but that's not how the world works. A 99th-percentile frame time happens once every 1.7 seconds (at 60 fps, one frame in a hundred is one frame every ~1.7 seconds). If your web page loads 200 assets (which is basically all of them these days), nearly all page loads will hit worse than the 99th percentile for one of those asset loads (with 200 independent loads, that's roughly 1 - 0.99^200 ≈ 87% of page loads).
As we say in fintech, a millisecond is an eternity. (Lately, I have been hearing that a microsecond is an eternity.)
Garbage collection promotion is always, fundamentally, an exercise in doublethink. How can an intolerable process be made to seem tolerable, so that my dodgy language which depends on it can be used in place of a mature language which has a robust mechanism for managing all resources, not just memory?
It can't. GC is fine for things that don't matter, but things have a way of coming, in time, to matter. Then you have a Problem.
They do mention they may have missed factors in the original article it links to:
> As you can see, there are a lot of different factors that go into designing a garbage collector and some of them impact the design of the wider ecosystem around your platform. I’m not even sure I got them all.
Power consumption has some subtle details which make it different from both CPU and memory overhead. For instance, to reduce power consumption it's important to reduce wakeups.
Incremental GC is very promising for games compared to traditional stop-the-world GC. It's essentially time-sliced GC that runs over multiple frames. Unity supports it, and if you tweak it correctly for your needs the GC can run while the CPU is waiting for the GPU.
I don't think there are any GCs that do not stop-the-world at some point, except those with special hardware support, are there? Even C4 requires in theory stopping individual threads, and in practice even has a global stop-the-world phase.
It's easy to write a GC that doesn't stop the world at all. A Baker treadmill [1], for example, is a very simple type of incremental collector that can be implemented quickly. (It's a fun exercise, by the way!) If you implement it "by the book", it will have no stop-the-world pauses whatsoever (assuming the mutator doesn't allocate faster than collection can happen).
The problem is that a simple GC like this will have low throughput, so it's not worth it. These are some of the most common misconceptions about GCs: that it's hard to write a low-latency GC and that pauses are all that matter. In reality, GC engineering is all about the tradeoff between throughput and latency.
In AAA titles especially, gains in hardware are generally not squandered on making the software/engines simpler, but rather invested in better fidelity or higher frame rates / better frame pacing (which is very, very noticeable). Games are one of the few areas where the chronic wasting that is pervasive in almost all other software development is not accepted, by and large.
It's odd how he goes out of his way to make the Go team's decisions seem strange just because they're different. What's going on there?
I'm happy that the Go team doesn't want to expose tuning knobs. I've seen a lot of people fiddle with JVM settings without doing the controlled experiments needed to see if it actually helps on a particular machine and I've done that myself. It ends up as cargo-cult programming, like people sharing magic JVM settings on the wiki to allegedly make IntelliJ faster. (It worked for one person!)
> I've seen a lot of people fiddle with JVM settings without doing the controlled experiments
That's a problem with what the people are doing then, not the JVM. Furthermore, the new low latency JVM GCs only have 2-3 knobs to tune.
golang likes to pretend that complexity doesn't exist, and goes for the most simplistic approach, at the cost of things like throughput, code size, speed, code maintainability, etc. The JVM is suited for a much wider range of tasks.
That's not the impression I get from Go literature at all.
Go's authors tend to be quite humble about Go being targeted mostly at a specific class of software, read: servers. And Go is pretty darn successful at it.
The lack of GC knobs is an informed decision within that context.
"Servers" refers to a broad range of software. If you're writing a simple app that parses JSON and handles REST, it can be an ok fit. Now if you're writing server software that needs to be high-throughput, then that's where you hit golang's limitations.
This doesn't negate the fact that I stated. You can throw more hardware at the problem to reach higher throughput if your problem domain allows for it (such as the use case you link to). It goes without saying that this is an inefficient approach, not to mention that this won't apply if you're running batch jobs for instance where you need high throughput (e.g. on individual nodes).
It's not that the Go team's decisions are different; it's that they are worse for most applications. If you read the first part of the series, the author goes into detail on this. The problem is essentially that the Go GC focuses entirely on latency at the expense of throughput, which is not the best choice in most cases.
Well, yes, he says that, but I'm not sure it's an objectively worse tradeoff? It seems like it depends on what you think "most apps" are like. It's not obvious to me that throughput is usually more important than latency, since latency is what the user sees and throughput has more to do with how many machines you need.
> It's not obvious to me that throughput is usually more important than latency, since latency is what the user sees and throughput has more to do with how many machines you need.
No, it's not true that "latency is what the user sees". Only for interactive applications is that true. In fact, given a choice between the two, I would generally choose throughput over latency.
The Go compiler is an example of an application in which throughput matters 100% and latency matters 0%. Even with some semi-interactive tools like ripgrep, throughput matters much more than latency, because throughput determines how quickly the search finishes.
Even for servers, throughput often matters more in ways beyond how many machines you need. Would you rather have a Web page that takes 10 ms to load with 1 in 10,000 loads taking 2 seconds, or a page that takes 200 ms across-the-board? I'd take the former. Remember that another name for throughput in this context is allocation performance, which makes it clear how important it is.
They have had a personal bone to pick with the Go team for a while as a result of some of their marketing language around low-latency GC designs. It sounds to me like their personal work focuses on high-throughput designs, and an increasing industry emphasis on low latency threatens that.
In the area in Stockholm, Sweden where I live we throw "all" garbage (we of course separate glassware, metal, paper, electronics first) in the same bin, but food waste in green bags. I don't know if it's robots that does the final sorting but I would believe so. Less than 1% of household trash ends up in landfills here.
The technical detail in this article is excellent! A great read.
But I think it would have been an even better article without the negativity about Go and how the author thinks "the Java guys are winning" in his words. That felt a little petty.
Except that it isn't necessary nor is it completely honest. Go and Java take different approaches here, but the article focuses on the merits of the Java approach and the downsides of the approach taken by Go.
Example 1. The article talks about compaction and generational collection as being Good Things(TM), but it doesn't talk about the costs associated with them. Looking at the linked Go article, these approaches suffer from high write barrier overhead. For Go, this isn't worthwhile because escape analysis allocates many young objects on the stack (which btw is effectively bump-pointer allocation) so trying to further reduce GC overhead by increasing the overhead of every pointer write is just not worth it. It may, however, be the right trade-off for Java.
Example 2. Java's many tuning parameters means that programmers who care about performance have to choose the right GC and tune it. If better GCs come out or tweaks to the algorithms are made, these configurations have to be updated. In contrast, Go programs gets these benefits for free. The best approach seems to be to offer a small number of high-level knobs, but it's hard to determine what those are, leading to the two (suboptimal) extremes you see with Go and Java.
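As a small aside on the escape-analysis point in Example 1: the Go compiler reports its decisions when built with `-gcflags=-m`, which makes it easy to see which values stay on the stack. A toy example (the vec type is made up; the noinline directives just keep the output predictable):

```go
package main

type vec struct{ x, y float64 }

// add returns its result by value; nothing escapes, so there is no heap
// allocation and no GC work.
//
//go:noinline
func add(a, b vec) vec { return vec{a.x + b.x, a.y + b.y} }

// escape returns a pointer to a local, so the compiler reports something
// like "moved to heap: v" and the value is heap-allocated.
//
//go:noinline
func escape() *vec {
	v := vec{1, 2}
	return &v
}

func main() {
	v := add(vec{1, 2}, vec{3, 4}) // stack only
	p := escape()                  // one heap allocation
	_, _ = v, p
}
```

Build with `go build -gcflags=-m` to see the reported decisions.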
> Example 1. The article talks about compaction and generational collection as being Good Things(TM), but it doesn't talk about the costs associated with them. Looking at the linked Go article, these approaches suffer from high write barrier overhead.
You need a write barrier no matter what for any sort of incremental or concurrent GC, to maintain the tricolor invariant. Otherwise there is no way for the runtime system to detect a store from a black object to a white object. Typical GCs will fold the write barrier needed for generational GC into the write barrier needed for incremental/concurrent GC, so there is no need for extra overhead if properly implemented.
> For Go, this isn't worthwhile because escape analysis allocates many young objects on the stack (which btw is effectively bump-pointer allocation)
Java HotSpot has done the same thing for a long time! It's just that in HotSpot escape analysis doesn't really help allocation performance, because the generational GC already offers bump allocation in the nursery. Escape analysis in the JVM does open up more optimizations, though, because it serves as the scalar-replacement-of-aggregates transformation.
> so trying to further reduce GC overhead by increasing the overhead of every pointer write is just not worth it.
This is only because of their specific implementation. There is no need for increased overhead.
> If better GCs come out or tweaks to the algorithms are made, these configurations have to be updated. In contrast, Go programs gets these benefits for free.
There is no reason why Java can't do the same by updating defaults. In fact, they often do.
>> If better GCs come out or tweaks to the algorithms are made, these configurations have to be updated. In contrast, Go programs gets these benefits for free.
> There is no reason why Java can't do the same by updating defaults. In fact, they often do.
Correct. The JVM guys always update the default GC to be the nearest to 'one size fits all'. Obviously if you've made a custom GC configuration then you want a level of tuning that Go does not provide.
Yet another person who only skimmed the article and hasn't read Part 1, which explains GC tuning tradeoffs and addresses the two examples you mentioned. Part 1 debunks the dishonest marketing from Google around Go's GC when it was originally released, and discusses the tradeoffs that weren't disclosed.
But he didn't say that. That's your own ridiculous emotional projection. The article contains very thoughtful criticism backed by detailed analysis. I don't think you even bothered to read the full articles (there is a part 1). The final sentence was, "Overall, it looks to me like the Java guys are winning the low latency game.", and that's a fair conclusion.
Based on the facts stated in the article (part 2), it looks like (years later) Java now has a new garbage collector available (ZGC, still labeled experimental, not used by default) [1]. It's an order of magnitude lower latency than current default collector, but it still has an order of magnitude higher latency target (10ms vs 1ms) than Go. I'm sure there are many good reasons for the trade offs made and that's a huge improvement, but in light of the basic numbers I'm not sure that "the Java guys are winning the low latency game" is, in fact, a fair conclusion.
Go introduced a new GC a while back with dramatically lower latency numbers, not only compared to the previous Go GC but any other commonly available GC in any language (excluding exotic commercial ones like Azul). Neither the old or new Go GCs are particularly sophisticated compared to the highly developed ones in the Java ecosystem.
The author of these articles seems to take exception to the fact that the latency numbers were achieved not by magic or pure GC-implementation genius but by simply optimizing for latency, which involves some trade-offs. I don't think this negates the usefulness of having a low-latency GC available though, although I can understand how it might be frustrating to see a project getting a lot of attention for what feels like a lesser intellectual achievement.
Obviously not all projects even require low worst-case latency: monolithic apps will be less sensitive, applications waiting on 10 other services will be more, etc. For some apps the CPU isn't a bottleneck either. It's just another trade-off where Go is prioritizing some things. There are even other factors not mentioned in the article, like a compacting collector making calling C functions more complex since it needs object pinning.
[1] There's also another GC mentioned that's less talked about in the article, Shenandoah, that requires patching the JVM and introduces memory overhead to every object for a forwarding pointer. It was hard to find numbers, but it looks like the latency target for this GC is also in the 10ms range (http://clojure-goes-fast.com/blog/shenandoah-in-production/).
The page you linked to says "How do you get Shenandoah? This garbage collector has officially become part of JDK only since version 12 and is available in AdoptOpenJDK 12 builds."
> and introduces memory overhead to every object for a forwarding pointer.
In the very sentence you quoted I said “excluding exotic commercial ones like Azul“. I’m not sure what point you are trying to make here.
> Shenandoah 2.0 does not
I see. I only googled that link to find the latency target (since it wasn’t mentioned in the original article), I have to confess I didn’t read the rest of it and almost all of what I wrote is based on the original article. Good to know that some of that information is now out of date.