I must be missing something. From the article:

> the allocation rate goes down at least 5x, and time to execute goes up 10x! And this is not even starting to touch what a collector has to do when multiple threads are asking for memory (probably contended atomics)

...

> For Epsilon, the allocation path in GC is a single compare-and-set - because it issues the memory blocks by pointer-bumps itself.

So both the TLABs and the Epsilon GC are doing pointer-bumps, right? Why does TLAB outperform no-TLAB even in the single-thread case? Just because the pointer-bump instructions are inlined?