What is the TLS overhead? On x64 you could presumably spare two registers: one for the current heap pointer and one for the end of the heap. How could a single-threaded version implement this more efficiently?
Not sure if LLVM has an equivalent of global register variables, though.
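To make the question concrete, here is a rough sketch of the two variants I have in mind, assuming the allocator is a simple bump allocator. All names (`tls_bump_alloc`, `st_bump_alloc`, `HEAP_SIZE`) are made up for illustration and the fixed-size backing arrays are just a stand-in for a real heap. The only difference between the two is whether the current/end pointers live in thread-local storage or in plain globals.

```c
#include <stddef.h>

/* Hypothetical sketch: names and sizes are assumptions, not anyone's real allocator. */
#define HEAP_SIZE ((size_t)1 << 16)
#define ALIGNMENT ((size_t)16)

/* Round a request up to the allocation alignment. */
static size_t round_up(size_t n) {
    return (n + (ALIGNMENT - 1)) & ~(ALIGNMENT - 1);
}

/* --- Variant 1: per-thread heap, pointers reached through TLS --- */
static _Thread_local _Alignas(16) unsigned char tls_heap[HEAP_SIZE];
static _Thread_local unsigned char *tls_cur; /* current heap pointer */
static _Thread_local unsigned char *tls_end; /* end of heap */

static void *tls_bump_alloc(size_t n) {
    if (tls_cur == NULL) {           /* lazy per-thread initialisation */
        tls_cur = tls_heap;
        tls_end = tls_heap + HEAP_SIZE;
    }
    if (n > HEAP_SIZE) return NULL;  /* also avoids overflow in round_up */
    n = round_up(n);
    if ((size_t)(tls_end - tls_cur) < n) return NULL;
    void *p = tls_cur;
    tls_cur += n;                    /* each access goes through the TLS base */
    return p;
}

/* --- Variant 2: single-threaded, pointers are ordinary globals --- */
static _Alignas(16) unsigned char st_heap[HEAP_SIZE];
static unsigned char *st_cur = st_heap;
static unsigned char *st_end = st_heap + HEAP_SIZE;

/* With GCC's global register variable extension, the two pointers could
 * instead be pinned to registers, along the lines of
 *     register unsigned char *st_cur asm("r14");
 * That is a GCC-specific assumption and is left out so the sketch stays
 * portable C11. */
static void *st_bump_alloc(size_t n) {
    if (n > HEAP_SIZE) return NULL;
    n = round_up(n);
    if ((size_t)(st_end - st_cur) < n) return NULL;
    void *p = st_cur;
    st_cur += n;
    return p;
}
```

How much variant 1 costs over variant 2 depends on the TLS model the compiler and linker pick; I have not measured it.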