Bigger batches are good, but they force synchronization (locking) across workers, so picking a good batch size relative to how much data you have is important. This new technique effectively lets you get a "meta batch" for free (that's a terrible analogy, but it's the best I can do).
As batches get bigger and no longer fit on a single GPU or a single compute node, the challenge becomes data transport. So anything that decouples your computational agents can be a win.
In this case, it's a cleverer way of decoupling those agents. Asynchronous batching is normally awful, but this is a genuinely clever way of making it work.
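To make "decoupling your agents" concrete, here's a minimal sketch of the simplest version: a bounded queue between a data-producing thread and the training loop, so neither side lock-steps on the other. This is just an illustration of the general idea, not the technique from the article; `make_batch` and `train_step` are placeholders.

```python
import queue
import threading

import numpy as np

BATCHES = 100
buf = queue.Queue(maxsize=8)  # bounded: producer runs ahead, but not unboundedly

def make_batch(rng):
    # Stand-in for real I/O and preprocessing.
    return rng.normal(size=(64, 10))

def producer():
    rng = np.random.default_rng(0)
    for _ in range(BATCHES):
        buf.put(make_batch(rng))  # blocks only when the queue is full
    buf.put(None)                 # sentinel: no more batches

def train_step(batch, w):
    # Stand-in for the actual optimizer step.
    return w + 0.01 * batch.mean(axis=0)

threading.Thread(target=producer, daemon=True).start()

w = np.zeros(10)
while (batch := buf.get()) is not None:
    w = train_step(batch, w)  # the consumer never waits on data prep
```

The bounded queue is the whole trick: the producer can be slow or bursty, the consumer can be slow or bursty, and neither blocks the other unless the buffer is completely empty or completely full.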
If I may opine on the matter, I think we're reaching a point where machine learning researchers should start thinking about abandoning Python as a programming medium. For example, the other decoupling strategy (decoupled neural net backpropagation) isn't something I'd want to write in Python, much less debug in someone else's code. Python is really not an appropriate language for tackling difficult problems in distribution and network coordination.
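For reference, here's roughly what the core of decoupled backpropagation looks like, in the synthetic-gradients sense, as a numpy-only toy (everything here is illustrative, not any library's actual API). Each layer updates itself immediately using a *predicted* gradient, and the true gradient arrives later only to train the predictor; now imagine the asynchronous, multi-node version of this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem.
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=(10, 1))

W1 = 0.1 * rng.normal(size=(10, 32))
W2 = 0.1 * rng.normal(size=(32, 1))
M = np.zeros((32, 32))  # synthetic-gradient module: predicts dL/dh from h

lr = 0.01
for step in range(500):
    # Layer 1 runs forward and updates itself immediately, using the
    # predicted gradient instead of waiting for backprop from layer 2.
    h = np.maximum(X @ W1, 0.0)          # ReLU activations
    g_hat = h @ M                        # synthetic gradient estimate of dL/dh
    W1 -= lr * X.T @ (g_hat * (h > 0))   # decoupled update

    # Layer 2 later computes the loss and the true gradient at h...
    err = (h @ W2 - y) / len(X)
    g_true = err @ W2.T
    W2 -= lr * h.T @ err

    # ...which is used only to train the synthetic-gradient module.
    M -= lr * h.T @ (g_hat - g_true)
```

Even in this sequential toy, the bookkeeping is subtle; the distributed version adds network failures and ordering on top.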
As long as the big ML libraries support these strategies, people will use them; the choice of user-facing language isn't critical. TensorFlow and PyTorch are basically an ML-specific programming model with a Python interface.
Did you read what I wrote? I'm not making any claims about numerical performance. I'm saying there are better choices, in terms of being easy for the programmer to write and debug, for the other aspects: networking, asynchronous coordination, and so on.