Exponential Backoff and Jitter (awsarchitectureblog.com)
85 points by alexbilbie on March 15, 2015 | hide | past | favorite | 15 comments



> It’s worth noting that none of these approaches fundamentally change the N^2 nature of the work to be done, but do substantially reduce work at reasonable levels of contention.

If I've understood correctly, I don't think that statement from the article is true... the performance gains come precisely from reducing the work from N^2 to N log N.

An interesting theoretical question would be to identify the optimal backoff policy. I think FullJitter should be close to optimal, but maybe you can squeeze out a little more with something more sophisticated.

Edit: I just realized that DecorrelatedJitter (not sure why it's called that?) makes a lot of sense as a minor optimization: if you've already waited a long time and still failed, that suggests there is a lot of contention, and you should wait even longer.
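For reference, the two jittered policies being discussed can be sketched roughly like this in Python (BASE and CAP are illustrative values, not taken from the post):

```python
import random

BASE = 0.1   # illustrative base delay in seconds
CAP = 10.0   # illustrative maximum delay

def full_jitter(attempt):
    """FullJitter: sleep uniformly between 0 and the exponential ceiling."""
    ceiling = min(CAP, BASE * 2 ** attempt)
    return random.uniform(0, ceiling)

def decorrelated_jitter(prev_sleep):
    """DecorrelatedJitter: each sleep depends on the previous sleep rather
    than the attempt count, so a long wait tends to beget a longer one."""
    return min(CAP, random.uniform(BASE, prev_sleep * 3))
```

Note how decorrelated_jitter matches the intuition above: the longer you just waited, the larger the range the next wait is drawn from.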


Great post. There's a classic LBL paper about self-synchronization in (I think) time protocols that demonstrates the same conclusion in a more dramatic way.


Maybe this one? "The Synchronization of Periodic Routing Messages" (1994) [1]

[1] http://ee.lbl.gov/papers/sync_94.pdf


Yep! Sally Floyd FTW!

Thanks for finding that.


What a coincidence: I found myself writing a little Go lib for exponential backoff yesterday, and was wondering what the purpose of introducing jitter was and what effects it would have. I need it for interacting with the Twitter stream API, and they're pretty clear that when they say exponential, they mean it. But after reading this, jitter looks very useful in some situations, so I added it to my lib as well...

It's here if anyone's interested: https://github.com/Diggs/go-backoff


If we're talking about implementations in various languages, I might as well link to my capped-backoff-with-jitter implementation for Python.

https://github.com/ekimekim/pylibs/blob/master/libs/backoff....

(Apologies for the long link; I keep all my one-file Python libs in a single git repo for sanity. I think I've uploaded this one to PyPI, where it's available under the name "backoff".)


Hi Mike!

Thought you might be interested in my Elixir impl, though not as featureful: https://github.com/rickhull/backoff/blob/master/lib/backoff....

Example usage: https://github.com/rickhull/backoff/blob/master/examples/dem...


Wouldn't this be completely alleviated by placing each new connection into a queue, instead of rejecting it and telling the client to try again? Or am I reading this wrong?


The simplest way to understand OCC is to take any of your models, add a version attribute, and whenever you update an object, do an UPDATE ... WHERE version={obj.version}.

This is useful when you don't want writes to overwrite previous writes. If the version check doesn't hold, the update fails, letting your client know that its assumptions about the state of the data are wrong. You then have a few options: redo your computation using the new state, or do nothing at all because the current state is good.
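A minimal sketch of that pattern using Python's sqlite3 module (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 'widget', 1)")

def update_name(conn, item_id, new_name, expected_version):
    """Optimistic update: succeeds only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE items SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, item_id, expected_version),
    )
    return cur.rowcount == 1  # False means our snapshot was stale

ok = update_name(conn, 1, "gadget", expected_version=1)       # succeeds, bumps version to 2
stale = update_name(conn, 1, "doohickey", expected_version=1)  # fails: version is now 2
```

On a failed check, the caller re-reads the row and either retries with the new state or gives up, exactly as described above.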


The version with jitter but without exponential backoff is still doing O(n^2) work, I'm pretty sure, although with a (much) smaller constant.

Just something to keep in mind.

Also, I find the inherent tradeoff between time taken and work done interesting. Something involving a limited amount of server state might do better on the work front while keeping completion time bounded. (Something like "please don't try again for x ms" sent as a reply.)

The ideal case is, what, 2n-1 calls (everyone sends at once, the server replies to the first client and schedules all the others so there is no contention) with O(n) delay (what's the scale factor here? 20ms?). Are there any algorithms that come close to that limit?
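To make that lower bound concrete, here's a toy model of the ideal case, assuming a hypothetical server that can hand each client an explicit retry slot (which the protocols in the article don't do):

```python
def server_scheduled(n, service_time=20):
    """All n clients call at t=0 (n calls). The server serves one immediately
    and tells client i to come back at exactly i * service_time, so each of
    the remaining n-1 clients retries exactly once with zero contention."""
    calls = n + (n - 1)                   # initial burst + one retry each
    completion = (n - 1) * service_time   # when the last scheduled client finishes
    return calls, completion

calls, completion = server_scheduled(10)  # -> (19, 180)
```

So the floor really is 2n-1 calls with delay linear in n; anything a stateless backoff scheme does is paying extra calls to approximate this schedule without the server-side state.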


I wish the experimenter had explored random distributions other than uniform. I suspect uniform distribution may be non-ideal here and other distributions may produce better results.


I've always thought of "jitter" as just adding a dash of entropy to the system I'm building.

I suppose a textbook example is: if I'm managing a pool of long-running threads I want to periodically bounce, I could write logic to throttle the respawning of threads. Or I could just add a rand() to the conditional -- introduce "jitter" -- and let probability theory "throttle" for me.

I'd rather build on top of probability than statistics.
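One way to read "add a rand() to the conditional": instead of tracking when each worker was last recycled, give every loop iteration a small independent probability of bouncing, so recycles happen at the desired average rate but are spread out with no bookkeeping (the function and parameter names here are made up):

```python
import random

def should_recycle(p=0.001):
    """Each tick, bounce the thread with probability p. Recycles then occur
    on average every 1/p ticks, with no counters and no synchronized
    'everyone restarts at once' events -- the jitter is built in."""
    return random.random() < p
```

The worker's main loop just checks should_recycle() each iteration and exits (to be respawned by the pool) when it fires.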


Randomness is an easy way to ensure you're not doing something systematically wrong.


Nice post.

Use of pure color in a graph makes me sad, though. This colorblind guy had to crack open a paint program and match up RGB values. A few dotted/dashed lines go a long way (and I suspect, for more than just colorblind folks, too).


Next time, just do a hue shift. Alternatively, there are filters that lighten red / darken green (or vice versa).

(Assuming red/green. There are analogous methods for other types of color blindness, however.)



