Interesting read. I was a little confused at first because the information bottleneck paper isn't new, but as I read on I realized the article acknowledges that. It's good to see follow-up and new research coming out on it, because it struck me as promising when I first read about it.
The information bottleneck idea is very similar to, basically the same as, how I've always thought about DL models, and statistical models more generally. The hidden variables at each layer are essentially digitized codes, and there's a compression at each layer, which is equivalent to learning/inference in an algorithmic complexity/MDL sense.
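To make that connection concrete, my reading of Tishby's IB objective is that each intermediate representation T is pushed to compress the input X while keeping whatever is needed to predict the label Y, roughly:

    \min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y)

where I(.;.) is mutual information and beta trades off compression against prediction. The smaller I(X;T) term is basically the "shorter code" part of the MDL view.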
What was surprising to me was the relationship with renormalization groups, which I wasn't familiar with at all.
The quote from Lake was also interesting. I'd forgotten about that Bayesian Program Learning paper. My guess is that BPL and DL are not really all that different at some level.